OFED: The Core of the Digital Research Infrastructure Ecosystem

OFED (Overarching FAIR Ecosystem for Data) serves as the digital backbone of the research infrastructure. It goes far beyond a traditional data lake: it is a comprehensive digital ecosystem offering a wide range of data-related services, including storage, data management planning, access control, data sharing, and analysis tools.

Data ingestion is managed through FAIR-by-design pipelines originating from individual laboratories. Once ingested, the data is consolidated in a central data lake, where it is made accessible and enhanced with a suite of value-added services. OFED's modular, scalable architecture enables cost optimization, horizontal scaling, and fast data access for both internal and external users.

The system is built on a layered architecture that separates general infrastructure components from domain-specific services.

Access to OFED is handled by the Authentik Identity Provider, federated with the NFFA-DI Single Entry Point (SEP), which regulates user access to the NFFA-DI infrastructure and handles proposal submission.

A key design feature of OFED is its ability to ingest data in a FAIR-by-design manner, directly from research instruments and laboratory workflows. The NFFA-DI research data policy defines accepted file formats, metadata schemas, and access rules, ensuring that data is not only securely stored but also interoperable, reusable, and reproducible throughout the scientific lifecycle.
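
As an illustration of what a policy check at ingestion time might look like, here is a minimal Python sketch. The accepted extensions, the required metadata fields, and the names `check_before_ingest` and `IngestionReport` are all hypothetical; the actual formats and schemas are those defined by the NFFA-DI research data policy.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath

# Hypothetical policy values, for illustration only; the actual NFFA-DI
# policy defines its own accepted formats and metadata schemas.
ACCEPTED_EXTENSIONS = {".h5", ".nxs", ".tiff", ".csv"}
REQUIRED_METADATA = {"instrument", "operator", "proposal_id", "license"}


@dataclass
class IngestionReport:
    accepted: bool
    problems: list[str] = field(default_factory=list)


def check_before_ingest(filename: str, metadata: dict) -> IngestionReport:
    """Validate a file against the (assumed) policy before it enters the data lake."""
    problems = []

    # Rule 1: the file format must be on the accepted list.
    extension = PurePosixPath(filename).suffix.lower()
    if extension not in ACCEPTED_EXTENSIONS:
        problems.append(f"file format {extension or '(none)'} is not accepted")

    # Rule 2: every required metadata field must be present.
    missing = sorted(REQUIRED_METADATA - set(metadata))
    if missing:
        problems.append("missing metadata: " + ", ".join(missing))

    return IngestionReport(accepted=not problems, problems=problems)
```

Running the check at the laboratory end of the pipeline, before upload, means a rejected file never reaches the data lake and the operator sees the list of problems immediately.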

Specific Components

On top of the general infrastructure, OFED integrates a set of specialized, domain-oriented services:

- MinIO: a high-performance, open-source object storage system used as a lightweight S3-compatible layer for efficient data access and integration with scientific tools. It complements Ceph, the main storage platform, by offering fast I/O for FAIR-by-design pipelines.
- NOMAD Oasis: developed by the FAIRmat initiative, it is the core platform for visualization, metadata search, and sharing of scientific datasets. It supports structured materials data and provides RESTful APIs and UI components for data exploration.
- JupyterHub: offers a flexible, interactive environment for advanced data analysis, allowing researchers to run notebooks close to the data and integrate analytical routines in Python, R, and other languages within a reproducible framework.
- Data Management Plan (DMP) modules: automate the generation of DMPs in line with Horizon Europe and EOSC recommendations.
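
To give an idea of how the RESTful API of a NOMAD Oasis can be used from a script or a JupyterHub notebook, the following Python sketch prepares a metadata search request without sending it. The base URL is a placeholder, and the query assumes the NOMAD v1 `entries/query` endpoint and the `results.material.elements` search field; check both against the API documentation of the deployed Oasis.

```python
import json
import urllib.request

# Hypothetical base URL; substitute the API root of the actual Oasis.
OASIS_API = "https://oasis.example.org/nomad-oasis/api/v1"


def build_entries_query(elements: list[str], page_size: int = 10) -> urllib.request.Request:
    """Prepare (but do not send) a search for entries containing all given elements."""
    body = {
        # Assumed search field: chemical elements of the material.
        "query": {"results.material.elements": {"all": elements}},
        "pagination": {"page_size": page_size},
    }
    return urllib.request.Request(
        url=f"{OASIS_API}/entries/query",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would execute the search. That requires a reachable Oasis and, for data that is not public, an access token in the request headers.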

Common Software Stack

At the storage level, OFED uses Ceph, an open-source, highly scalable distributed storage platform that supports object, block, and file storage in a unified system. Ceph ensures redundancy, fault tolerance, and high performance, and supports S3-compatible interfaces for integration with cloud-native applications.

For service orchestration and deployment, OFED relies on Kubernetes, the industry-standard open-source container orchestration platform. Kubernetes enables dynamic workload management, service scaling, and high-availability deployments across multiple nodes.
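
As a minimal sketch of how a service can be described for Kubernetes, the Python function below builds a standard `apps/v1` Deployment manifest as a dictionary. The service name, container image, and replica count are hypothetical and do not reflect the actual OFED deployment.

```python
import json


def deployment_manifest(name: str, image: str, replicas: int = 2) -> dict:
    """Build an apps/v1 Deployment manifest as a plain dictionary."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            # Number of identical pods Kubernetes keeps running.
            "replicas": replicas,
            # The selector must match the labels of the pod template.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical service name and image, for illustration only.
    manifest = deployment_manifest(
        "metadata-api", "registry.example.org/metadata-api:1.0", replicas=3
    )
    print(json.dumps(manifest, indent=2))
```

Because `kubectl apply -f` accepts JSON as well as YAML, the printed output can be saved to a file and applied directly. Changing `replicas` and re-applying the manifest is the simplest form of the service scaling described above.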