OFED (Overarching FAIR Ecosystem for Data) serves as the digital backbone of the research infrastructure. It goes far beyond a traditional data lake: it is a comprehensive digital ecosystem offering a wide range of data-related services, including storage, data management planning, access control, data sharing, and analysis tools.
Data ingestion is managed through FAIR-by-design pipelines originating from individual laboratories. Once ingested, the data is consolidated and made accessible within a central data lake, enhanced with a suite of value-added services. Its modular and scalable architecture enables cost optimization, horizontal scaling, and fast data access for both internal and external users.
The system is built on a layered architecture that separates general infrastructure components from domain-specific services.
Access to OFED is handled via the Authentik Identity Provider, federated with the NFFA-DI Single Entry Point (SEP), which regulates user access to the NFFA-DI infrastructure and allows proposal submission.
A key design feature of OFED is its ability to ingest data in a FAIR-by-design manner, directly from research instruments and laboratory workflows. The NFFA-DI research data policy defines accepted file formats, metadata schemas, and access rules, ensuring that data is not only securely stored but also interoperable, reusable, and reproducible throughout the scientific lifecycle.
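As a rough illustration of how such a policy could be enforced at ingestion time, the sketch below checks a file against a set of accepted formats and required metadata fields. The policy values (`ACCEPTED_EXTENSIONS`, `REQUIRED_METADATA`) and the function name `check_against_policy` are hypothetical placeholders, not the actual NFFA-DI policy.

```python
from pathlib import PurePath

# Hypothetical policy values, for illustration only; the real NFFA-DI
# research data policy defines its own formats and metadata schemas.
ACCEPTED_EXTENSIONS = {".nxs", ".h5", ".tiff", ".csv"}
REQUIRED_METADATA = {"instrument", "operator", "sample_id", "license"}


def check_against_policy(filename: str, metadata: dict) -> list[str]:
    """Return a list of policy violations; an empty list means the file passes."""
    problems = []

    # Compare the file extension (case-insensitive) with the accepted formats.
    extension = PurePath(filename).suffix.lower()
    if extension not in ACCEPTED_EXTENSIONS:
        problems.append(f"format not accepted: {extension or '(none)'}")

    # Report any required metadata field that was not provided.
    missing = sorted(REQUIRED_METADATA - set(metadata))
    if missing:
        problems.append("missing metadata: " + ", ".join(missing))

    return problems
```

A pipeline step could call this check before upload and refuse to ingest any file for which the returned list is not empty.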
On top of the general infrastructure, OFED integrates a set of specialized, domain-oriented services:
- MinIO: a high-performance, open-source object storage system used as a lightweight S3-compatible layer for efficient data access and integration with scientific tools. It complements Ceph by offering fast I/O for FAIR-by-design pipelines.
- NOMAD Oasis: developed by the FAIRmat initiative, it is the core platform for visualization, metadata search, and sharing of scientific datasets. It supports structured materials data and provides RESTful APIs and UI components for data exploration.
- JupyterHub: offers a flexible, interactive environment for advanced data analysis, allowing researchers to run notebooks close to the data and integrate analytical routines in Python, R, and other languages within a reproducible framework.
- Data Management Plan (DMP) modules: automate the generation of DMPs in line with Horizon Europe and EOSC recommendations.
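To illustrate how a service in this layer could be reached programmatically, the sketch below prepares a metadata search request for a NOMAD Oasis instance through its REST API. The base URL is a placeholder, and the endpoint path and query field are assumed to follow the public NOMAD v1 API; a given Oasis deployment may expose a different path or require authentication.

```python
import json
import urllib.request

# Placeholder address: replace with the actual Oasis deployment URL.
OASIS_API = "https://oasis.example.org/nomad-oasis/api/v1"


def build_entries_query(elements: list[str], page_size: int = 10) -> urllib.request.Request:
    """Prepare (but do not send) a POST request that searches entries by chemical elements."""
    body = {
        # Field name assumed from the public NOMAD metadata schema.
        "query": {"results.material.elements": {"all": elements}},
        "pagination": {"page_size": page_size},
    }
    return urllib.request.Request(
        url=f"{OASIS_API}/entries/query",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request. Keeping construction and sending separate makes the query easy to inspect or test without network access.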
At the storage level, OFED uses Ceph, an open-source, highly scalable distributed storage platform that supports object, block, and file storage in a unified system. Ceph ensures redundancy, fault tolerance, and high performance, and supports S3-compatible interfaces for integration with cloud-native applications.
For service orchestration and deployment, OFED relies on Kubernetes, the industry-standard open-source container orchestration platform. Kubernetes enables dynamic workload management, service scaling, and high-availability deployments across multiple nodes.
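As a minimal sketch of what a scaled, multi-replica deployment looks like in practice, the snippet below builds a Kubernetes `apps/v1` Deployment manifest as a plain dictionary and prints it as JSON. The service name and image reference are hypothetical and do not describe the actual OFED configuration.

```python
import json


def deployment_manifest(name: str, image: str, replicas: int = 2) -> dict:
    """Build a minimal apps/v1 Deployment manifest as a plain dictionary."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            # Number of identical pods Kubernetes keeps running.
            "replicas": replicas,
            # The selector must match the labels of the pod template.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical service name and image reference.
    manifest = deployment_manifest(
        "metadata-parser", "registry.example.org/metadata-parser:1.0", replicas=3
    )
    print(json.dumps(manifest, indent=2))
```

Since `kubectl apply -f` accepts JSON as well as YAML, the printed output could be saved to a file and applied directly. Changing `replicas` is the simplest form of the service scaling described above.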
A specialization course, organized by Area Science Park, CNR-IOM and SISSA as part of the activities of the NFFA-DI and PRP@CERIC projects, was designed to provide in-depth training in the management, curation, cataloguing and analysis of research data. The course lasted nine months (from September 2024 to May 2025) and was divided into a teaching phase of 166 hours of theoretical lessons and a seven-month practical phase carried out at the laboratories of the participants' home institutions. Each NFFA-DI operational unit was represented by one or more participants in the Master, who during their internship implemented "FAIR-by-design pipelines" (tailored workflows and software solutions) in the various nodes for 15 different types of instruments from the NFFA-DI catalogue.
This is one of the largest deployments of prototype FAIR-by-design pipelines across laboratories belonging to the same research infrastructure in Italy.
The next step will be to collaborate on extending these implementations to other NFFA-DI laboratories that use the same techniques explored in the students' theses. This will help achieve interoperability within the infrastructure and ensure scientific reproducibility, both crucial for high-quality research. From there, the goal will be to extend common practices to the broader community involved.
Data Curation for Optimizing Molecular Beam Epitaxial Growth of III-V Semiconductor Samples
L. Musini (CNR-IOM@TS)
Design refinement and commissioning of a FAIR-by-design integrated data management system for an STM laboratory
S. Vigneri (CNR-IOM@TS)
Implementation of FAIR by design principles for data acquisition and storage at CNR-IFN Milano
G. Gallerani (CNR-IFN@MI)
FAIR data management for fabrication processes. A NOMAD plugin as implemented at CNR-IFN@TN
M.F. Bontorno (CNR-IFN@TN)
Implementing a FAIR-by-Design Approach for Process Management in Nanoscience Foundries: Application in the CNR-ISMN cleanroom
M. Marella (CNR-ISMN@BO)
FAIR data management of quantum-mechanical calculations for the spin states dynamics in SiC materials
G. Castorina (CNR-IMM@CT)
FAIR Data Management of results coming from SEM analysis of Halloysite samples
L. Lidonni (CNR-IMM@CT)
FAIR Data Management of the Results from the MulSKIPS Atomistic Simulation Environment for PVD, CVD, and Laser Annealing
F. Ruberto (CNR-IMM@CT)
Implementation of FAIR data in Transient Absorption Spectroscopy and Photoluminescence Spectroscopy
L. Costantini (CNR-ISM@RM)
FAIR Data Management in Scanning Electron Microscopy
G. Balestra (CNR-NANOTEC@LE)
AMORE: a GUI to promote migration from paper to eLN
E. D'Amico (CNR-SPIN@NA)
Implementation of a FAIR-by-Design System for PLD and RHEED Data Management in the MODA Laboratory: Development of a NOMAD plugin and a Python parser for automatic data mapping from eLabFTW to NOMAD
R. Forlenza (CNR-SPIN@NA)
Implementation of FAIR Principles for XPS Data at MODA Laboratory through NeXus File Format
M. Zandavifard (CNR-SPIN@NA)
A FAIR Data Ecosystem in Transmission Electron Microscopy
N. Perin (AREA-RIT)
Implementation of a pipeline for collecting, ingesting and transforming data into standard formats for the LAME FIB-SEM
E. Saadat (AREA-RIT)
Implementation of FAIR-data policies in wafer-scale infrastructures
P. Florio (POLIFAB)
FAIR by Design upgrade for the UNIMI NFFA-DI experimental offer
S. Contu (UNIMI)
Extending NOMAD schemas for the UMIL NFFA-DI Theory & Simulation Installation, with a focus on the Yambo code
E. Molteni (UNIMI)