The project involves not just the acquisition and upgrade of cutting-edge experimental equipment, but also the implementation and use of specific open-source software to manage data according to the FAIR-by-design principles in the laboratories associated with the various Operational Units (OUs) involved.
The NFFA-DI research data policy defines accepted file formats, metadata schemas, and access rules, ensuring that data is not only securely stored but also interoperable, reusable, and reproducible throughout the scientific lifecycle.
To ensure compliance with FAIR principles, and by leveraging the expertise and tools available within the international community, NFFA-DI is committed to:
"Digital Structure of NFFA-DI and Overarching FAIR Ecosystem for Data (OFED)" is the title of an NFFA-DI work package fully dedicated to implementing a FAIR-by-design data management system. The work package aims to consolidate and expand the results of the NFFA-Europe research infrastructure (www.nffa.eu) and to provide users, affiliated researchers, and partners with the tools needed to manage their data using a FAIR-by-design methodology.
All NFFA-DI data pipelines start in a laboratory, where data and metadata are collected from different sources, such as an Electronic Lab Notebook (eLabFTW, as in the figure, or a similar open-source ELN), instrument outputs, or log files, and organized within a NeXus (https://www.nexusformat.org/) formatted file or another open format allowed by the project's policy.
The laboratories were extensively upgraded to automate this first part of the data pipeline, to ease the researchers' work, and to ensure the best possible level of data integrity.
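The first stage of such a pipeline can be illustrated with a minimal sketch that merges an ELN record, an instrument log, and measured values into a single NeXus-style hierarchy. Everything here is an assumption made for illustration: the function name `build_nexus_like_entry`, the ELN field names, and the `key = value` log format are hypothetical, and the tree is serialized as JSON only to keep the example dependency-free. A real pipeline would write an HDF5-based NeXus file (for example with h5py) following the relevant application definition.

```python
import json


def build_nexus_like_entry(eln_record, instrument_log_lines, measurement):
    """Merge ELN metadata, instrument log values, and data into one NeXus-style tree."""
    # Parse "key = value" lines from a hypothetical instrument log;
    # lines without "=" (comments, blank lines) are skipped.
    settings = {}
    for line in instrument_log_lines:
        if "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip()

    # NX_class values are standard NeXus base classes; the field layout
    # below is illustrative, not an official application definition.
    return {
        "entry": {
            "NX_class": "NXentry",
            "title": eln_record["title"],
            "start_time": eln_record["date"],
            "user": {"NX_class": "NXuser", "name": eln_record["operator"]},
            "sample": {"NX_class": "NXsample", "name": eln_record["sample_id"]},
            "instrument": {"NX_class": "NXinstrument", "settings": settings},
            "data": {"NX_class": "NXdata", "signal": list(measurement)},
        }
    }


if __name__ == "__main__":
    # Hypothetical inputs standing in for an ELN export and an instrument log.
    eln = {
        "title": "XPS survey",
        "date": "2025-01-15T10:00:00+00:00",
        "operator": "A. Researcher",
        "sample_id": "S-001",
    }
    log = ["pass_energy = 50", "# free-text comment", "lens_mode = wide"]
    print(json.dumps(build_nexus_like_entry(eln, log, [1.0, 2.0, 3.0]), indent=2))
```

Keeping the merge in one function means the same structure is produced whether the metadata arrives from the notebook, the instrument, or both, which is the point of collecting everything at the source.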
The management of all (meta)data coming from the laboratories is outlined in the Research Data Policy of the infrastructure. Because each user proposal is specific and involves more than one laboratory, a Data Management Plan (DMP) must be generated for every accepted proposal. These DMPs are generated through a semi-automated procedure run by the OFED system on the basis of the information provided by the Single Entry Point.
In this way, data management can be tailored to every user while keeping the data management system and the data structures uniform.
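As a rough illustration of what a semi-automated DMP draft could look like, the sketch below fills a fixed template from proposal information and flags what still needs manual review. It is not the OFED procedure: the template text, the `draft_dmp` function, the proposal fields, and the per-laboratory format table are all hypothetical names introduced for this example.

```python
from string import Template

# Hypothetical DMP skeleton; a real plan would follow the Research Data Policy.
DMP_TEMPLATE = Template(
    "Data Management Plan for proposal $proposal_id\n"
    "Principal investigator: $pi\n"
    "Laboratories involved: $labs\n"
    "Expected data formats: $formats\n"
    "Open points for manual review: $todo\n"
)


def draft_dmp(proposal, lab_formats):
    """Fill the shared template from proposal data (assumed to come from the SEP)."""
    labs = proposal["laboratories"]
    # Collect the formats declared by every laboratory named in the proposal.
    formats = sorted({fmt for lab in labs for fmt in lab_formats.get(lab, [])})
    # Laboratories with no declared formats are left for a person to complete.
    missing = [lab for lab in labs if lab not in lab_formats]
    return DMP_TEMPLATE.substitute(
        proposal_id=proposal["id"],
        pi=proposal["pi"],
        labs=", ".join(labs),
        formats=", ".join(formats) or "to be defined",
        todo=", ".join(missing) or "none",
    )


if __name__ == "__main__":
    proposal = {"id": "P-0001", "pi": "A. Researcher",
                "laboratories": ["lab_a", "lab_b", "lab_c"]}
    lab_formats = {"lab_a": ["NeXus"], "lab_b": ["NeXus", "TIFF"]}
    print(draft_dmp(proposal, lab_formats))
```

The shared template keeps every plan structurally identical, while the proposal-specific fields and the review list carry the per-user differences.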
The NFFA-DI approach to data storage is based on a centralized schema, with distributed duties and responsibilities for data maintenance.
Each operational unit can store its own data on a local storage system equipped with a NOMAD Oasis instance, while the central OFED repository is populated by scientifically meaningful datasets selected by the researchers themselves.
The OFED NOMAD installation is kept up to date to maintain compatibility with the NOMAD Oasis of each operational unit.
OFED (Overarching FAIR Ecosystem for Data) serves as the digital backbone of the research infrastructure. It goes far beyond a traditional data lake: it is a comprehensive digital ecosystem offering a wide range of data-related services, including storage, data management planning, access control, data sharing, and analysis tools.
Data ingestion is managed through FAIR-by-design pipelines originating from individual laboratories. Once ingested, the data is consolidated and made accessible within a central data lake, enhanced with a suite of value-added services. Its modular and scalable architecture enables cost optimization, horizontal scaling, and fast data access for both internal and external users.
The system is built on a layered architecture that separates general infrastructure components from domain-specific services.
Access to OFED is handled via the Authentik Identity Provider, federated with the NFFA-DI Single Entry Point (SEP), which regulates user access to the NFFA-DI infrastructure and handles proposal submission.
A key design feature of OFED is its ability to ingest data in a FAIR-by-design manner, directly from research instruments and laboratory workflows.
On top of the general infrastructure, OFED integrates a set of specialized, domain-oriented services:
The Common Layer, the foundational core of the OFED architecture, ensures coherence, stability, and scalability throughout the system.
At the storage level, OFED uses Ceph, an open-source, highly scalable distributed storage platform that supports object, block, and file storage in a unified system. Ceph ensures redundancy, fault tolerance, and high performance, and supports S3-compatible interfaces for integration with cloud-native applications.
For service orchestration and deployment, OFED relies on Kubernetes, the industry-standard open-source container orchestration platform. Kubernetes enables dynamic workload management, service scaling, and high-availability deployments across multiple nodes.
Federica Bazzocchi (AREA Science Park)
CNR-IOM@TS: Giancarlo Panaccione
CNR-IFN@MI: Michele Devetta
CNR-IFN@TN: Alessandro Carpentiero
CNR-ISMN@BO: TBD
CNR-IMM@CT: Joannis Deretzis
CNR-ISM@RM: Stefano Turchini
CNR-NANOTEC@LE: Gian Paolo Marra
CNR-SPIN@NA: Emiliano Di Gennaro
AREA Science Park: Matteo Biagetti
POLIMI: Andrea Cattoni
UNIMI: Andrea Giugni
CNR-IOM@TS: Dario De Angelis
CNR-IFN@MI: Michele Devetta
CNR-IFN@TN: Lorenza Ferrario
CNR-ISMN@BO: Stefano Zampolli
CNR-IMM@CT: Antonino La Magna
CNR-ISM@RM: Stefano Turchini
CNR-NANOTEC@LE: Gianluca Balestra
CNR-SPIN@NA: Francesco M. Taurino
AREA Science Park: Federica Bazzocchi
POLIMI: Matteo Cantoni
UNIMI: Andrea Giugni
NFFA-DI was actively involved in the first edition of the Master in Data Management and Curation, a specialization course organized by Area Science Park, CNR-IOM and SISSA as part of the activities of the NFFA-DI and PRP@CERIC projects and designed to provide in-depth training in the management, curation, cataloguing and analysis of research data. The course lasted nine months (from September 2024 to May 2025) and was divided into a teaching phase of 166 hours of theoretical lessons and a seven-month practical phase carried out at the laboratories of the participants' home institutions. Each NFFA-DI operational unit was represented by one or more participants in the Master, who during their internship implemented "FAIR-by-design pipelines" (tailored workflows and software solutions) in the various nodes for 15 different types of instruments from the NFFA-DI catalogue.
This marks one of the largest deployments of prototypes of FAIR-by-design pipelines within laboratories belonging to the same research infrastructure in Italy.
The code developed is available on GitHub: Master in Data Management and Curation Repository
Data Curation for Optimizing Molecular Beam Epitaxial Growth of III-V Semiconductor Samples
L. Musini (CNR-IOM@TS)
Design refinement and commissioning of a FAIR-by-design integrated data management system for an STM laboratory
S. Vigneri (CNR-IOM@TS)
Implementation of FAIR by design principles for data acquisition and storage at CNR-IFN Milano
G. Gallerani (CNR-IFN@MI)
FAIR data management for fabrication processes. A NOMAD plugin as implemented at CNR-IFN@TN
M.F. Bontorno (CNR-IFN@TN)
Implementing a FAIR-by-Design Approach for Process Management in Nanoscience Foundries Application in the CNR-ISMN cleanroom
M. Marella (CNR-ISMN@BO)
FAIR data management of quantum-mechanical calculations for the spin states dynamics in SiC materials
G. Castorina (CNR-IMM@CT)
FAIR Data Management of results coming from SEM analysis of Halloysite samples
L. Li Donni (CNR-IMM@CT)
FAIR Data Management of the Results from the MulSKIPS Atomistic Simulation Environment for PVD, CVD, and Laser Annealing
F. Ruberto (CNR-IMM@CT)
Implementation of FAIR data in Transient Absorption Spectroscopy and Photoluminescence Spectroscopy
L. Costantini (CNR-ISM@RM)
FAIR Data Management in Scanning Electron Microscopy
G. Balestra (CNR-NANOTEC@LE)
AMORE: a GUI to promote migration from paper to eLN
E. D'Amico (CNR-SPIN@NA)
Implementation of a FAIR-by-Design System for PLD and RHEED Data Management in the MODA Laboratory: Development of a NOMAD plugin and a Python parser for automatic data mapping from eLabFTW to NOMAD
R. Forlenza (CNR-SPIN@NA)
Implementation of FAIR Principles for XPS Data at MODA Laboratory through NeXus File Format
M. Zandavifard (CNR-SPIN@NA)
A FAIR Data Ecosystem in Transmission Electron Microscopy
N. Perin (AREA-RIT)
Implementation of a pipeline for collecting, ingesting and transforming data into standard formats for the LAME FIB-SEM
E. Saadat (AREA-RIT)
Implementation of FAIR-data policies in wafer-scale infrastructures
P. Florio (POLIFAB)
FAIR by Design upgrade for the UNIMI NFFA-DI experimental offer
S. Contu (UNIMI)
Extending NOMAD schemas for the UMIL NFFA-DI Theory & Simulation Installation, with a focus on the Yambo code
E. Molteni (UNIMI)
To realize the NFFA-DI vision, all 124 instruments in the catalogue need their own FAIR-by-design pipelines towards OFED.
The next step will be to collaborate on extending the implementations developed in individual laboratories to other NFFA-DI laboratories that use the same techniques. This will help achieve interoperability within the infrastructure and ensure scientific reproducibility, both crucial for high-quality research. From there, the goal will be to extend common practices further, to the broader community involved.
CNR-IFN@MI
CNR-IFN@TN
CNR-ISMN@BO
CNR-IMM@CT
CNR-ISM@RM
CNR-NANOTEC@LE
CNR-SPIN@NA
AREA Science Park
POLIMI
UNIMI