The project involves not just the acquisition and upgrade of cutting-edge experimental equipment, but also the implementation and use of specific open-source software to manage data according to the FAIR-by-design principles in the laboratories associated with the various Operational Units (OUs) involved.

The NFFA-DI research data policy defines accepted file formats, metadata schemas, and access rules, ensuring that data is not only securely stored but also interoperable, reusable, and reproducible throughout the scientific lifecycle.

To ensure compliance with FAIR principles, and by leveraging the expertise and tools available within the international community, NFFA-DI is committed to:

  • Providing detailed descriptions of a set of well-established experimental methods and defining how each one is characterized, so that research data systematically includes all the data and metadata needed to compare experiments in detail;
  • Developing sector-specific data and metadata schemas, promoting their standardization and integration with electronic laboratory notebooks.

FAIR-by-design strategy

"Digital Structure of NFFA-DI and Overarching FAIR Ecosystem for Data (OFED)" is the title of an NFFA-DI work package fully dedicated to implementing a FAIR-by-design data management system. The work package aims to consolidate and expand the results of the NFFA-Europe research infrastructure (www.nffa.eu), and to provide users, affiliated researchers, and partners with the tools needed to manage their data using a FAIR-by-design methodology.

All NFFA-DI data pipelines start in a laboratory, where data and metadata are collected from different sources, such as an Electronic Lab Notebook (eLabFTW, as in the figure, or a similar open-source ELN), instrument outputs, or log files, and organized into a NeXus-formatted file (https://www.nexusformat.org/) or another open format allowed by the project's policy.
The laboratories underwent an extensive upgrade to automate this first part of the data pipeline, easing the researchers' work and safeguarding data integrity.
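
As a rough illustration of this first pipeline step, the sketch below merges an ELN record and an instrument log into a single nested structure that mirrors the NeXus group hierarchy (NXentry, NXsample, NXinstrument, NXdata). The field names, the sample values, the `@NX_class` key convention and the `build_entry` helper are assumptions made for the example, not the NFFA-DI schema. A real pipeline would write the result to an HDF5-based NeXus file rather than print it as JSON.

```python
import json
from datetime import datetime, timezone


def build_entry(eln_record: dict, instrument_log: dict, data: list) -> dict:
    """Combine ELN metadata, instrument settings and raw data into one
    NeXus-like nested dictionary (hypothetical field names)."""
    return {
        "entry": {
            "@NX_class": "NXentry",
            "title": eln_record["title"],
            "start_time": instrument_log["start_time"],
            "sample": {
                "@NX_class": "NXsample",
                "name": eln_record["sample_name"],
            },
            "instrument": {
                "@NX_class": "NXinstrument",
                "name": instrument_log["instrument"],
            },
            "data": {
                "@NX_class": "NXdata",
                "signal": data,
            },
        }
    }


# Illustrative inputs; in a laboratory these would come from the ELN
# and from the files written by the acquisition software.
eln_record = {"title": "Test scan", "sample_name": "sample-001"}
instrument_log = {
    "instrument": "demo-spectrometer",
    "start_time": datetime(2025, 1, 15, 9, 30, tzinfo=timezone.utc).isoformat(),
}
entry = build_entry(eln_record, instrument_log, data=[0.1, 0.4, 0.9])
print(json.dumps(entry, indent=2))
```

Keeping the sample, instrument and measured signal in one structure is what lets a downstream service interpret the file without going back to the notebook or the log.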

The management of all (meta)data coming from the laboratories is outlined in the Research Data Policy of the infrastructure. Because each user proposal is specific and involves more than one laboratory, a Data Management Plan (DMP) must be generated for every accepted proposal. These DMPs are generated through a semi-automated procedure run by the OFED system on the basis of the information provided by the Single Entry Point.
In this way, data management can be tailored to every user while keeping the data management system and the data structures uniform.
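
The idea of deriving one DMP per proposal from shared building blocks can be sketched as follows. This is only an illustration: the proposal fields, the template text and the `render_dmp` function are hypothetical and do not reflect the actual OFED or Single Entry Point data model.

```python
from string import Template

# Shared template: the same structure is used for every proposal.
DMP_TEMPLATE = Template(
    "Data Management Plan for proposal $proposal_id\n"
    "Principal investigator: $pi\n"
    "Laboratories involved: $labs\n"
    "Data formats: $formats\n"
)


def render_dmp(proposal: dict) -> str:
    """Fill the shared template with proposal-specific information."""
    # Collect the formats used across all laboratories, without duplicates.
    formats = sorted({f for lab in proposal["laboratories"] for f in lab["formats"]})
    labs = ", ".join(lab["name"] for lab in proposal["laboratories"])
    return DMP_TEMPLATE.substitute(
        proposal_id=proposal["id"],
        pi=proposal["pi"],
        labs=labs,
        formats=", ".join(formats),
    )


# Hypothetical proposal record, standing in for the Single Entry Point data.
proposal = {
    "id": "EXAMPLE-001",
    "pi": "A. Researcher",
    "laboratories": [
        {"name": "Lab A", "formats": ["NeXus", "CSV"]},
        {"name": "Lab B", "formats": ["NeXus"]},
    ],
}
dmp_text = render_dmp(proposal)
print(dmp_text)
```

The template carries the uniform part of the plan and the proposal record carries the personalised part. In a semi-automated procedure, a draft of this kind would still be reviewed by a person before being adopted.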

The NFFA-DI approach to data storage is based on a centralized schema, with distributed duties and responsibilities for data maintenance.
Each operational unit can store its own data on a local storage system equipped with a NOMAD Oasis instance, while the central OFED repository is populated by scientifically meaningful datasets selected by the researchers themselves.
The OFED NOMAD installation is kept up to date to remain compatible with the NOMAD Oasis of each operational unit.

OFED: The Core of the Digital Research Infrastructure Ecosystem

OFED (Overarching FAIR Ecosystem for Data) serves as the digital backbone of the research infrastructure. It goes far beyond a traditional data lake: it is a comprehensive digital ecosystem offering a wide range of data-related services, including storage, data management planning, access control, data sharing, and analysis tools.

Data ingestion is managed through FAIR-by-design pipelines originating from individual laboratories. Once ingested, the data is consolidated and made accessible within a central data lake, enhanced with a suite of value-added services. Its modular and scalable architecture enables cost optimization, horizontal scaling, and fast data access for both internal and external users.

The system is built on a layered architecture that separates general infrastructure components from domain-specific services.

Access to OFED is handled via the Authentik Identity Provider, federated with the NFFA-DI Single Entry Point (SEP), which regulates user access to the NFFA-DI infrastructure and handles proposal submission.

A key design feature of OFED is its ability to ingest data in a FAIR-by-design manner, directly from research instruments and laboratory workflows.

Specific Components

On top of the general infrastructure, OFED integrates a set of specialized, domain-oriented services:

  • MinIO: a high-performance, open-source object storage system used as a lightweight S3-compatible layer for efficient data access and integration with scientific tools. It complements Ceph by offering fast I/O for FAIR-by-design pipelines.
  • NOMAD Oasis: developed by the FAIRmat initiative, it is the core platform for visualization, metadata search, and sharing of scientific datasets. It supports structured materials data and provides RESTful APIs and UI components for data exploration.
  • JupyterHub: offers a flexible, interactive environment for advanced data analysis, allowing researchers to run notebooks close to the data and integrate analytical routines in Python, R, and other languages within a reproducible framework.
  • EasyDMP: a module that automates the generation of Data Management Plans for proposals and laboratories, in line with the NFFA-DI Data Policy and with Horizon Europe and EOSC recommendations.
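
To give a feel for the RESTful access offered by NOMAD Oasis, the sketch below prepares a metadata search without sending it. It assumes the `entries/query` endpoint and the `results.material.elements` search quantity of the NOMAD v1 API. The base URL and the `build_entries_query` helper are hypothetical, and a real OFED deployment may require authentication.

```python
import json
import urllib.request

# Hypothetical base URL of a NOMAD Oasis; replace it with a real deployment.
OASIS_API = "https://oasis.example.org/nomad-oasis/api/v1"


def build_entries_query(elements: list, page_size: int = 10) -> urllib.request.Request:
    """Prepare (but do not send) a search for entries containing all given elements."""
    body = {
        "query": {"results.material.elements": {"all": elements}},
        "pagination": {"page_size": page_size},
    }
    return urllib.request.Request(
        url=f"{OASIS_API}/entries/query",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_entries_query(["Ga", "As"])
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the prepared request to `urllib.request.urlopen` would run the search against a reachable Oasis. The same request could be issued from a JupyterHub notebook, so that analysis runs close to the data.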

Common Software Stack

The Common Layer, the foundational core of the OFED architecture, ensures coherence, stability, and scalability throughout the system.

At the storage level, OFED uses Ceph, an open-source, highly scalable distributed storage platform that supports object, block, and file storage in a unified system. Ceph ensures redundancy, fault tolerance, and high performance, and supports S3-compatible interfaces for integration with cloud-native applications.

For service orchestration and deployment, OFED relies on Kubernetes, the industry-standard open-source container orchestration platform. Kubernetes enables dynamic workload management, service scaling, and high-availability deployments across multiple nodes.

Data management crew

FAIR-by-design Coordinator (WP3 Leader)

Federica Bazzocchi (AREA Science Park)

FAIR-by-design managers

CNR-IOM@TS: Giancarlo Panaccione

CNR-IFN@MI: Michele Devetta

CNR-IFN@TN: Alessandro Carpentiero

CNR-ISMN@BO: TBD

CNR-IMM@CT: Joannis Deretzis

CNR-ISM@RM: Stefano Turchini

CNR-NANOTEC@LE: Gian Paolo Marra

CNR-SPIN@NA: Emiliano Di Gennaro

AREA Science Park: Matteo Biagetti

POLIMI: Andrea Cattoni

UNIMI: Andrea Giugni

FAIR-by-design implementation task leaders

CNR-IOM@TS: Dario De Angelis

CNR-IFN@MI: Michele Devetta

CNR-IFN@TN: Lorenza Ferrario

CNR-ISMN@BO: Stefano Zampolli

CNR-IMM@CT: Antonino La Magna

CNR-ISM@RM: Stefano Turchini

CNR-NANOTEC@LE: Gianluca Balestra

CNR-SPIN@NA: Francesco M. Taurino

AREA Science Park: Federica Bazzocchi

POLIMI: Matteo Cantoni

UNIMI: Andrea Giugni

Master in Data Management and Curation

NFFA-DI was actively involved in the first edition of the Master in Data Management and Curation, a specialization course designed to provide in-depth training in the management, curation, cataloguing and analysis of research data. The course was organized by Area Science Park, CNR-IOM and SISSA as part of the activities of the NFFA-DI and PRP@CERIC projects. It lasted nine months (from September 2024 to May 2025) and was divided into a teaching phase of 166 hours of theoretical lessons and a seven-month practical phase carried out at the laboratories of the participants' home institutions. Each NFFA-DI operational unit was represented by one or more participants, who during their internship implemented "FAIR-by-design pipelines" (tailored workflows and software solutions) in the various nodes for 15 different types of instruments from the NFFA-DI catalogue.

This marks one of the largest deployments of prototypes of FAIR-by-design pipelines within laboratories belonging to the same research infrastructure in Italy.

The code developed is available on GitHub: Master in Data Management and Curation Repository

NFFA-DI Theses

Data Curation for Optimizing Molecular Beam Epitaxial Growth of III-V Semiconductor Samples
L. Musini (CNR-IOM@TS)

Design refinement and commissioning of a FAIR-by-design integrated data management system for an STM laboratory
S. Vigneri (CNR-IOM@TS)

Implementation of FAIR by design principles for data acquisition and storage at CNR-IFN Milano
G. Gallerani (CNR-IFN@MI)

FAIR data management for fabrication processes. A NOMAD plugin as implemented at CNR-IFN@TN
M.F. Bontorno (CNR-IFN@TN)

Implementing a FAIR-by-Design Approach for Process Management in Nanoscience Foundries. Application in the CNR-ISMN cleanroom
M. Marella (CNR-ISMN@BO)

FAIR data management of quantum-mechanical calculations for the spin states dynamics in SiC materials
G. Castorina (CNR-IMM@CT)

FAIR Data Management of results coming from SEM analysis of Halloysite samples
L. Li Donni (CNR-IMM@CT)

FAIR Data Management of the Results from the MulSKIPS Atomistic Simulation Environment for PVD, CVD, and Laser Annealing
F. Ruberto (CNR-IMM@CT)

Implementation of FAIR data in Transient Absorption Spectroscopy and Photoluminescence Spectroscopy
L. Costantini (CNR-ISM@RM)

FAIR Data Management in Scanning Electron Microscopy
G. Balestra (CNR-NANOTEC@LE)

AMORE: a GUI to promote migration from paper to eLN
E. D'Amico (CNR-SPIN@NA)

Implementation of a FAIR-by-Design System for PLD and RHEED Data Management in the MODA Laboratory. Development of a NOMAD plugin and a Python parser for automatic data mapping from eLabFTW to NOMAD
R. Forlenza (CNR-SPIN@NA)

Implementation of FAIR Principles for XPS Data at MODA Laboratory through NeXus File Format
M. Zandavifard (CNR-SPIN@NA)

A FAIR Data Ecosystem in Transmission Electron Microscopy
N. Perin (AREA-RIT)

Implementation of a pipeline for collecting, ingesting and transforming data into standard formats for the LAME FIB-SEM
E. Saadat (AREA-RIT)

Implementation of FAIR-data policies in wafer-scale infrastructures
P. Florio (POLIFAB)

FAIR by Design upgrade for the UNIMI NFFA-DI experimental offer
S. Contu (UNIMI)

Extending NOMAD schemas for the UMIL NFFA-DI Theory & Simulation Installation, with a focus on the Yambo code
E. Molteni (UNIMI)

Local FAIR-by-design pipelines

To realize the NFFA-DI vision, each of the 124 instruments in the catalogue needs its own FAIR-by-design pipeline feeding into OFED.

The next step will be to extend the implementations developed in individual laboratories to the other NFFA-DI laboratories that use the same techniques. This will help achieve interoperability within the infrastructure and ensure scientific reproducibility, both crucial for high-quality research. From there, the goal will be to extend common practices to the broader community involved.

Local nodes implementations in detail

CNR-IOM@TS

CNR-IFN@MI

CNR-IFN@TN

CNR-ISMN@BO

CNR-IMM@CT

CNR-ISM@RM

CNR-NANOTEC@LE

CNR-SPIN@NA

AREA Science Park

POLIMI

UNIMI