Virtual services

These virtual services have been developed by AREA Science Park within the framework of the NFFA-DI research project. They provide advanced tools for image and spectral analysis based on state-of-the-art machine learning models, enabling researchers to extract meaningful information from experimental data such as Scanning Electron Microscopy (SEM) and Scanning Tunneling Microscopy (STM) images. The services are specifically designed to support and simplify the analysis of data produced by SEM and STM instruments.

Each service is delivered as a Jupyter notebook, combining clear Markdown-based instructions with executable Python code. This approach allows users to easily understand the workflow and adapt or extend the notebooks to meet their specific research needs.
The services are designed to process an entire folder of images, producing structured, machine-readable output results, and a detailed log file to ensure reproducibility and traceability of the analysis.

In the future, these services are planned to be executed within the NOMAD Oasis environment, which is part of the OFED (Overarching FAIR Ecosystem for Data) digital ecosystem.

All trained machine learning models, along with the associated Python package, are openly available under open-source licenses at: https://gitlab.com/area7/nffa-di/virtual-access-services/.

Further development of additional virtual services will be guided by user feedback and emerging research needs. The research community is therefore encouraged to contribute ideas, suggestions, and feature requests by contacting the development team via email.

REQUIREMENTS

Image Feature Extraction Service

Download

The Image Feature Extraction Service allows users to extract deep visual features from images using a Vision Transformer (ViT) model provided by the Hugging Face transformers library. The extracted features can be reused for a wide range of downstream tasks, including image classification, clustering, and similarity analysis.

The service processes a collection of images and generates numerical representations (deep features) that capture high-level visual characteristics learned by the ViT model.

Input:
- A folder containing image files (.jpg, .jpeg, .png, .tiff)
Output:
- A NumPy array of extracted features
- A CSV metadata file containing image paths, MD5 hashes, and feature indices
- Detailed log files

Feature Similarity Service

Download

The Feature Similarity Service enables image similarity retrieval by comparing images based on their previously extracted deep features (see the Image Feature Extraction Service). It allows users to identify images in a dataset that are most similar to a given query image.

The service computes similarity scores using configurable distance metrics applied to feature vectors. Images in the dataset are then ranked according to their similarity to the query image, making the service suitable for tasks such as content-based image retrieval, data exploration, and pattern discovery.

Input:
- A folder containing the pre-extracted NumPy file and CSV metadata file (see the Image Feature Extraction Service)
- A query image used as the reference for similarity comparison
Output:
- A CSV file containing, for each image, the rank of similarity with the query image and the similarity score (cosine similarity)

SEM Categories Classification Service

Download

The SEM Image Classification Service enables the automatic classification of Scanning Electron Microscopy (SEM) images into 10 distinct categories, supporting high-throughput analysis and efficient sorting of large image datasets.

The service is based on a fine-tuned Vision Transformer (ViT‑B/32) model, published on Hugging Face, and optimized for SEM image data. By applying deep learning–based classification, it allows users to rapidly analyze large volumes of images while preserving accuracy and consistency.

It is particularly suited for large-scale SEM image analysis, dataset curation, and automated sample characterization workflows.

Input:
- A folder containing SEM images (.jpg, .jpeg, .png, .tiff)
Output:
- A CSV file containing, for each image, the model's classification and confidence score of the prediction
- Detailed log files

SEM Scale Classification Service

Download

The SEM Scale Classification Service automatically classifies Scanning Electron Microscopy (SEM) images according to their magnification level, supporting automated metadata enrichment and data curation workflows.
This service addresses a common issue in SEM image archives—missing or inconsistent scale information—by inferring the magnification directly from image content using a deep learning approach.

The service is built on a fine-tuned Vision Transformer (ViT‑B/8) model, published on Hugging Face. The model classifies SEM images into three scale categories based on pixel size:

Pico: pixel size smaller than 1 nm
Nano: pixel size between 1 nm and 1,000 nm (1 µm)
Micro: pixel size larger than 1 µm

This automated classification enables consistent scale labeling across large datasets, facilitating downstream analysis and FAIR data management.

The service is particularly useful for SEM image archives, large-scale data ingestion pipelines, and any scenario where reliable scale metadata is required but unavailable or inconsistent.

Input:
- A folder containing SEM images (.jpg, .jpeg, .png, .tiff)
Output
- A CSV file indicating, for each image, the model's scale classification for the image (pico/nano/micro) and the confidence scores for each prediction
- Detailed log files

STM Artifact Classification Service

Download

The STM Artifact Classification Service provides automated quality control for Scanning Tunneling Microscopy (STM) images by identifying the presence of multi-tip artifacts. The service classifies images into two categories: Artifact-Free and Multi-Tip Artifact, helping researchers ensure data reliability and integrity.
Multi-tip artifacts are typically caused by a faulty or contaminated probe and can severely compromise the interpretation of STM measurements. Early and automatic detection is therefore essential, especially in high-throughput acquisition workflows.

The service is powered by a fine-tuned Vision Transformer (ViT‑B/32) model published on Hugging Face. Its key innovation lies in a preprocessing strategy based on the Fast Fourier Transform (FFT), which enhances the model’s ability to detect characteristic artifact patterns.
Each input image is transformed into a three‑channel tensor consisting of: the original grayscale image, the FFT amplitude spectrum, and the FFT phase information.

This representation enables the model to effectively learn and recognize the distinct frequency-domain signatures associated with multi-tip artifacts.

This service is ideal for STM data validation, large-scale experimental campaigns, and automated preprocessing pipelines where rapid and reliable artifact detection is required.

Input:
- A folder containing STM images (.jpg, .jpeg, .png, .tiff)
Output:
- A CSV file indicating, for each image, whether it is Artifact-Free or contains Multi-Tip Artifacts and the confidence score of the prediction
- Detailed log files

SEM OCR Metadata Extraction Service

Download

The SEM OCR Metadata Extraction Service is a comprehensive pipeline designed to extract, validate, and standardize metadata from Scanning Electron Microscopy (SEM) images. By combining embedded image metadata with Optical Character Recognition (OCR)–based analysis, the service ensures that SEM datasets are accurate, consistent, and machine‑readable, supporting FAIR data principles.

This service is particularly suited for SEM image archives, large-scale data ingestion pipelines, and any workflow requiring automated metadata enrichment, quality assurance, and long-term data preservation.

The service processes SEM images through a multi-step workflow that guarantees reliable metadata extraction and validation:

Initial Validation
- Detection of corrupted files
- Duplicate identification using SHA‑256 hashing
Pixel Size Extraction
- Retrieval of instrument-specific metadata embedded in the image file
- OCR-based reading of scale bars directly from the image content
Consistency Check and Standardization
- Cross-validation of pixel size values obtained from different sources
- Unit conversion and standardization to micrometers (µm)
Scale Classification
- Automatic assignment of a granular scale category: Milli, Micro, Nano, Pico

This approach ensures reliable metadata even when embedded information is missing, incomplete, or inconsistent.
The Jupyter notebook relies on the sem-meta Python package, published on PyPI, which provides specialized modules for SEM metadata extraction, Optical Character Recognition (OCR), Unit conversion and validation.

Input:
- A folder containing SEM images (.jpg, .jpeg, .png, .tiff)
Output:
- A CSV file with the extracted metadata of all processed images
- Individual JSON files containing standardized metadata for each image
- Log files reporting corrupted, invalid, or excluded images
Requirements: Tesseract OCR (version 4) must be installed on the system to enable OCR functionality