scFM-eval is a unified, reproducible computational framework for deploying, running, and evaluating single-cell foundation models (scFMs).
It is built on Nextflow DSL2 and provides standardized execution, containerized environments, and automated embedding inference across multiple scFM methods.
[2026.03.04] We released the fine-tuning implementation, primarily designed for data with discrete labels.
[2026.01.13] We released the few-shot learning implementation, primarily designed for data with discrete labels, and fixed several minor bugs in scPRINT deployment.
- OS: Linux (`linux/amd64`)
- GPU: NVIDIA GPU required
  - NVIDIA driver ≥ 525
- Container runtime: `Docker` or `Apptainer` (formerly Singularity)
- Nextflow:
  - Tested with Nextflow ≥ 25.10.0
  - Any version supporting DSL2 should work
Please follow the official instructions:
👉 https://github.com/nextflow-io/nextflow
After installation, verify:

```bash
nextflow -v
```

Clone the repository:

```bash
git clone https://github.com/Svvord/scFM-eval.git
```

Open `nextflow.config` and select one container runtime:
- **Apptainer** (default; no changes needed unless you modified it before)
- **Singularity**
```groovy
singularity {
    enabled = true
    ...
}
docker {
    enabled = false
    ...
}
apptainer {
    enabled = false
    ...
}
```

- **Docker**
```groovy
docker {
    enabled = true
    ...
}
apptainer {
    enabled = false
    ...
}
singularity {
    enabled = false
    ...
}
```
⚠️ This only needs to be done once. Subsequent runs require no further configuration.
Pretrained model weights must be downloaded once before first use.
We provide a helper script download_model_weights.nf to fetch official checkpoints and place them in the correct directory structure.
```bash
nextflow download_model_weights.nf --method scgpt
```

📌 Important notes:
- You only need to download model weights once
- Downloaded weights are cached locally and reused automatically
- You may also manually place weights if you follow the same directory structure
```
data/
└── model_weights/
    └── scGPT/
        └── scGPT_human/
```
- The default scGPT version is `scGPT_human`
- To specify this version explicitly in later runs: `--model "scGPT/scGPT_human"`
- If no version is specified, the framework will use the default pretrained model
scFM-eval accepts AnnData (.h5ad) files as the standard input format.
- **Expression matrix**:
  - Raw count matrix (not log-normalized)
  - Stored in `adata.X`
  - Must contain the full transcriptome
  - Do not subset to highly variable genes (HVGs)
- **Gene metadata** (`adata.var`):
  - `var.index` should primarily use HGNC gene symbols
    - This is required for the majority of genes
  - Genes without an official HGNC symbol:
    - May use their Ensembl gene ID as a fallback identifier
    - This ensures all genes remain represented with a valid token
    - Most scFM methods rely on token-based gene matching and can accommodate this behavior
  - Required columns:
    - `gene_symbol`: gene identifier used by the model
      - HGNC gene symbol when available
      - Ensembl gene ID used as a fallback when no HGNC symbol exists
    - `ensembl_id`: corresponding Ensembl gene ID
- **Cell metadata** (`adata.obs`):
  - Must contain a column named `barcode`: unique cell barcode identifier
  - In most cases, `barcode` can be a copy of `adata.obs_names`
  - All cell identifiers must be unique
    - If needed, ensure uniqueness by calling `adata.obs_names_make_unique()`
    - Then populate: `adata.obs["barcode"] = adata.obs_names`
scFM-eval performs minimal preprocessing by design.
- **Users are expected to perform their own data quality control (QC) prior to input**, such as:
  - Filtering low-quality cells
  - Doublet removal (optional)
- **Do NOT perform HVG selection**
  - All scFM methods in this framework expect the full gene expression profile
  - Subsetting to HVGs may lead to:
    - Incompatible model inputs
    - Silent gene dropping
    - Degraded or misleading embeddings
- **Input data must preserve raw counts across the full transcriptome**

Please refer to the provided example dataset: `data/demo/colon_1000.h5ad`
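As a rough illustration of cell-level QC that leaves the gene dimension untouched, here is a NumPy-only sketch on synthetic counts; the thresholds are made-up examples for demonstration, not recommendations from this framework:

```python
import numpy as np

# Toy raw-count matrix: 100 cells x 200 genes.
rng = np.random.default_rng(0)
counts = rng.poisson(0.5, size=(100, 200))

# Per-cell QC metrics.
genes_per_cell = (counts > 0).sum(axis=1)
counts_per_cell = counts.sum(axis=1)

# Keep cells passing both (example) thresholds; subset CELLS only,
# never genes -- no HVG selection.
keep = (genes_per_cell >= 50) & (counts_per_cell >= 80)
filtered = counts[keep]

# The gene dimension is preserved across the full (toy) transcriptome.
print(filtered.shape[1] == counts.shape[1])
```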
- On the first execution of a method, Nextflow will automatically:
  - Pull the corresponding container image
  - Cache the image and model weights locally
- This initial run may take longer
- No additional setup is needed once caching is complete
Embedding inference can be performed with a single command.
We provide a small demo dataset:

```
data/demo/colon_1000.h5ad
```
```bash
nextflow embed_by_scfm.nf \
    --method scgpt \
    --data data/demo/colon_1000.h5ad
```

Required arguments:

- `--method`: scFM method name (e.g. `scgpt`)
- `--data`: input dataset in `.h5ad` format
Results are written to:

```
results/embedding/<method_name>/
```

- Embeddings are stored as `.h5ad` files
- The embedding matrix can be accessed via:

```python
import scanpy as sc

adata = sc.read_h5ad("results/embedding/scgpt/colon_1000.h5ad")
embeddings = adata.X
```

Few-shot learning and label inference can be performed with a single command.
We provide a small demo dataset consisting of a support set and a query set:
```
data/demo/liver_1shot_support.h5ad
data/demo/liver_1shot_query.h5ad
```
To fit class prototypes, set the mode to `fit` and provide the support dataset.

```bash
nextflow fewshot_by_scfm.nf \
    --method scgpt \
    --mode fit \
    --support data/demo/liver_1shot_support.h5ad
```

This will generate a prototype file (`.npz`) saved to:

```
results/fewshot/fitted_prototypes/<method_name>/
```
The generated .npz file contains the fitted class prototypes derived from the support set.
To infer labels for a query dataset using the fitted prototypes, set the mode to `infer` and provide:

- the query dataset
- the path to the fitted prototype file

```bash
nextflow fewshot_by_scfm.nf \
    --method scgpt \
    --mode infer \
    --query data/demo/liver_1shot_query.h5ad \
    --fitted results/fewshot/fitted_prototypes/scgpt/liver_1shot_support.npz
```

Inference results are written to:

```
results/fewshot/inference/<method_name>/
```
You can also provide both the support and query datasets in a single command, which will automatically perform prototype fitting followed by inference:
```bash
nextflow fewshot_by_scfm.nf \
    --method scgpt \
    --support data/demo/liver_1shot_support.h5ad \
    --query data/demo/liver_1shot_query.h5ad
```

- Few-shot learning is designed for datasets with discrete label types
- The support dataset must contain ground-truth labels
- The query dataset does not require labels and will be annotated during inference
- By default, labels are read from `adata.obs['cell_type']`; this can be overridden using the `--label_key` option
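Conceptually, prototype-based few-shot classification averages the support embeddings per class and then assigns each query cell to its nearest prototype. The sketch below is an illustrative NumPy re-implementation of that idea, not the pipeline's actual code:

```python
import numpy as np

def fit_prototypes(embeddings, labels):
    """Average the support embeddings per class to obtain one prototype each."""
    classes = sorted(set(labels))
    labels = np.asarray(labels)
    protos = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    return classes, protos

def infer_labels(embeddings, classes, protos):
    """Assign each query cell to the nearest prototype (Euclidean distance)."""
    dists = np.linalg.norm(embeddings[:, None, :] - protos[None, :, :], axis=-1)
    return [classes[i] for i in dists.argmin(axis=1)]

# Tiny synthetic example: two well-separated classes in a 2-D "embedding" space.
support = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
support_labels = ["B cell", "B cell", "T cell", "T cell"]
classes, protos = fit_prototypes(support, support_labels)

query = np.array([[0.05, 0.05], [4.9, 5.2]])
print(infer_labels(query, classes, protos))  # → ['B cell', 'T cell']
```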
Fine-tuning and label prediction can also be performed with a single command.
In the example below, we reuse `colon_1000.h5ad` as the training dataset. It contains cell-type labels in `adata.obs['cell_type']`. We also provide `colon_50.h5ad` as a small test dataset.

```
data/demo/colon_1000.h5ad
data/demo/colon_50.h5ad
```
To fine-tune a model, set the mode to `fit` and provide the training dataset.

```bash
nextflow finetune_by_scfm.nf \
    --method scgpt \
    --mode fit \
    --train data/demo/colon_1000.h5ad
```

The fine-tuned model weights will be provided via a symlink and are saved by default to:

```
results/finetune/finetuned_models/<method_name>/<train_data_id>/
```
You can then use this fine-tuned model for label prediction.
To predict labels, set the mode to `pred` and provide:

- the directory containing the fine-tuned weights
- the test dataset

```bash
nextflow finetune_by_scfm.nf \
    --method scgpt \
    --mode pred \
    --fitted results/finetune/finetuned_models/scgpt/colon_1000 \
    --test data/demo/colon_50.h5ad
```

Prediction results are written to:

```
results/finetune/prediction/<method_name>/
```
You can also provide both the training and test datasets in a single command, which will automatically perform fine-tuning followed by prediction:
```bash
nextflow finetune_by_scfm.nf \
    --method scgpt \
    --train data/demo/colon_1000.h5ad \
    --test data/demo/colon_50.h5ad
```

- Some methods only support zero-shot embeddings. For these methods, we attach a task-agnostic post-hoc classifier, and the fine-tuning process actually optimizes this appended model. If the fine-tuned weight directory contains only a `posthoc_classifier/` folder, then `--fitted` should point to `results/finetune/finetuned_models/<method_name>/<train_data_id>/posthoc_classifier/`. CELLama is a special case: it supports fine-tuning the backbone model but does not support native prediction/training in our pipeline. Therefore, we fine-tune both the backbone and the post-hoc classifier. In this case, you should still pass the same `--fitted` directory as in the scGPT example above, even though it may also contain a `posthoc_classifier/` folder.
- By default, labels are read from `adata.obs["cell_type"]`. Any discrete label field can be used in this workflow. To specify the label column, use `--finetune_label_key`.
- You can adjust the number of fine-tuning epochs and the training batch size depending on your GPU resources using `--finetune_epoch` and `--finetune_batch_size`. We provide method-specific default `finetune_epoch` values (based on the original authors' fine-tuning recipes), so we generally do not recommend changing them unless you have a clear purpose.
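For intuition, a post-hoc classifier of the kind described above can be as simple as a linear softmax head trained on frozen embeddings. The NumPy sketch below illustrates the idea on toy data; the pipeline's actual classifier architecture and training recipe may differ:

```python
import numpy as np

def train_posthoc(emb, y, n_classes, epochs=200, lr=0.5):
    """Multinomial logistic regression by full-batch gradient descent."""
    W = np.zeros((emb.shape[1], n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                      # one-hot targets
    for _ in range(epochs):
        logits = emb @ W + b
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)             # softmax probabilities
        grad = (p - Y) / len(y)                       # cross-entropy gradient
        W -= lr * emb.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

def predict(emb, W, b):
    return (emb @ W + b).argmax(axis=1)

# Toy "frozen embeddings": two well-separated clusters of 20 cells each.
rng = np.random.default_rng(0)
emb = np.vstack([rng.normal(0, 0.3, (20, 8)), rng.normal(2, 0.3, (20, 8))])
y = np.array([0] * 20 + [1] * 20)

W, b = train_posthoc(emb, y, n_classes=2)
print((predict(emb, W, b) == y).mean())  # training accuracy on separable toy data
```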
| Method | Container | Model Version | Notes |
|---|---|---|---|
| Cell2Sentence (C2S) | housy17/c2s:latest | v1.2.0 | New method (zero/few-shot done; fine-tune pending) |
| CELLama | housy17/cellama:latest | v0.1.0 | |
| CellFM | housy17/cellfm:latest | 5054a2a | |
| CellPLM | housy17/cellplm:latest | v0.1.0 | |
| Geneformer | housy17/geneformer:latest | v0.1.0 | |
| GenePT | housy17/genept:latest | 3602699 | |
| LangCell | housy17/langcell:latest | 69e41ef | |
| scBERT | housy17/scbert:latest | v1.0.0 | |
| scCello | housy17/sccello:latest | 767585b | |
| scFoundation | housy17/scfoundation:latest | 397631c | |
| scGPT | housy17/scgpt:latest | v0.2.4 | |
| SCimilarity | housy17/scsimilarity:latest | v0.4.1 | |
| scPRINT | housy17/scprint:latest | v2.3.5 | |
| UCE | housy17/uce:latest | 8227a65 |
📌 This table will be expanded as more models and configurations are added.
A detailed tutorial covering:
- Advanced parameters
- Batch size and resource control
- Few-shot workflows
- Fine-tuning workflows
- Benchmark evaluation
👉 Tutorial link: (coming soon)
If this framework or any of the tools provided here are useful for your research, please cite our work β it helps us a lot.
Siyu Hou, Penghui Yang, Wenjing Ma, Jade Xiaoqing Wang and Xiang Zhou (2026). A unified framework enables accessible deployment and comprehensive benchmarking of single-cell foundation models.
```bibtex
@article{hou2026unified,
  title     = {A unified framework enables accessible deployment and comprehensive benchmarking of single-cell foundation models},
  author    = {Hou, Siyu and Yang, Penghui and Ma, Wenjing and Wang, Jade Xiaoqing and Zhou, Xiang},
  year      = {2026},
  publisher = {Cold Spring Harbor Laboratory},
  journal   = {bioRxiv}
}
```