scFM-eval is a unified, reproducible computational framework for deploying, running, and evaluating single-cell foundation models (scFMs).
It is built on Nextflow DSL2 and provides standardized execution, containerized environments, and automated embedding inference across multiple scFM methods.
[2026.03.04] We released the fine-tuning implementation, primarily designed for data with discrete labels.
[2026.01.13] We released the few-shot learning implementation, primarily designed for data with discrete labels, and fixed several minor bugs in scPRINT deployment.
- OS: Linux (`linux/amd64`)
- GPU: NVIDIA GPU required
  - NVIDIA driver ≥ 525
- Container runtime: `Docker` or `Apptainer` (formerly Singularity)
- Nextflow:
  - Tested with Nextflow ≥ 25.10.0
  - Any version supporting DSL2 should work
Please follow the official instructions:
👉 https://github.com/nextflow-io/nextflow
After installation, verify:

```bash
nextflow -v
```

Clone the repository:

```bash
git clone https://github.com/Svvord/scFM-eval.git
```

Open `nextflow.config` and select one container runtime:
- **Apptainer** (default; no changes needed unless you modified it before)
- **Singularity**
```groovy
singularity {
    enabled = true
    ...
}
docker {
    enabled = false
    ...
}
apptainer {
    enabled = false
    ...
}
```

- **Docker**
```groovy
docker {
    enabled = true
    ...
}
apptainer {
    enabled = false
    ...
}
singularity {
    enabled = false
    ...
}
```
⚠️ This only needs to be done once. Subsequent runs require no further configuration.
Pretrained model weights must be downloaded once before first use.
We provide a helper script download_model_weights.nf to fetch official checkpoints and place them in the correct directory structure.
```bash
nextflow download_model_weights.nf --method scgpt
```

📌 Important notes:
- You only need to download model weights once
- Downloaded weights are cached locally and reused automatically
- You may also manually place weights if you follow the same directory structure
```
data/
└── model_weights/
    └── scGPT/
        └── scGPT_human/
```
- The default scGPT version is `scGPT_human`
- To specify this version explicitly in later runs: `--model "scGPT/scGPT_human"`
- If no version is specified, the framework will use the default pretrained model
scFM-eval accepts AnnData (.h5ad) files as the standard input format.
- **Expression matrix**:
  - Raw count matrix (not log-normalized)
  - Stored in `adata.X`
  - Must contain the full transcriptome
  - Do not subset to highly variable genes (HVGs)
- **Gene metadata** (`adata.var`):
  - `var.index` should primarily use HGNC gene symbols
    - This is required for the majority of genes
  - Genes without an official HGNC symbol:
    - May use their Ensembl gene ID as a fallback identifier
    - This ensures all genes remain represented with a valid token
    - Most scFM methods rely on token-based gene matching and can accommodate this behavior
  - Required columns:
    - `gene_symbol`: gene identifier used by the model
      - HGNC gene symbol when available
      - Ensembl gene ID used as a fallback when no HGNC symbol exists
    - `ensembl_id`: corresponding Ensembl gene ID
- **Cell metadata** (`adata.obs`):
  - Must contain a column named `barcode`: unique cell barcode identifier
  - In most cases, `barcode` can be a copy of `adata.obs_names`
  - All cell identifiers must be unique
    - If needed, ensure uniqueness by calling `adata.obs_names_make_unique()`
    - Then populate: `adata.obs["barcode"] = adata.obs_names`
scFM-eval performs minimal preprocessing by design.
- **Users are expected to perform their own data quality control (QC) prior to input**, such as:
  - Filtering low-quality cells
  - Doublet removal (optional)
- **Do NOT perform HVG selection**
  - All scFM methods in this framework expect the full gene expression profile
  - Subsetting to HVGs may lead to:
    - Incompatible model inputs
    - Silent gene dropping
    - Degraded or misleading embeddings
- **Input data must preserve raw counts across the full transcriptome**

Please refer to the provided example dataset: `data/demo/colon_1000.h5ad`
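As a rough illustration of cell-level QC that leaves the gene dimension untouched, here is a NumPy-only sketch on synthetic counts; the thresholds are made-up examples for demonstration, not recommendations from this framework:

```python
import numpy as np

# Toy raw-count matrix: 100 cells x 200 genes.
rng = np.random.default_rng(0)
counts = rng.poisson(0.5, size=(100, 200))

# Per-cell QC metrics.
genes_per_cell = (counts > 0).sum(axis=1)
counts_per_cell = counts.sum(axis=1)

# Keep cells passing both (example) thresholds; subset CELLS only,
# never genes -- no HVG selection.
keep = (genes_per_cell >= 50) & (counts_per_cell >= 80)
filtered = counts[keep]

# The gene dimension is preserved across the full (toy) transcriptome.
print(filtered.shape[1] == counts.shape[1])
```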
- On the first execution of a method, Nextflow will automatically:
  - Pull the corresponding container image
  - Cache the image and model weights locally
- This initial run may take longer
- No additional setup is needed once caching is complete
Embedding inference can be performed with a single command.
We provide a small demo dataset:

```
data/demo/colon_1000.h5ad
```
```bash
nextflow embed_by_scfm.nf \
    --method scgpt \
    --data data/demo/colon_1000.h5ad
```

Required arguments:

- `--method`: scFM method name (e.g. `scgpt`)
- `--data`: input dataset in `.h5ad` format
Results are written to:

```
results/embedding/<method_name>/
```

- Embeddings are stored as `.h5ad` files
- The embedding matrix can be accessed via:

```python
import scanpy as sc

adata = sc.read_h5ad("results/embedding/scgpt/colon_1000.h5ad")
embeddings = adata.X
```

Few-shot learning and label inference can be performed with a single command.
We provide a small demo dataset consisting of a support set and a query set:
```
data/demo/liver_1shot_support.h5ad
data/demo/liver_1shot_query.h5ad
```
To fit class prototypes, set the mode to `fit` and provide the support dataset.

```bash
nextflow fewshot_by_scfm.nf \
    --method scgpt \
    --mode fit \
    --support data/demo/liver_1shot_support.h5ad
```

This will generate a prototype file (`.npz`) saved to:

```
results/fewshot/fitted_prototypes/<method_name>/
```
The generated .npz file contains the fitted class prototypes derived from the support set.
To infer labels for a query dataset using the fitted prototypes, set the mode to `infer` and provide:

- the query dataset
- the path to the fitted prototype file

```bash
nextflow fewshot_by_scfm.nf \
    --method scgpt \
    --mode infer \
    --query data/demo/liver_1shot_query.h5ad \
    --fitted results/fewshot/fitted_prototypes/scgpt/liver_1shot_support.npz
```

Inference results are written to:

```
results/fewshot/inference/<method_name>/
```
You can also provide both the support and query datasets in a single command, which will automatically perform prototype fitting followed by inference:
```bash
nextflow fewshot_by_scfm.nf \
    --method scgpt \
    --support data/demo/liver_1shot_support.h5ad \
    --query data/demo/liver_1shot_query.h5ad
```

- Few-shot learning is designed for datasets with discrete label types
- The support dataset must contain ground-truth labels
- The query dataset does not require labels and will be annotated during inference
- By default, labels are read from `adata.obs['cell_type']`; this can be overridden using the `--label_key` option
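Conceptually, prototype-based few-shot classification averages the support embeddings per class and then assigns each query cell to its nearest prototype. The sketch below is an illustrative NumPy re-implementation of that idea, not the pipeline's actual code:

```python
import numpy as np

def fit_prototypes(embeddings, labels):
    """Average the support embeddings per class to obtain one prototype each."""
    classes = sorted(set(labels))
    labels = np.asarray(labels)
    protos = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    return classes, protos

def infer_labels(embeddings, classes, protos):
    """Assign each query cell to the nearest prototype (Euclidean distance)."""
    dists = np.linalg.norm(embeddings[:, None, :] - protos[None, :, :], axis=-1)
    return [classes[i] for i in dists.argmin(axis=1)]

# Tiny synthetic example: two well-separated classes in a 2-D "embedding" space.
support = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
support_labels = ["B cell", "B cell", "T cell", "T cell"]
classes, protos = fit_prototypes(support, support_labels)

query = np.array([[0.05, 0.05], [4.9, 5.2]])
print(infer_labels(query, classes, protos))  # → ['B cell', 'T cell']
```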
Fine-tuning and label prediction can also be performed with a single command.
In the example below, we reuse `colon_1000.h5ad` as the training dataset. It contains cell-type labels in `adata.obs['cell_type']`. We also provide `colon_50.h5ad` as a small test dataset.

```
data/demo/colon_1000.h5ad
data/demo/colon_50.h5ad
```
To fine-tune a model, set the mode to `fit` and provide the training dataset.

```bash
nextflow finetune_by_scfm.nf \
    --method scgpt \
    --mode fit \
    --train data/demo/colon_1000.h5ad
```

The fine-tuned model weights will be provided via a symlink and are saved by default to:

```
results/finetune/finetuned_models/<method_name>/<train_data_id>/
```
You can then use this fine-tuned model for label prediction.
To predict labels, set the mode to `pred` and provide:

- the directory containing the fine-tuned weights
- the test dataset

```bash
nextflow finetune_by_scfm.nf \
    --method scgpt \
    --mode pred \
    --fitted results/finetune/finetuned_models/scgpt/colon_1000 \
    --test data/demo/colon_50.h5ad
```

Prediction results are written to:

```
results/finetune/prediction/<method_name>/
```
You can also provide both the training and test datasets in a single command, which will automatically perform fine-tuning followed by prediction:
```bash
nextflow finetune_by_scfm.nf \
    --method scgpt \
    --train data/demo/colon_1000.h5ad \
    --test data/demo/colon_50.h5ad
```

- Some methods only support zero-shot embeddings. For these methods, we attach a task-agnostic post-hoc classifier, and the fine-tuning process actually optimizes this appended model. If the fine-tuned weight directory contains only a `posthoc_classifier/` folder, then `--fitted` should point to `results/finetune/finetuned_models/<method_name>/<train_data_id>/posthoc_classifier/`. CELLama is a special case: it supports fine-tuning the backbone model but does not support native prediction/training in our pipeline. Therefore, we fine-tune both the backbone and the post-hoc classifier. In this case, you should still pass the same `--fitted` directory as in the scGPT example above, even though it may also contain a `posthoc_classifier/` folder.
- By default, labels are read from `adata.obs["cell_type"]`. Any discrete label field can be used in this workflow. To specify the label column, use `--finetune_label_key`.
- You can adjust the number of fine-tuning epochs and the training batch size depending on your GPU resources using `--finetune_epoch` and `--finetune_batch_size`. We provide method-specific default `finetune_epoch` values (based on the original authors' fine-tuning recipes), so we generally do not recommend changing them unless you have a clear purpose.
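For intuition, a post-hoc classifier of the kind described above can be as simple as a linear softmax head trained on frozen embeddings. The NumPy sketch below illustrates the idea on toy data; the pipeline's actual classifier architecture and training recipe may differ:

```python
import numpy as np

def train_posthoc(emb, y, n_classes, epochs=200, lr=0.5):
    """Multinomial logistic regression by full-batch gradient descent."""
    W = np.zeros((emb.shape[1], n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                      # one-hot targets
    for _ in range(epochs):
        logits = emb @ W + b
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)             # softmax probabilities
        grad = (p - Y) / len(y)                       # cross-entropy gradient
        W -= lr * emb.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

def predict(emb, W, b):
    return (emb @ W + b).argmax(axis=1)

# Toy "frozen embeddings": two well-separated clusters of 20 cells each.
rng = np.random.default_rng(0)
emb = np.vstack([rng.normal(0, 0.3, (20, 8)), rng.normal(2, 0.3, (20, 8))])
y = np.array([0] * 20 + [1] * 20)

W, b = train_posthoc(emb, y, n_classes=2)
print((predict(emb, W, b) == y).mean())  # training accuracy on separable toy data
```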
| Method | Container | Model Version | Notes |
|---|---|---|---|
| Cell2Sentence (C2S) | housy17/c2s:latest | v1.2.0 | New method (zero/few-shot done; fine-tune pending) |
| CELLama | housy17/cellama:latest | v0.1.0 | |
| CellFM | housy17/cellfm:latest | 5054a2a | |
| CellPLM | housy17/cellplm:latest | v0.1.0 | |
| Geneformer | housy17/geneformer:latest | v0.1.0 | |
| GenePT | housy17/genept:latest | 3602699 | |
| LangCell | housy17/langcell:latest | 69e41ef | |
| scBERT | housy17/scbert:latest | v1.0.0 | |
| scCello | housy17/sccello:latest | 767585b | |
| scFoundation | housy17/scfoundation:latest | 397631c | |
| scGPT | housy17/scgpt:latest | v0.2.4 | |
| SCimilarity | housy17/scsimilarity:latest | v0.4.1 | |
| scPRINT | housy17/scprint:latest | v2.3.5 | |
| UCE | housy17/uce:latest | 8227a65 |
📌 This table will be expanded as more models and configurations are added.
A detailed tutorial covering:
- Advanced parameters
- Batch size and resource control
- Few-shot workflows
- Fine-tuning workflows
- Benchmark evaluation
👉 Tutorial link: (coming soon)
If this framework or any of the tools provided here are useful for your research, please cite our work β it helps us a lot.
Siyu Hou, Penghui Yang, Wenjing Ma, Jade Xiaoqing Wang and Xiang Zhou (2026). A unified framework enables accessible deployment and comprehensive benchmarking of single-cell foundation models.
```bibtex
@article{hou2026unified,
  title     = {A unified framework enables accessible deployment and comprehensive benchmarking of single-cell foundation models},
  author    = {Hou, Siyu and Yang, Penghui and Ma, Wenjing and Wang, Jade Xiaoqing and Zhou, Xiang},
  year      = {2026},
  publisher = {Cold Spring Harbor Laboratory},
  journal   = {bioRxiv}
}
```