scFocus

Overview

scFocus is a reinforcement-learning-based method for analyzing lineage branching in low-dimensional single-cell embeddings. It combines branch probabilities with unsupervised structure in the latent space to help characterize continuous cell-state transitions and related visualization workflows.

Graphical Abstract

[Figure: scFocus graphical abstract]

Installation

Requirements

  • Python >= 3.9
  • Required packages: scanpy>=1.10.4, torch>=1.13.1, joblib>=1.2.0, tqdm>=4.64.1, streamlit>=1.24.0
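Before installing, it can help to see which of these requirements are already satisfied. The following is a minimal sketch using only the standard library; the helper names `installed_versions` and `python_ok` are illustrative and not part of scFocus, and the sketch only reports versions rather than comparing them.

```python
import sys
from importlib import metadata

# Minimum versions copied from the Requirements list above.
REQUIRED = {
    "scanpy": "1.10.4",
    "torch": "1.13.1",
    "joblib": "1.2.0",
    "tqdm": "4.64.1",
    "streamlit": "1.24.0",
}


def installed_versions(packages=REQUIRED):
    """Map each package name to its installed version string, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def python_ok(minimum=(3, 9)):
    """True when the running interpreter meets the documented Python floor."""
    return sys.version_info[:2] >= minimum


if __name__ == "__main__":
    print("Python >= 3.9:", python_ok())
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'} (needs >= {REQUIRED[name]})")
```

Installing with pip resolves these dependencies automatically, so this check is mainly useful when debugging an existing environment.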

Install from PyPI

pip install scfocus

Install from source

git clone https://github.com/PeterPonyu/scfocus.git
cd scfocus
pip install -e .

Quick Start

Basic example to get started with scFocus:

import scanpy as sc
import scfocus

# Load your single-cell data
adata = sc.read_h5ad('your_data.h5ad')

# Preprocess: normalize, log-transform, and compute PCA
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata, n_top_genes=2000)
adata = adata[:, adata.var.highly_variable].copy()  # copy so PCA does not write to a view
sc.pp.pca(adata)

# Compute UMAP embedding
sc.pp.neighbors(adata, n_neighbors=15)
sc.tl.umap(adata)

# Run scFocus analysis
embedding = adata.obsm['X_umap']
focus = scfocus.focus(embedding, n=6, pct_samples=0.01)
focus.meta_focusing(n=3)
focus.merge_fp2()

# Add focus probabilities to your AnnData object
adata.obsm['focus_probs'] = focus.mfp[0]
for i in range(focus.mfp[0].shape[1]):
    adata.obs[f'Fate_{i}'] = focus.mfp[0][:, i]

# Visualize results
sc.pl.umap(adata, color=[f'Fate_{i}' for i in range(focus.mfp[0].shape[1])])
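The per-fate probabilities can also be reduced to a single branch label per cell. The sketch below is one possible approach and not part of the scFocus API: `assign_branches` and its `min_prob` threshold are hypothetical, and the synthetic array only stands in for `focus.mfp[0]` (cells by fates). It does not assume that each row sums to 1.

```python
import numpy as np


def assign_branches(probs, min_prob=0.0):
    """Return the index of the highest-probability fate for each cell.

    Cells whose best probability is below `min_prob` are labelled -1.
    """
    probs = np.asarray(probs, dtype=float)
    labels = probs.argmax(axis=1)
    labels[probs.max(axis=1) < min_prob] = -1
    return labels


# Synthetic stand-in for focus.mfp[0]: 4 cells x 3 fates.
demo = np.array([
    [0.90, 0.05, 0.05],
    [0.10, 0.80, 0.10],
    [0.40, 0.35, 0.25],
    [0.20, 0.20, 0.60],
])
labels = assign_branches(demo, min_prob=0.5)
print(labels.tolist())  # [0, 1, -1, 2]
```

With real results, the same function could be applied to `adata.obsm['focus_probs']` and the labels stored as a column in `adata.obs` for plotting.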

Key Parameters

The focus class accepts the following key parameters:

  • f (array-like): Latent space of the original data (e.g., UMAP or t-SNE coordinates)
  • n (int, default=8): Number of parallel agents/branches to identify
  • pct_samples (float, default=0.125): Fraction of samples (between 0 and 1, not a percentage) used in each training step
  • max_steps (int, default=5): Maximum steps per training episode
  • num_episodes (int, default=1000): Number of training episodes
  • hidden_dim (int, default=128): Hidden layer dimension for neural networks
  • res (float, default=0.05): Resolution for merging similar focus patterns

For a complete list of parameters and their descriptions, see the API documentation.
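One way to keep parameter choices explicit is to collect them in a small container. The sketch below is an illustration only: `FocusParams` is a hypothetical helper, not a scFocus class. Its field names and defaults mirror the list above, and the range check on `pct_samples` assumes the value is a fraction. Whether `scfocus.focus` accepts every field as a keyword argument should be confirmed against the API documentation.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class FocusParams:
    """Hypothetical container; names and defaults mirror the documented list."""

    n: int = 8
    pct_samples: float = 0.125
    max_steps: int = 5
    num_episodes: int = 1000
    hidden_dim: int = 128
    res: float = 0.05

    def __post_init__(self):
        # Assumption: pct_samples is a fraction of cells, not a percentage.
        if not 0 < self.pct_samples <= 1:
            raise ValueError("pct_samples is expected to be a fraction in (0, 1]")
        if self.n < 1:
            raise ValueError("n must be at least 1")


defaults = FocusParams()
quick = replace(defaults, n=6, pct_samples=0.01)  # values used in Quick Start
kwargs = asdict(quick)
print(kwargs)
# Assumed usage (not run here): scfocus.focus(embedding, **kwargs)
```

Freezing the dataclass means each variation is a new object, which makes it easier to record exactly which settings produced a given result.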

Documentation

Tutorials and API documentation are available at https://scfocus.readthedocs.io/en/latest/, including:

  • Notebooks for different datasets
  • Step-by-step tutorials
  • API reference

Web Interface

scFocus provides an interactive web interface for data analysis.

Online Access

Access the hosted version at scfocus.streamlit.app.

Local Interface

Launch the local web interface:

scfocus ui

Using the Web Interface

  1. Upload Data: Supports .h5ad files and 10x Genomics format (matrix.mtx, features.tsv, barcodes.tsv)
  2. Configure Parameters:
    • Number of highly variable genes (200-5000, default: 2000)
    • Number of neighbors for UMAP (2-50, default: 15)
    • Minimum distance for UMAP (0.0-2.0, default: 0.5)
    • Number of branches (2-10, default: 6)
  3. Process: Click "Process" to run the analysis pipeline
  4. Visualize: View UMAP plots colored by cell fate probabilities
  5. Download: Export processed data as .h5ad file

Example datasets are available in the data/ folder of the repository.
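For 10x Genomics uploads, it can be useful to confirm that a folder contains all three files before using it. The following sketch is an assumption-laden helper, not part of scFocus: `missing_10x_files` is a hypothetical name, and it also accepts gzipped variants because recent Cell Ranger versions typically write `.gz` files.

```python
import tempfile
from pathlib import Path

# File names listed in the upload step above.
TENX_FILES = ("matrix.mtx", "features.tsv", "barcodes.tsv")


def missing_10x_files(directory):
    """Return the expected 10x files absent (plain or .gz) from `directory`."""
    directory = Path(directory)
    return [
        name
        for name in TENX_FILES
        if not (directory / name).exists()
        and not (directory / f"{name}.gz").exists()
    ]


# Demo on a temporary folder that holds only two of the three files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "matrix.mtx.gz").touch()
    (Path(tmp) / "barcodes.tsv").touch()
    demo_missing = missing_10x_files(tmp)
    print(demo_missing)  # ['features.tsv']
```

Outside the web interface, a complete folder can usually be read with `scanpy.read_10x_mtx`. Whether the web interface uses that function internally is not stated here.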

Command Line Interface

Available Commands

# Launch web interface
scfocus ui

# Additional CLI commands may be added in future releases

Workflow Overview

The typical scFocus workflow consists of:

  1. Preprocessing: Normalize and log-transform the data, select highly variable genes
  2. Dimensionality Reduction: Compute PCA and UMAP/t-SNE embeddings
  3. scFocus Analysis: Apply reinforcement learning to identify lineage branches
  4. Merge Patterns: Consolidate similar focus patterns
  5. Visualization: Display cell fate probabilities and branch assignments

Troubleshooting

Common Issues

Issue: ModuleNotFoundError: No module named 'torch'

  • Solution: Install PyTorch, quoting the version specifier so the shell does not treat > as a redirect: pip install "torch>=1.13.1"

Issue: CUDA out of memory error

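  • Solution: Reduce n (number of agents) or pct_samples to lower memory use on large datasets. The algorithm falls back to CPU automatically only when no GPU is available, so the fallback does not by itself resolve an out-of-memory error on a GPU.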
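To see which device PyTorch would use in the current environment, a quick check like the one below can help with both of the issues above. It is a sketch: `compute_device` is a hypothetical helper, and it assumes scFocus selects its device through PyTorch's standard CUDA detection.

```python
import importlib.util


def compute_device():
    """Report 'cuda', 'cpu', or 'torch-missing' for the current environment."""
    if importlib.util.find_spec("torch") is None:
        return "torch-missing"
    import torch  # imported lazily so the check also runs without PyTorch

    return "cuda" if torch.cuda.is_available() else "cpu"


print(compute_device())
```

Under the same assumption, setting the environment variable `CUDA_VISIBLE_DEVICES` to an empty string before starting Python hides all GPUs from PyTorch, which is one way to force a CPU run.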

Issue: Streamlit command not found

  • Solution: Install Streamlit, quoting the version specifier so the shell does not treat > as a redirect: pip install "streamlit>=1.24.0"

Issue: Analysis is slow

  • Solution:
    • Reduce num_episodes (default 1000) for faster results
    • Decrease n (number of agents) to reduce computational load
    • Use GPU if available
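As a rough guide to how much these reductions save, the training budget can be approximated from the documented parameters. The sketch below assumes every one of the n agents runs all episodes to the full max_steps. It ignores pct_samples and the cost of each network update, so treat it as a relative proxy, not a runtime prediction. `max_agent_steps` is a hypothetical helper.

```python
def max_agent_steps(n=8, num_episodes=1000, max_steps=5):
    """Upper bound on agent steps if every agent runs every episode in full."""
    return n * num_episodes * max_steps


baseline = max_agent_steps()  # documented defaults
reduced = max_agent_steps(n=6, num_episodes=500)
print(baseline, reduced, reduced / baseline)  # 40000 15000 0.375
```

Under these assumptions, halving num_episodes and using six agents cuts the step budget to under 40% of the default.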

Getting Help

  • Check the documentation
  • Open an issue on GitHub
  • Review example notebooks in the documentation

Development

Setting Up Development Environment

# Clone the repository
git clone https://github.com/PeterPonyu/scfocus.git
cd scfocus

# Install in development mode
pip install -e .

# Install additional development dependencies
pip install -r requirements.txt

Building Documentation

The documentation is built using Sphinx:

# Install Sphinx and dependencies
pip install sphinx sphinx-rtd-theme

# Build the documentation
cd source
make html

The documentation will be built in build/html/.

License

license

Citation

Chen, C., Fu, Z., Yang, J., Chen, H., Huang, J., Qin, S., Wang, C., & Hu, X. (2025). scFocus: Detecting Branching Probabilities in Single-cell Data with SAC. Computational and Structural Biotechnology Journal. https://doi.org/10.1016/j.csbj.2025.04.036
