CahanLab/xcell

XCell

Interactive web application for exploring and analyzing scRNA-seq and spatial transcriptomics data. Load an h5ad, 10x Genomics h5, Seurat .rds file, 10x CellRanger matrix folder, or prefixed 10x file trio from GEO, visualize cells on a scatter plot, run Scanpy analysis pipelines, and explore results — all from your browser.

Quick Start

Prerequisites

  • Python 3.9+
  • Node.js 18+
  • R with the Seurat and SeuratDisk packages (optional; needed only to load .rds files)

Backend Setup

cd xcell/backend

# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install in editable mode
pip install -e .

Frontend Setup

cd xcell/frontend
npm install

Launch

# Terminal 1: Start the backend (from xcell/backend/)
uvicorn xcell.main:app --reload

# Terminal 2: Start the frontend (from xcell/frontend/)
npm run dev

Open http://localhost:5173 in your browser.

A bundled toy dataset (toy_spatial.h5ad) loads automatically if no data path is specified. To load your own data, set the XCELL_DATA_PATH environment variable:

XCELL_DATA_PATH=/path/to/your/data.h5ad uvicorn xcell.main:app --reload  # also supports .h5 and .rds
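
The fallback described above can be sketched in Python as follows. This is only an illustration: the function name `resolve_data_path` and the `DEFAULT_DATA` location are assumptions, and the actual logic in the backend may differ.

```python
import os
from pathlib import Path

# Assumed default location, based on the Project Structure listing
# (backend/xcell/data/toy_spatial.h5ad). Hypothetical constant.
DEFAULT_DATA = Path("xcell") / "data" / "toy_spatial.h5ad"


def resolve_data_path(env=None):
    """Return the path from XCELL_DATA_PATH, or the bundled toy dataset."""
    env = os.environ if env is None else env
    raw = env.get("XCELL_DATA_PATH", "").strip()
    if not raw:
        # No path given: fall back to the bundled toy dataset.
        return DEFAULT_DATA
    return Path(raw).expanduser()
```

Passing a plain dictionary instead of `os.environ` makes the behaviour easy to check without touching the real environment.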

Getting Started with Toy Data

The included test_data/toy_spatial.h5ad dataset is a small spatial transcriptomics dataset for exploring XCell's features. Here's a step-by-step walkthrough:

1. Explore the Scatter Plot

  • Pan by clicking and dragging
  • Zoom with scroll wheel
  • Cells are rendered as points at their spatial coordinates

2. Color by Metadata

  • Open Cell Manager (left panel)
  • Select a metadata column to color cells by that annotation

3. Select Cells

  • Click the Select button in the toolbar (use the dropdown arrow to choose between Lasso and Polygon tools)
    • Lasso: click and drag to draw a freehand selection
    • Polygon: click to add vertices, double-click to close and select cells inside
  • Hold Shift while selecting to add to the existing selection
  • Checkboxes in the Cell Manager also select/deselect cells by category
  • Selected cells can be masked or deleted
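
For readers curious how a polygon selection can be resolved to cells, here is a minimal sketch using the standard ray-casting test. It is not taken from the XCell source; `point_in_polygon` and `select_cells` are hypothetical names, and the `add` flag only mimics the Shift behaviour described above.

```python
def point_in_polygon(x, y, vertices):
    """Ray-casting test: True if (x, y) lies inside the closed polygon."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def select_cells(coords, vertices, current=None, add=False):
    """Return sorted indices of cells inside the polygon.

    With add=True the result is merged with the current selection.
    """
    hits = {i for i, (x, y) in enumerate(coords)
            if point_in_polygon(x, y, vertices)}
    if add and current:
        hits |= set(current)
    return sorted(hits)
```

A freehand lasso can be handled the same way by treating the sampled cursor positions as polygon vertices.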

4. Run Preprocessing

  • Open the Scanpy modal (top toolbar)
  • Go to Preprocessing and run in order:
    1. Normalize Total — normalize counts per cell
    2. Log1p — log-transform the data
    3. Highly Variable Genes — identify informative genes
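
The first two steps are simple enough to sketch in plain Python. The names suggest Scanpy's `sc.pp.normalize_total` and `sc.pp.log1p`, but that mapping is an assumption, as is the `target_sum` default used here; highly variable gene selection is omitted because its method is not described in this README.

```python
import math


def normalize_total(counts, target_sum=10_000.0):
    """Scale each cell (row) so that its counts sum to target_sum."""
    normalized = []
    for row in counts:
        total = sum(row)
        # Cells with no counts stay at zero instead of dividing by zero.
        scale = target_sum / total if total > 0 else 0.0
        normalized.append([value * scale for value in row])
    return normalized


def log1p(matrix):
    """Apply log(1 + x) to every entry."""
    return [[math.log1p(value) for value in row] for row in matrix]


# Two cells with the same composition but different sequencing depth.
counts = [[1, 3, 0], [10, 30, 0]]
norm = normalize_total(counts, target_sum=100.0)
logged = log1p(norm)
```

After normalization the two toy cells have identical profiles, which is the point of the step: differences in depth are removed before the log transform.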

5. Run Cell Analysis

  • In the Scanpy modal, go to Cell Analysis and run in order:
    1. PCA — reduce dimensionality
    2. Neighbors — build cell neighborhood graph (requires PCA)
    3. UMAP — compute 2D embedding (requires Neighbors)
    4. Leiden — cluster cells (requires Neighbors)
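
The ordering constraints listed above form a small dependency graph. The sketch below encodes them and reports which prerequisites are still missing for a given step; `REQUIRES` and `missing_prerequisites` are hypothetical names, not part of XCell.

```python
from graphlib import TopologicalSorter

# Prerequisites exactly as stated in the list above.
REQUIRES = {
    "PCA": set(),
    "Neighbors": {"PCA"},
    "UMAP": {"Neighbors"},
    "Leiden": {"Neighbors"},
}


def missing_prerequisites(step, completed):
    """Return the prerequisites of `step` not yet run, in a valid run order."""
    needed = set()
    stack = [step]
    while stack:
        # Walk direct and indirect prerequisites.
        for dep in REQUIRES[stack.pop()]:
            if dep not in needed:
                needed.add(dep)
                stack.append(dep)
    order = TopologicalSorter(REQUIRES).static_order()
    return [s for s in order if s in needed and s not in completed]
```

For example, asking for UMAP on a fresh dataset yields PCA followed by Neighbors, which matches the order given above.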

6. View Clustering Results

  • In Cell Manager, select the leiden column to color by cluster
  • Switch the embedding to X_umap to see the UMAP layout

7. Color by Gene Expression

  • Open Gene Manager (right panel)
  • If the dataset has alternative gene identifier columns (e.g., gene symbols alongside Ensembl IDs), use the Gene IDs dropdown at the top of the panel to switch
  • Search or browse genes
  • Click a gene to color cells by its expression

8. Gene Sets

  • Create gene sets manually in Gene Manager
  • Import gene lists from files

9. Compare Cell Groups

  • Open the Analyze modal (top toolbar) → Cell Analysis → Compare Cells
  • Select an .obs column (e.g., leiden) from the dropdown
  • Check 2 or more groups to compare:
    • 2 checked → pairwise differential expression
    • 3+ checked → one-vs-rest marker gene analysis
  • Set Top N genes and click Run
  • You can also use lasso selection: select cells → Set as Group 1 / Set as Group 2 → click Compare in the comparison bar
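
The rule for choosing the comparison type can be written down in a few lines. This is a sketch of the rule as stated above, with hypothetical function names; the README does not say whether "rest" means the other checked groups or all remaining cells, so the code leaves it as a label.

```python
def comparison_mode(checked_groups):
    """Return 'pairwise' for two groups, 'one_vs_rest' for three or more."""
    groups = list(dict.fromkeys(checked_groups))  # drop duplicates, keep order
    if len(groups) < 2:
        raise ValueError("Check at least two groups to compare")
    return "pairwise" if len(groups) == 2 else "one_vs_rest"


def plan_comparisons(checked_groups):
    """List the (group, reference) pairs implied by the checked groups."""
    groups = list(dict.fromkeys(checked_groups))
    if comparison_mode(groups) == "pairwise":
        return [(groups[0], groups[1])]
    # One comparison per group against an unspecified "rest".
    return [(group, "rest") for group in groups]
```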

10. Trajectory Analysis

  • Draw lines on the scatter plot
  • Click the gear icon on a shape in the Shapes panel to open Line Tools
  • Under Gene Association, configure:
    • Test against: position along line or distance from line
    • Gene subset: filter to highly variable genes or other boolean columns
    • Spline knots: number of interior knots for the B-spline model (default 5; higher = more flexible fit)
    • FDR: significance threshold (default 0.05)
    • Max genes/module: cap on genes returned per expression module
  • Click Find Associated Genes to run the analysis
  • In the results modal, use the Filters bar to refine results interactively: adjust min R², min amplitude, max FDR, or toggle pattern types (increasing, decreasing, peak, trough, complex)
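
The two "Test against" options correspond to two numbers that can be computed for each cell relative to a drawn line. A minimal sketch, assuming a straight line with two endpoints (drawn lines with more vertices would need a per-segment version) and a hypothetical function name:

```python
import math


def project_onto_segment(point, start, end):
    """Return (position, distance) of a point relative to a line segment.

    position: fraction in [0, 1] along start -> end of the closest point.
    distance: Euclidean distance from the point to that closest point.
    """
    px, py = point
    ax, ay = start
    bx, by = end
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        raise ValueError("start and end must be different points")
    # Scalar projection of (point - start) onto the segment direction.
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))  # clamp to the segment
    cx, cy = ax + t * dx, ay + t * dy
    return t, math.hypot(px - cx, py - cy)
```

Either value can then serve as the covariate against which each gene's expression is modeled.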

Multi-section / replicate analysis

  • Draw a line on each tissue section representing the same biological axis
  • For each line, select cells (via lasso or clicking a category value in the Cells panel) and click + to associate them with the line
  • Check the lines to include using the checkboxes that appear on lines with projected cells
  • Click Find Associated Genes in the action bar
  • In the multi-line modal, toggle direction per line if needed (arrow button) and set analysis parameters
  • Results pool cells across all lines for a single, higher-powered analysis
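
One way to picture the pooling step: each cell has a position between 0 and 1 along its own line, positions are flipped for lines whose direction was toggled, and all cells are then combined. The sketch below illustrates that idea only; the data layout and the name `pool_positions` are assumptions, and XCell's actual pooling may work differently.

```python
def pool_positions(lines):
    """Combine per-line cell positions into one list of (line_index, position).

    Each entry of `lines` is a dict with:
      "positions": positions in [0, 1] along that line, one per cell
      "reversed":  True if the line's direction was toggled
    """
    pooled = []
    for line_index, line in enumerate(lines):
        for position in line["positions"]:
            # Flip the axis so all lines run in the same biological direction.
            aligned = 1.0 - position if line["reversed"] else position
            pooled.append((line_index, aligned))
    return pooled
```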

11. Run Gene Analysis

  • In the Scanpy modal, go to Gene Analysis:
    1. Build Gene Graph — compute gene-gene similarity
    2. Cluster Genes — group genes by expression pattern

12. Spatial Contouring

  • Select genes in the Gene Panel (click individual genes or use a gene set)
  • Open the Scanpy modal, go to Spatial Analysis > Contourize
  • Adjust smoothing sigma, contour levels, and grid resolution as needed
  • Click Run — a new categorical column appears in the Cell Panel
  • Color cells by the contour column to visualize spatial expression zones
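
To give a feel for what the three parameters might control, here is a rough pure-Python sketch: expression is averaged onto a grid with Gaussian weights (sigma), the smoothed range is cut into equal-width bands (levels), and each cell takes the band of its grid square (resolution). The README does not describe the actual algorithm, so treat this entirely as an assumption; `contourize` and its arguments are hypothetical.

```python
import math


def contourize(coords, values, sigma=1.0, n_levels=3, grid_size=20):
    """Assign each cell a level 0..n_levels-1 from smoothed expression."""
    xs = [c[0] for c in coords]
    ys = [c[1] for c in coords]
    x0, y0 = min(xs), min(ys)
    # Guard against zero extent when all cells share a coordinate.
    width = (max(xs) - x0) or 1.0
    height = (max(ys) - y0) or 1.0

    def square_of(x, y):
        gx = min(int((x - x0) / width * grid_size), grid_size - 1)
        gy = min(int((y - y0) / height * grid_size), grid_size - 1)
        return gx, gy

    # Gaussian-weighted mean expression at the center of each grid square.
    grid = {}
    for gx in range(grid_size):
        for gy in range(grid_size):
            cx = x0 + (gx + 0.5) * width / grid_size
            cy = y0 + (gy + 0.5) * height / grid_size
            num = den = 0.0
            for (x, y), v in zip(coords, values):
                w = math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
                num += w * v
                den += w
            grid[gx, gy] = num / den if den > 0 else 0.0

    low, high = min(grid.values()), max(grid.values())
    span = (high - low) or 1.0
    labels = []
    for x, y in coords:
        fraction = (grid[square_of(x, y)] - low) / span
        labels.append(min(int(fraction * n_levels), n_levels - 1))
    return labels
```

The nested loops scale with the number of grid squares times the number of cells, so this form is only suitable for toy inputs.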

13. Load a Second Dataset

  • Click Load in the toolbar — the modal shows a sidebar with quick-access locations (Home, Desktop, Documents, Downloads) and recently loaded files, plus breadcrumb path navigation for clicking any ancestor directory
  • Choose Secondary from the "Load into" dropdown
  • Browse or enter the path to a second h5ad, h5, rds file, 10x matrix folder, or prefixed 10x file trio and click Load
  • A dataset switcher dropdown appears in the header — switch between Primary and Secondary to compare datasets
  • Click the Split button to view both datasets side by side
  • Click on either plot to make it the active dataset — the Cell and Gene panels update accordingly
  • Each plot has its own embedding selector, legend, and independent pan/zoom

14. Export Results

  • Click Export in the toolbar to download annotations and results

Features

  • Interactive scatter plot — deck.gl-powered visualization with pan, zoom, lasso selection
  • Cell Manager — browse/color by metadata, mask/delete cells
  • Gene Manager — search genes, create gene sets, import gene lists
  • Scanpy integration — run preprocessing, cell analysis (PCA, Neighbors, UMAP, Leiden), gene analysis, spatial analysis (contourize), and differential expression directly in the browser. Long-running operations (gene neighbors, spatial neighbors, spatial autocorrelation, contourize, line gene association) can be cancelled mid-run without corrupting session data.
  • Trajectory analysis — draw lines and associate genes with spatial trajectories
  • Quilt mode — lasso and rearrange tissue pieces: drag to translate, shift+drag to rotate, flip to reflect selected cell subsets
  • Display settings — adjust point size, opacity, colormaps, bivariate coloring
  • Multi-dataset support — load two datasets (h5ad, h5, rds, 10x matrix folders, or prefixed 10x file trios from GEO), switch between them, or view side by side in split mode
  • Export — download annotations and analysis results
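
The three Quilt mode operations listed above (translate, rotate, reflect) are ordinary 2D transforms applied to the coordinates of the selected cells. A minimal sketch with hypothetical function names, assuming rotation and reflection happen about the centroid of the selection:

```python
import math


def centroid(points):
    """Mean x and y of a non-empty list of (x, y) points."""
    n = len(points)
    return sum(p[0] for p in points) / n, sum(p[1] for p in points) / n


def translate(points, dx, dy):
    return [(x + dx, y + dy) for x, y in points]


def rotate(points, angle_deg):
    """Rotate points counter-clockwise about their centroid."""
    cx, cy = centroid(points)
    cos_a = math.cos(math.radians(angle_deg))
    sin_a = math.sin(math.radians(angle_deg))
    return [
        (cx + (x - cx) * cos_a - (y - cy) * sin_a,
         cy + (x - cx) * sin_a + (y - cy) * cos_a)
        for x, y in points
    ]


def flip_horizontal(points):
    """Mirror points across the vertical line through their centroid."""
    cx, _ = centroid(points)
    return [(2 * cx - x, y) for x, y in points]
```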

Project Structure

xcell/
├── backend/
│   ├── xcell/
│   │   ├── main.py          # FastAPI app entry point
│   │   ├── adaptor.py       # DataAdaptor class (wraps AnnData)
│   │   ├── diffexp.py       # Differential expression
│   │   ├── data/
│   │   │   └── toy_spatial.h5ad  # Bundled toy dataset
│   │   └── api/
│   │       └── routes.py    # REST API endpoints
│   └── pyproject.toml       # Python dependencies
├── frontend/
│   ├── src/
│   │   ├── App.tsx           # Main app component
│   │   ├── store.ts          # Zustand state management
│   │   ├── main.tsx          # Entry point
│   │   ├── components/
│   │   │   ├── ScatterPlot.tsx        # deck.gl scatter plot
│   │   │   ├── CellPanel.tsx          # Cell metadata manager
│   │   │   ├── GenePanel.tsx          # Gene browser / gene sets
│   │   │   ├── ScanpyModal.tsx        # Scanpy analysis pipeline UI
│   │   │   ├── DiffExpModal.tsx       # Differential expression
│   │   │   ├── LineAssociationModal.tsx # Trajectory analysis
│   │   │   ├── DisplaySettings.tsx    # Visualization settings
│   │   │   ├── ShapeManager.tsx       # Shape/selection tools
│   │   │   └── ImportModal.tsx        # Gene list import
│   │   └── hooks/
│   │       └── useData.ts    # Data fetching hooks
│   ├── package.json          # Node dependencies
│   └── vite.config.ts        # Vite configuration
└── README.md
test_data/
├── toy_spatial.h5ad          # Toy dataset for testing
└── generate_toy.py           # Script to regenerate toy data

Architecture

  • Backend: FastAPI + AnnData + Scanpy, serving data and running analysis via REST API
  • Frontend: React + TypeScript + Vite + deck.gl + Zustand for state management
  • Data flow: h5ad file → DataAdaptor → REST API → React hooks → deck.gl visualization
  • API docs: Available at http://localhost:8000/docs when the backend is running
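
Because the backend is a FastAPI app, its endpoints can usually be listed programmatically as well. The sketch below assumes the backend runs on port 8000 and that FastAPI's default schema URL (`/openapi.json`) has not been changed; the helper names are hypothetical.

```python
import json
from urllib.request import urlopen

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def fetch_openapi(base_url="http://localhost:8000"):
    """Download the OpenAPI schema from a running backend."""
    with urlopen(f"{base_url}/openapi.json", timeout=5) as response:
        return json.load(response)


def list_endpoints(schema):
    """Return 'METHOD /path' strings for every operation in the schema."""
    endpoints = []
    for path, item in sorted(schema.get("paths", {}).items()):
        for key in sorted(item):
            # Path items may also hold non-method keys such as "parameters".
            if key in HTTP_METHODS:
                endpoints.append(f"{key.upper()} {path}")
    return endpoints
```

With the backend running, `list_endpoints(fetch_openapi())` gives a quick text overview of the same information shown on the docs page.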

About

Web app for analysis and visualization of spatial transcriptomics (ST) and single-cell RNA-seq (scRNA-seq) data.
