Add msproteomics test datasets by an-altosian · Pull Request #1952 · nf-core/test-datasets

an-altosian · 2026-03-25T16:21:36Z

Summary

Public test datasets for the nf-core/msproteomics pipeline.

- TMT: PRIDE PXD000001 (Erwinia carotovora, TMT6), 2 mzML subsets
- DDA LFQ: Zenodo 1051552 (Human SILAC), 2 mzML subsets
- DIA: CPTAC CCRCC (Human DIA), 1 mzML subset
- FASTA: UniProt reference databases (Erwinia, Human SwissProt 2000-protein subset, E. coli+UPS1)
- Module inputs: Pre-computed intermediate files for unit testing individual modules
- Samplesheets: CSV inputs for all workflow test profiles

FASTA file sizes

| File | Size | Proteins |
| --- | --- | --- |
| `ecoli_ups1_test.fasta` | 1.8 MB | not stated |
| `erwinia_carotovora.fasta` | 1.6 MB | not stated |
| `erwinia_uniprot.fasta` | 1.9 MB | not stated |
| `human_sp_subset.fasta` | 1.4 MB | 2,000 |

`human_sp_subset.fasta` is a targeted subset of 2,000 proteins: the 169 proteins identified from the DDA LFQ test spectra (HEK SILAC) plus 1,831 evenly spaced SwissProt entries that add search-space diversity. It was validated by running the full FragPipe DDA LFQ pipeline end-to-end, which identified 174 proteins at 1% FDR.
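The PR does not show how the subset was built, so the following is only a minimal sketch of one way to combine a set of identified proteins with evenly spaced filler entries. The function names (`parse_fasta`, `accession`, `build_subset`) are hypothetical and are not taken from `generate_test_subsets.sh`; the sketch assumes UniProt-style headers of the form `sp|ACCESSION|NAME`.

```python
from typing import List, Set, Tuple

Record = Tuple[str, str]  # (header without the leading '>', sequence)


def parse_fasta(text: str) -> List[Record]:
    """Parse FASTA text into (header, sequence) pairs, preserving order."""
    records: List[Record] = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def accession(header: str) -> str:
    """Extract the accession, assuming a 'db|ACCESSION|NAME' header (assumption)."""
    parts = header.split("|")
    return parts[1] if len(parts) >= 3 else header.split()[0]


def build_subset(records: List[Record], identified: Set[str], total: int) -> List[Record]:
    """Keep every identified protein, then fill up to `total` with evenly spaced others."""
    keep = [r for r in records if accession(r[0]) in identified]
    rest = [r for r in records if accession(r[0]) not in identified]
    n_fill = max(0, min(total - len(keep), len(rest)))
    # Evenly spaced indices over the remaining entries; distinct because n_fill <= len(rest).
    picks = [rest[(i * len(rest)) // n_fill] for i in range(n_fill)]
    return keep + picks
```

With the numbers reported above, `identified` would hold the 169 accessions and `total` would be 2000, leaving 1,831 filler slots. Spacing the filler picks across the whole database, rather than taking the first entries, avoids biasing the search space toward whatever ordering the source FASTA happens to use.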

Supersedes #1946 (closed due to force-push history issue).

🤖 Generated with Claude Code

Public datasets for nf-core/msproteomics pipeline stub and integration testing:

- TMT: PRIDE PXD000001 (Erwinia carotovora, TMT6) - 2 mzML subsets
- DDA LFQ: Zenodo 1051552 (Human SILAC) - 2 mzML subsets
- DIA: CPTAC CCRCC (Human DIA) - 1 mzML subset
- FASTA: UniProt reference databases (Erwinia, Human SwissProt subset, E.coli+UPS1)
- Module inputs: pre-computed intermediate files for unit testing individual modules
- Samplesheets: CSV inputs for all workflow test profiles
- Script: generate_test_subsets.sh for reproducible subset generation

human_sp_subset.fasta contains 2000 proteins: 169 identified from DDA LFQ test spectra (HEK SILAC) + 1831 evenly-spaced entries for search space diversity.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

an-altosian · 2026-03-25T16:22:30Z

Closing - wrong base branch. Will recreate targeting msproteomics branch.

an-altosian closed this Mar 25, 2026

an-altosian wants to merge 1 commit into nf-core:master from an-altosian:msproteomics
