Skip to content

174 feat add mgf file support for mztab data loader#178

Open
JemmaLDaniel wants to merge 2 commits intomainfrom
174-feat-add-mgf-file-support-for-mztab-data-loader
Open

174 feat add mgf file support for mztab data loader#178
JemmaLDaniel wants to merge 2 commits intomainfrom
174-feat-add-mgf-file-support-for-mztab-data-loader

Conversation

@JemmaLDaniel
Copy link
Copy Markdown
Collaborator

Summary

  • Extends MZTabDatasetLoader to accept .mgf spectrum files (via matchms) alongside the existing .parquet and .ipc formats
  • Adds an add_index_cols parameter (defaulting to true in mztab.yaml) that generates experiment_name and spectrum_id columns, enabling downstream features that require spectrum_id
  • Extracts shared MGF parsing helpers (_df_from_matchms, _add_index_cols) from InstaNovoDatasetLoader into module-level functions reused by both loaders

The generated spectrum_id ({file_stem}:{row_index}) aligns with the 0-based index=N extracted from MZTab spectra_ref, so Casanovo and other MZTab-producing engines are correctly mapped to their input spectra without changing the existing join semantics.

@JemmaLDaniel JemmaLDaniel linked an issue Apr 4, 2026 that may be closed by this pull request
@JemmaLDaniel JemmaLDaniel self-assigned this Apr 4, 2026
@JemmaLDaniel JemmaLDaniel added the enhancement New feature or request label Apr 4, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 4, 2026

Coverage

Coverage Report
FileStmtsMissCoverMissing
__init__.py00100% 
data_types.py40100% 
calibration
   __init__.py00100% 
   calibration_features.py316797%247–248, 443, 728, 916, 920, 1215
   calibrator.py911583%69–70, 72, 106–109, 134–135, 137, 162–163, 167, 194–195
compat
   __init__.py00100% 
   instanovo.py10640%12, 14–15, 17, 24–25
datasets
   __init__.py00100% 
   calibration_dataset.py1091784%155, 169, 171, 173, 183, 196, 249, 251–252, 258–261, 263–266
   data_loaders.py2761494%23, 190, 221–222, 415, 456, 869, 873, 922, 933, 1045–1046, 1074–1075
   interfaces.py30100% 
   psm_dataset.py250100% 
fdr
   __init__.py00100% 
   base.py581574%81, 85–86, 91, 98–99, 105, 126, 129–130, 135, 137–138, 144, 186
   database_grounded.py28196%52
   nonparametric.py25484%62, 68–69, 72
scripts
   __init__.py00100% 
   main.py1851850%8, 10–13, 16–20, 23–24, 26–28, 32, 39, 44, 47, 53, 55–56, 59, 68, 76, 79, 86, 88–90, 92, 94–99, 102, 104–105, 110, 125, 128, 135–141, 144–145, 148, 161–163, 166, 169, 174, 176–178, 180, 182–183, 186–187, 190, 192–193, 195, 197, 199–200, 202, 205–206, 209–210, 213–214, 217–219, 221, 224, 238–240, 242, 244, 249, 251–253, 255–256, 258–260, 265–266, 268–270, 272, 274, 276–277, 281–284, 286–287, 289–290, 292–293, 295, 298, 312–314, 317, 320, 325, 327–329, 331–333, 335–336, 339–340, 343, 345–346, 348, 350, 352–353, 355, 358–359, 365–366, 369–370, 373–374, 377–378, 386–388, 392, 395, 399, 402, 425, 438–439, 442, 464, 476–477, 480, 505, 518–519, 522, 537, 549–550, 553, 565, 577–578, 581, 596, 608–609
utils
   __init__.py40100% 
   config_formatter.py534024%29, 37–38, 40–42, 44, 55, 58–60, 62–63, 66–69, 72–74, 77–78, 80, 91, 102, 113, 127–128, 130–132, 145–147, 150, 153–154, 157–158, 160
   config_path.py76593%24–26, 117–118
   peptide.py160100% 
TOTAL127930975% 

Tests Skipped Failures Errors Time
301 0 💤 0 ❌ 0 🔥 35.571s ⏱️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: Add mgf file support for MZTab data loader

1 participant