Feat prediction column remapping for instanovo backwards compatibility #191

Open
JemmaLDaniel wants to merge 3 commits into main from feat-prediction-column-remapping-for-instanovo-backwards-compatibility
Conversation

@JemmaLDaniel (Collaborator) commented Apr 14, 2026

Summary

Adds configurable prediction column name mapping to InstaNovoDatasetLoader for backwards compatibility with older InstaNovo versions that write different CSV column headers. Also makes residue_remapping optional.

Key changes

  • column_mapping parameter: InstaNovoDatasetLoader accepts an optional column_mapping dict that maps logical column names (predictions, predictions_tokenised, log_probability) to the actual CSV column names. Defaults match the current InstaNovo output (predictions, predictions_tokenised, log_probs). Partial overrides are merged with defaults.
  • Improved error message: When required CSV columns are missing, the ValueError now suggests setting column_mapping in the data loader config.
  • Optional residue_remapping: InstaNovoDatasetLoader.residue_remapping parameter is now optional (defaults to None), simplifying loader construction when no residue remapping is needed.
  • Config: data_loader/instanovo.yaml now includes a documented column_mapping block.
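
The merge and validation behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper names `resolve_column_mapping` and `check_required_columns`, the exact error wording, and the legacy header `preds` are assumptions made up for the example. Only the logical names and their defaults come from the description.

```python
from typing import Dict, Iterable, Mapping, Optional

# Defaults as stated in the PR description: logical name -> CSV column header.
DEFAULT_COLUMN_MAPPING: Dict[str, str] = {
    "predictions": "predictions",
    "predictions_tokenised": "predictions_tokenised",
    "log_probability": "log_probs",
}


def resolve_column_mapping(
    overrides: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Merge a partial override on top of the defaults (hypothetical helper)."""
    return {**DEFAULT_COLUMN_MAPPING, **(overrides or {})}


def check_required_columns(
    csv_columns: Iterable[str], mapping: Mapping[str, str]
) -> None:
    """Raise a ValueError that points the user at column_mapping (hypothetical helper)."""
    available = set(csv_columns)
    missing = [name for name in mapping.values() if name not in available]
    if missing:
        raise ValueError(
            f"Missing required columns: {missing}. If the file was written by an "
            "older InstaNovo version, set column_mapping in the data loader config."
        )


# "preds" is an invented legacy header, used only to show a partial override.
mapping = resolve_column_mapping({"predictions": "preds"})
check_required_columns(["preds", "predictions_tokenised", "log_probs"], mapping)
```

Overriding a single key leaves the other two logical names on their defaults, so a config only needs to list the headers that actually differ.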

@JemmaLDaniel JemmaLDaniel self-assigned this Apr 14, 2026
@JemmaLDaniel JemmaLDaniel added the enhancement New feature or request label Apr 14, 2026
github-actions Bot commented Apr 14, 2026

Coverage

Coverage Report
| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| __init__.py | 0 | 0 | 100% | |
| data_types.py | 4 | 0 | 100% | |
| calibration/__init__.py | 0 | 0 | 100% | |
| calibration/calibration_features.py | 316 | 7 | 97% | 247–248, 445, 734, 922, 926, 1224 |
| calibration/calibrator.py | 91 | 15 | 83% | 69–70, 72, 106–109, 134–135, 137, 162–163, 167, 194–195 |
| compat/__init__.py | 0 | 0 | 100% | |
| compat/instanovo.py | 10 | 6 | 40% | 12, 14–15, 17, 24–25 |
| datasets/__init__.py | 0 | 0 | 100% | |
| datasets/calibration_dataset.py | 109 | 17 | 84% | 155, 169, 171, 173, 183, 196, 249, 251–252, 258–261, 263–266 |
| datasets/data_loaders.py | 272 | 13 | 95% | 23, 205, 236–237, 430, 867, 871, 920, 931, 1045–1046, 1082–1083 |
| datasets/interfaces.py | 3 | 0 | 100% | |
| datasets/psm_dataset.py | 25 | 0 | 100% | |
| fdr/__init__.py | 0 | 0 | 100% | |
| fdr/base.py | 58 | 15 | 74% | 81, 85–86, 91, 98–99, 105, 126, 129–130, 135, 137–138, 144, 186 |
| fdr/database_grounded.py | 28 | 1 | 96% | 52 |
| fdr/nonparametric.py | 25 | 4 | 84% | 62, 68–69, 72 |
| scripts/__init__.py | 0 | 0 | 100% | |
| scripts/main.py | 185 | 185 | 0% | 8, 10–13, 16–20, 23–24, 26–28, 32, 39, 44, 47, 53, 55–56, 59, 68, 76, 79, 86, 88–90, 92, 94–99, 102, 104–105, 110, 125, 128, 135–141, 144–145, 148, 161–163, 166, 169, 174, 176–178, 180, 182–183, 186–187, 190, 192–193, 195, 197, 199–200, 202, 205–206, 209–210, 213–214, 217–219, 221, 224, 238–240, 242, 244, 249, 251–253, 255–256, 258–260, 265–266, 268–270, 272, 274, 276–277, 281–284, 286–287, 289–290, 292–293, 295, 298, 312–314, 317, 320, 325, 327–329, 331–333, 335–336, 339–340, 343, 345–346, 348, 350, 352–353, 355, 358–359, 365–366, 369–370, 373–374, 377–378, 386–388, 392, 395, 399, 402, 425, 438–439, 442, 464, 476–477, 480, 505, 518–519, 522, 537, 549–550, 553, 565, 577–578, 581, 596, 608–609 |
| utils/__init__.py | 4 | 0 | 100% | |
| utils/config_formatter.py | 53 | 40 | 24% | 29, 37–38, 40–42, 44, 55, 58–60, 62–63, 66–69, 72–74, 77–78, 80, 91, 102, 113, 127–128, 130–132, 145–147, 150, 153–154, 157–158, 160 |
| utils/config_path.py | 76 | 5 | 93% | 24–26, 117–118 |
| utils/peptide.py | 16 | 0 | 100% | |
| TOTAL | 1275 | 308 | 75% | |

| Tests | Skipped | Failures | Errors | Time |
|---|---|---|---|---|
| 299 | 0 💤 | 0 ❌ | 0 🔥 | 38.059s ⏱️ |

@JemmaLDaniel JemmaLDaniel marked this pull request as ready for review April 14, 2026 11:03