Skip to content

update in-file paths to commit specific paths#4

Open
kristinebilgrav wants to merge 3 commits intooncorefinerfrom
fix-file-paths
Open

update in-file paths to commit specific paths#4
kristinebilgrav wants to merge 3 commits intooncorefinerfrom
fix-file-paths

Conversation

@kristinebilgrav
Copy link
Copy Markdown

@kristinebilgrav kristinebilgrav commented May 7, 2026

Description

Closes https://github.com/Clinical-Genomics/MTP-oncoflow/issues/50

Changed

  • Commit specific file paths in files vep_files.csv, vcfanno_resources.txt and svdb_querydb_files.csv

Test

  • tested and successfully runs locally.

Review

  • Tests executed by
  • "Merge and deploy" approved by
    Thanks for filling in who performed the code review and the test!

This version is a

  • MAJOR - when you make incompatible API changes
  • MINOR - when you add functionality in a backwards compatible manner
  • PATCH - when you make backwards compatible bug fixes or documentation/instructions

Implementation Plan

  • Document in ...
  • Deploy this branch on ...
  • Inform to ...

@kristinebilgrav kristinebilgrav marked this pull request as ready for review May 7, 2026 07:39
@kristinebilgrav kristinebilgrav requested a review from mathiasbio May 7, 2026 07:39
Copy link
Copy Markdown

@mathiasbio mathiasbio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! 🌟

After thinking about it a bit however, I'm not a hundred percent certain about this. I don't know if I understand all of the design choices about using a commit in the test-data-path place instead of just using dev.

I understand it means tests will be broken if things are updated in the datasets. But might it not be better, when updating the test-datasets to have a parallel branch in oncorefiner to update the tests.

Then in the feature-branch we just pull in dev and the tests should be fixed? 🤔 But maybe I'm missing something.

I just wonder that if we add the commit here, there's a chance we will lose track of it, and run into problems down the line, thinking we're using one file but actually using another.

Maybe it would be good to get @fevac and @beatrizsavinhas eyes here too 👀

@beatrizsavinhas
Copy link
Copy Markdown

beatrizsavinhas commented May 7, 2026

Yeah... I was thinking that a solution like this will also add a lot of overhang that would be hard to maintain. So whenever we update testdata, we not only have to update the testdaset paths in several pipeline files but will also have to remember to update all in-file paths like what is done in this PR. This solution works, but is harder to maintain and is error/forgetfulness prone with the risk of down the line using the wrong files by accident, like you say @mathiasbio.

Perhaps I am actually for not making the pipelines_testdata_base_path commit specific. 🤔
Then we should make it a requirement that, in order to merge changes to the test-datasets repo, you have to ensure the corresponding pipeline tests are passing and, if not, create a parallel branch to update the pipeline tests.

@kristinebilgrav
Copy link
Copy Markdown
Author

We discussed this a bit earlier today, and the risk with not having pipelines_testdata_base_path commit specific is that running earlier versions of OR in the future will not work, or ex main vs dev branch will be out of sync for the tests.

I do agree that it will add complexity and risk when updating the test-dataset to have commit specific in-file paths, but I cannot come up with a better solution right now. 🤔 I dont think we need to be update the paths with every update to the dataset unless it affects those files. But of course it is still a risk. Maybe we need to keep in mind to update the in-file paths of those files when updating them. Could we add a checklist to this repo to remember to check those things?

@mathiasbio
Copy link
Copy Markdown

Hmm probably easier to talk which you've maybe already done! 😂 BUT! Yes...I understand that if we tie the test-path to the dev branch, this will make tests fail automatically in old versions of oncorefiner. But silly question probably, do we care if they fail? 😂 I feel like old versions are just locked and if we want to run them we just run them, and we don't care about tests then, but I'm probably missing something!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants