
Adding subset of metagrootdb #1959

Merged
angelphanth merged 3 commits into nf-core:proteinannotator from angelphanth:add-metagroot-subset
Mar 30, 2026


@angelphanth

Please fill in the appropriate checklist below (delete whatever is not relevant). These are the most common things requested when adding a new test dataset.

  • Check here that there isn't already a branch containing data that could be used
  • Fork the nf-core/test-datasets repository to your GitHub account
  • Create a new branch on your fork
  • Check your proposed test data follows the guidelines
  • Add your test dataset
    • If you clone it locally use git clone <url> --branch <branch> --single-branch
  • Make a PR on a new branch with a relevant name
  • Wait for the PR to be merged
  • Use this newly created branch for your tests

This pull request adds detailed documentation to the testdata/metagroot/README.md file, describing how to generate a test HMM dataset for MetagRoot. The new instructions clarify the process for obtaining, preparing, and packaging the test data.

Documentation improvements:

  • Added step-by-step instructions for downloading, extracting, concatenating, compressing, scanning, subsetting, and repackaging HMM profile files to create a test dataset, including example shell commands.
  • Clarified the difference between the HMM.tar.gz and subset_metagroot.hmm files, and provided references to the original data sources.
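The subsetting step described above can be sketched as follows. This is a minimal illustration, not the commands from the README: it assumes the concatenated file is in HMMER3 text format, where each profile carries a `NAME` header line and ends with a `//` line. The function names and the toy profiles (`famA`, `famB`, `famC`) are hypothetical.

```python
from typing import Iterator, List, Set


def iter_hmm_records(text: str) -> Iterator[str]:
    """Yield one profile at a time; HMMER3 text profiles end with a '//' line."""
    current: List[str] = []
    for line in text.splitlines(keepends=True):
        current.append(line)
        if line.strip() == "//":
            yield "".join(current)
            current = []


def record_name(record: str) -> str:
    """Return the value of the NAME header line of a single profile."""
    for line in record.splitlines():
        if line.startswith("NAME"):
            return line.split(maxsplit=1)[1].strip()
    raise ValueError("profile has no NAME line")


def subset_hmms(text: str, keep: Set[str]) -> str:
    """Keep only the profiles whose NAME is listed in `keep`."""
    return "".join(r for r in iter_hmm_records(text) if record_name(r) in keep)


# Toy stand-in for a concatenated profile file (hypothetical names;
# real profiles contain many more header and model lines).
demo = (
    "HMMER3/f [toy]\nNAME  famA\nLENG  10\n//\n"
    "HMMER3/f [toy]\nNAME  famB\nLENG  12\n//\n"
    "HMMER3/f [toy]\nNAME  famC\nLENG  8\n//\n"
)
subset = subset_hmms(demo, {"famA", "famC"})
```

In practice the set of names to keep would come from whichever profiles matched during the scanning step.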

@angelphanth angelphanth requested a review from vagkaratzas March 30, 2026 11:47

@vagkaratzas vagkaratzas left a comment

Just do the required computations by hand and only provide the final test metagroot_test.hmm.gz file here, as in the other parallel folders (e.g., nmpfams, pfam, etc) :D

The deal is, you need to keep 1-2 matching HMMs from the mother file (which I see you already did with your scripts), so just push that one file :D
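As a rough sketch of producing that single file, the snippet below gzip-compresses a small subset and reads it back to confirm how many profiles it holds. It assumes HMMER3 text format (each profile ends with a `//` line). The helper names and toy profiles are hypothetical, and the output is written to a temporary directory rather than the repository.

```python
import gzip
import os
import tempfile


def write_gzipped(text: str, path: str) -> None:
    """Write text to a gzip-compressed file."""
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        handle.write(text)


def count_profiles(path: str) -> int:
    """Count profiles in a gzipped HMM file by counting '//' terminator lines."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return sum(1 for line in handle if line.strip() == "//")


# Toy stand-in for the one or two profiles kept from the full database.
toy_subset = (
    "HMMER3/f [toy]\nNAME  famA\nLENG  10\n//\n"
    "HMMER3/f [toy]\nNAME  famB\nLENG  12\n//\n"
)
out_path = os.path.join(tempfile.mkdtemp(), "metagroot_test.hmm.gz")
write_gzipped(toy_subset, out_path)
n_profiles = count_profiles(out_path)
```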

@angelphanth
Author

> Just do the required computations by hand and only provide the final test metagroot_test.hmm.gz file here, as in the other parallel folders (e.g., nmpfams, pfam, etc) :D
>
> The deal is, you need to keep 1-2 matching HMMs from the mother file (which I see you already did with your scripts), so just push that one file :D

Thank you! I changed the file in the commit d1cbfac

@angelphanth angelphanth requested a review from vagkaratzas March 30, 2026 14:11

@vagkaratzas vagkaratzas left a comment

Approved, but please rename to metagroot_test.hmm.gz before merging (instead of metagrootdb_test.hmm.gz; nitpicking). And have fun with testing the subworkflow :D

@angelphanth angelphanth merged commit 57ceb0f into nf-core:proteinannotator Mar 30, 2026
@angelphanth angelphanth deleted the add-metagroot-subset branch March 30, 2026 15:41
2 participants