Conversation
subrepo: subdir: "shared/vendored" merged: "43e5a6f" upstream: origin: "https://github.com/nextstrain/shared" branch: "main" commit: "43e5a6f" git-subrepo: version: "0.4.6" origin: "https://github.com/ingydotnet/git-subrepo" commit: "110b9eb"
Removes the need to custom rules to download from AWS S3 and adds support for https, but requires Snakemake >=8.0.0.
|
Will merge this by EOD tomorrow if there are no comments. |
| build-args: | | ||
| --snakefile segment-focused/Snakefile \ | ||
| -pf test_target \ | ||
| --config 'inputs=[{"name": "example", "metadata": "example_data/gisaid/metadata.tsv", "sequences": "example_data/gisaid/sequences_{segment}.fasta"}]' |
There was a problem hiding this comment.
Should this also be updated in augur/docker-base/conda-base?
There was a problem hiding this comment.
Ah you're right! This change is necessary because the merged inputs have changed from data/metadata.tsv to results/metadata.tsv so the automatic copying of example data in pathogen-repo-ci.yaml@v0 no longer works.
Hm, I'll take this opportunity to make it a separate config file to make future changes easier.
There was a problem hiding this comment.
Hmm, moving this into a separate config file has made it apparent that the local inputs will need to be wrapped in resolve_config_path somewhere.
There was a problem hiding this comment.
Hmm, moving this into a separate config file has made it apparent that the local inputs will need to be wrapped in resolve_config_path somewhere.
For running within the repo or from external analysis directories? I'm assuming the latter, as this has come up a bit both for local ingest results and in the context of storing USVI Zika data within the repo. I don't think we need to support this. (If it's not working running within the repo, can you say more?)
There was a problem hiding this comment.
Added separate CI build in 426f8df.
For running within the repo or from external analysis directories?
From within the repo, it only works if you run the CI workflow from the top level directory, e.g. nextstrain build . -s build-configs/ci/Snakefile is fine. However, running from within the nested directory does not work:
$ cd build-configs/ci
$ nextstrain build .
Assuming unrestricted shared filesystem usage.
host: 457910275e93
Building DAG of jobs...
MissingInputException in rule merge_sequences in file "/nextstrain/build/rules/merge_inputs.smk", line 126:
Missing input files for rule merge_sequences:
output: results/sequences_ha.fasta
wildcards: segment=ha
affected files:
example_data/gisaid/sequences_ha.fastaThis definitely does not work for external analysis directories.
There was a problem hiding this comment.
Yeah, I'm in agreement here that we probably don't need to support running the CI build from external directories.
Since this is no longer needed, I've removed the additional Snakefile in the latest force-push and just left the config.yaml which can be used in the CI.
There was a problem hiding this comment.
Also realized I never followed up on this, I'll make the corresponding changes in those 3 repos.
Downstream PRs for updating CI workflows:
There was a problem hiding this comment.
this does make the
inputs/additional_inputsparams different from other config params that expect file paths: they must always be relative to the working directory.
That's a subtlety of path_or_url, not inputs/additional_inputs per-se. We could add a path-resolver argument to the function and/or an explicit check that the requested local file path exists. (The reason we don't do this is that we can use the function to create / write files, and perhaps some other complexity that's slipping my mind currently.) Alternatively, this goal could be achieved by supplying search-paths to the consuming augur command.
There was a problem hiding this comment.
Alternatively, this goal could be achieved by supplying search-paths to the consuming
augurcommand.
There was a problem hiding this comment.
Thanks @jameshadfield, I'm continuing the idea of adding path resolver to path_or_url in nextstrain/shared#65. That can be tackled in future changes and pulled into this repo later.
I think this is different than nextstrain/augur#1897 since the inputs are threaded through Snakemake inputs so they have to be resolved as relative paths to the working dir.
Modified from the dynamic rules in the pathogen-repo-guide.¹ The dynamic rules did not work in this repo because it allows different number of inputs per segment wildcard. There's no good way to define different rules per wildcard so I've opted to use if/else blocks in the shell to run either `augur read-file` or `augur merge` depending on the number of inputs. ¹ <https://github.com/nextstrain/pathogen-repo-guide/blob/cb87e1ac0e6c98d08bab76b954309aeb3bae889b/phylogenetic/rules/merge_inputs.smk>
Can you touch base with Louise's group re: snakemake v8 (if you haven't already!) |
d70d789 to
426f8df
Compare
jameshadfield
left a comment
There was a problem hiding this comment.
Thanks for moving this repo forward Jover! I haven't tested but I can do so if you'd like? (after fixing Victor's comments)
426f8df to
fc4a6f4
Compare
The main motivation for this is to simplify the CI build-args which will need to be replicated across augur, docker-base, and conda-base. Note the CI config uses the repo's local example_data as inputs and the paths are relative to the top level directory, so the workflow can only be run from the top level directory.
fc4a6f4 to
154f3c9
Compare
Matching changes in ci builds-args in nextstrain/avian-flu#156
Matching changes in ci builds-args in nextstrain/avian-flu#156
Matching changes in ci builds-args in nextstrain/avian-flu#156
Ah, realized I never followed up here. I chatted with Louise about the change on Slack and it should be fine with their group because they are using the Nextstrain Docker image. |
|
I believe I've addressed all open comments in this PR. I'll merge this by EOD today if there are no other comments. |
Description of proposed changes
Use latest guidance for handling inputs based on nextstrain/pathogen-repo-guide#91.
remote_inputs.smk, which bumps minimum Snakemake to 8.0.0augur mergefor sequencesaugur read-fileoraugur mergedepending on number of inputs (see f5c9949)The config parameters remain unchanged so this should not affect outside users. Downstream rules are now expected to use "results/metadata.tsv" and "results/sequences_{segment}.fasta" as inputs instead of the input_* functions.
Related issue(s)
nextstrain/public#25
Checklist
Post-merge: