Standardized multiple inputs by joverlee521 · Pull Request #156 · nextstrain/avian-flu

joverlee521 · 2025-09-26T00:40:42Z

Description of proposed changes

Use latest guidance for handling inputs based on nextstrain/pathogen-repo-guide#91.

use shared remote_inputs.smk, which bumps minimum Snakemake to 8.0.0
use augur merge for sequences
use augur read-file or augur merge depending on number of inputs (see f5c9949)

The config parameters remain unchanged so this should not affect outside users. Downstream rules are now expected to use "results/metadata.tsv" and "results/sequences_{segment}.fasta" as inputs instead of the input_* functions.

Related issue(s)

nextstrain/public#25

Checklist

Checks pass

Post-merge:

subrepo: subdir: "shared/vendored" merged: "43e5a6f" upstream: origin: "https://github.com/nextstrain/shared" branch: "main" commit: "43e5a6f" git-subrepo: version: "0.4.6" origin: "https://github.com/ingydotnet/git-subrepo" commit: "110b9eb"

Removes the need to custom rules to download from AWS S3 and adds support for https, but requires Snakemake >=8.0.0.

joverlee521 · 2025-10-07T00:34:45Z

Will merge this by EOD tomorrow if there are no comments.

victorlin · 2025-10-07T17:15:26Z

+      build-args: |
+        --snakefile segment-focused/Snakefile \
+        -pf test_target  \
+        --config 'inputs=[{"name": "example", "metadata": "example_data/gisaid/metadata.tsv", "sequences": "example_data/gisaid/sequences_{segment}.fasta"}]'


Should this also be updated in augur/docker-base/conda-base?

Ah you're right! This change is necessary because the merged inputs have changed from data/metadata.tsv to results/metadata.tsv so the automatic copying of example data in pathogen-repo-ci.yaml@v0 no longer works.

Hm, I'll take this opportunity to make it a separate config file to make future changes easier.

Hmm, moving this into a separate config file has made it apparent that the local inputs will need to be wrapped in resolve_config_path somewhere.

Hmm, moving this into a separate config file has made it apparent that the local inputs will need to be wrapped in resolve_config_path somewhere.

For running within the repo or from external analysis directories? I'm assuming the latter, as this has come up a bit both for local ingest results and in the context of storing USVI Zika data within the repo. I don't think we need to support this. (If it's not working running within the repo, can you say more?)

Added separate CI build in 426f8df.

For running within the repo or from external analysis directories?

From within the repo, it only works if you run the CI workflow from the top level directory, e.g. nextstrain build . -s build-configs/ci/Snakefile is fine. However, running from within the nested directory does not work:

$ cd build-configs/ci $ nextstrain build . Assuming unrestricted shared filesystem usage. host: 457910275e93 Building DAG of jobs... MissingInputException in rule merge_sequences in file "/nextstrain/build/rules/merge_inputs.smk", line 126: Missing input files for rule merge_sequences: output: results/sequences_ha.fasta wildcards: segment=ha affected files: example_data/gisaid/sequences_ha.fasta

This definitely does not work for external analysis directories.

Yeah, I'm in agreement here that we probably don't need to support running the CI build from external directories.

Since this is no longer needed, I've removed the additional Snakefile in the latest force-push and just left the config.yaml which can be used in the CI.

Also realized I never followed up on this, I'll make the corresponding changes in those 3 repos.

Downstream PRs for updating CI workflows:

ci: Update build-args for avian-flu augur#1924

ci: Update build-args for avian-flu docker-base#271

ci: Update build-args for avian-flu conda-base#133

this does make the inputs/additional_inputs params different from other config params that expect file paths: they must always be relative to the working directory.

That's a subtlety of path_or_url, not inputs/additional_inputs per-se. We could add a path-resolver argument to the function and/or an explicit check that the requested local file path exists. (The reason we don't do this is that we can use the function to create / write files, and perhaps some other complexity that's slipping my mind currently.) Alternatively, this goal could be achieved by supplying search-paths to the consuming augur command.

Alternatively, this goal could be achieved by supplying search-paths to the consuming augur command.

nextstrain/augur#1897

Thanks @jameshadfield, I'm continuing the idea of adding path resolver to path_or_url in nextstrain/shared#65. That can be tackled in future changes and pulled into this repo later.

I think this is different than nextstrain/augur#1897 since the inputs are threaded through Snakemake inputs so they have to be resolved as relative paths to the working dir.

Modified from the dynamic rules in the pathogen-repo-guide.¹ The dynamic rules did not work in this repo because it allows different number of inputs per segment wildcard. There's no good way to define different rules per wildcard so I've opted to use if/else blocks in the shell to run either `augur read-file` or `augur merge` depending on the number of inputs. ¹ <https://github.com/nextstrain/pathogen-repo-guide/blob/cb87e1ac0e6c98d08bab76b954309aeb3bae889b/phylogenetic/rules/merge_inputs.smk>

jameshadfield · 2025-10-07T19:19:43Z

The config parameters remain unchanged so this should not affect outside users.

[this PR] bumps minimum Snakemake to 8.0.0

Can you touch base with Louise's group re: snakemake v8 (if you haven't already!)

jameshadfield

Thanks for moving this repo forward Jover! I haven't tested but I can do so if you'd like? (after fixing Victor's comments)

The main motivation for this is to simplify the CI build-args which will need to be replicated across augur, docker-base, and conda-base. Note the CI config uses the repo's local example_data as inputs and the paths are relative to the top level directory, so the workflow can only be run from the top level directory.

Matching changes in ci builds-args in nextstrain/avian-flu#156

joverlee521 · 2025-10-28T20:48:47Z

Can you touch base with Louise's group re: snakemake v8 (if you haven't already!)

Ah, realized I never followed up here. I chatted with Louise about the change on Slack and it should be fine with their group because they are using the Nextstrain Docker image.

joverlee521 · 2025-10-28T20:51:29Z

I believe I've addressed all open comments in this PR. I'll merge this by EOD today if there are no other comments.

joverlee521 added 5 commits September 24, 2025 16:45

Move input handling to separate merge_inputs.smk

40bcee8

rules/merge_inputs: Use augur merge for sequences

d7b37ed

rules/merge_inputs: Switch to shared remote file support

47012ed

Removes the need to custom rules to download from AWS S3 and adds support for https, but requires Snakemake >=8.0.0.

CI: use example data only

2bbd92d

joverlee521 mentioned this pull request Sep 24, 2025

Add/update multiple input support for existing pathogens nextstrain/public#25

Open

19 tasks

victorlin reviewed Oct 7, 2025

View reviewed changes

joverlee521 force-pushed the standardized-multiple-inputs branch from d70d789 to 426f8df Compare October 7, 2025 20:28

jameshadfield reviewed Oct 7, 2025

View reviewed changes

Comment thread rules/merge_inputs.smk

Comment thread rules/merge_inputs.smk

Add docstring for merge_inputs.smk

e5e6b89

joverlee521 force-pushed the standardized-multiple-inputs branch from 426f8df to fc4a6f4 Compare October 7, 2025 21:43

joverlee521 force-pushed the standardized-multiple-inputs branch from fc4a6f4 to 154f3c9 Compare October 28, 2025 20:07

joverlee521 added a commit to nextstrain/augur that referenced this pull request Oct 28, 2025

ci: Update build-args for avian-flu

86f34ad

Matching changes in ci builds-args in nextstrain/avian-flu#156

joverlee521 added a commit to nextstrain/docker-base that referenced this pull request Oct 28, 2025

ci: Update build-args for avian-flu

4fb43db

Matching changes in ci builds-args in nextstrain/avian-flu#156

joverlee521 added a commit to nextstrain/conda-base that referenced this pull request Oct 28, 2025

ci: Update build-args for avian-flu

60261e6

Matching changes in ci builds-args in nextstrain/avian-flu#156

This was referenced Oct 28, 2025

ci: Update build-args for avian-flu nextstrain/augur#1924

Merged

ci: Update build-args for avian-flu nextstrain/docker-base#271

Merged

ci: Update build-args for avian-flu nextstrain/conda-base#133

Merged

joverlee521 mentioned this pull request Oct 28, 2025

Add path resolver to path_or_url nextstrain/shared#65

Open

joverlee521 merged commit f963447 into master Oct 28, 2025
6 checks passed

joverlee521 deleted the standardized-multiple-inputs branch October 28, 2025 23:32

Conversation

joverlee521 commented Sep 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description of proposed changes

Related issue(s)

Checklist

Uh oh!

joverlee521 commented Oct 7, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

jameshadfield commented Oct 7, 2025

Uh oh!

jameshadfield left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

joverlee521 commented Oct 28, 2025

Uh oh!

joverlee521 commented Oct 28, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

joverlee521 commented Sep 26, 2025 •

edited

Loading