Commit 654854d

Jessica Rowell authored and committed
update docs formatting and add prod_submission and batch_size notes to submission guide
1 parent 030fae0 commit 654854d

2 files changed

Lines changed: 35 additions & 22 deletions

docs/user-guide/installation.md

Lines changed: 5 additions & 1 deletion
@@ -61,6 +61,10 @@ Create an NCBI Center Account. See [NCBI Center Account](general_NCBI_submission

 Choose a workflow and specify your profile or (optionally, for annotation and GenBank submission) an `organism_Type` and `virus_subtype`. See: [Putting together the Nextflow command](submission_guide.md#putting-together-the-nextflow-command)

-
+**Please read** the [Submission Guide](submission_guide.md) for important details about parameters you need to specify. Please especially note the following:
+* `prod_submission`: true/false (for submitting to Test vs. Production server). See [Other customizations](submission_guide.md#other-customizations)
+* `batch_size`: for submitting large datasets in chunks. We **highly** recommend you submit using batches! See [Other customizations](submission_guide.md#other-customizations)
+* `workflow`: read how to use TOSTADAS for different types of submissions. See [Submitting to Production](submission_guide.md#submitting-to-production)
+* profiles: read about profile shortcuts in [Using specific profiles](submission_guide.md#using-specific-profiles)


docs/user-guide/submission_guide.md

Lines changed: 30 additions & 21 deletions
@@ -6,6 +6,7 @@
 - [Choosing an organism type and/or virus subtype](#choosing-an-organism-type-andor-virus-subtype)
 - [Using specific profiles](#using-specific-profiles)
 - [Other customizations](#other-customizations)
+- [Submitting to Production](#submitting-to-production)
 - [Typical example workflow](#typical-example-workflow)
 - [Submission config fields](#submission-config-fields)
 - [Custom metadata validation and custom BioSample package](#custom-metadata-validation-and-custom-biosample-package)
@@ -52,34 +53,42 @@ All the custom parameters for TOSTADAS are found in nextflow.config and the conf

 For example, the default output directory is `results`, but you can override that and choose your own output directory using `--outdir path/to/my/output` in your command.

-Another example: the `--dry_run` flag (which prepares files for submission but doesn't upload to the server) defaults to `true` for the test profile and `false` otherwise, but you can override it by specifying `--dry_run <true|false>` on the command line.
-
-
-
-## Typical example workflow
-
-We'll run test submissions to BioSample and SRA using the test MPOX data included in the repository.
+TOSTADAS can chunk large datasets into smaller groups to submit to NCBI's servers using the `--batch_size` flag. If you have a metadata Excel file with 200 samples, you can submit them in batches of 50 by adding `--batch_size 50` to your command. This groups 50 samples at a time into one submission file for each data repository. NCBI much prefers this over submitting samples one-at-a-time.

-Submit to biosample and sra:
-`nextflow run main.nf -profile test,singularity,mpox --workflow biosample_and_sra --dry_run false --submission_config conf/submission_config.yaml --batch_size 5`
-**Remember** to add credentials to your submission_config.yaml file.
+We **highly** recommend you submit using batches!!! We suggest 50 as a maximum batch size.

-Fetch the accessions if they weren’t assigned (this workflow creates an updated Metadata Excel file with the validated fields and the accession IDs):
-`nextflow run main.nf -profile test,singularity,mpox --workflow fetch_accessions --dry_run false --submission_config conf/submission_config.yaml`
+Another example: the `--dry_run` flag (which prepares files for submission but doesn't upload to the server) defaults to `true` for the test profile and `false` otherwise, but you can override it by specifying `--dry_run <true|false>` on the command line.

-Submit an updated biosample submission (open the updated Excel file from results/mpxv_test_metadata/final_submission_outputs/mpxv_test_metadata_updated.xlsx and add some fake SAMN IDs first):
-`nextflow run main.nf -profile test,singularity --workflow update_submission --dry_run false --species mpxv --submission_config conf/submission_config.yaml --batch_size 5 --original_submission_outdir results/mpxv_test_metadata/submission_outputs --meta_path results/mpxv_test_metadata/final_submission_outputs/mpxv_test_metadata_updated.xlsx`
-**Remember** This won’t run without those fake SAMN IDs in the biosample_accession field.
+## Submitting to Production

-Now we'll run a test GenBank submission using the test bacteria data included in the repository.
+TOSTADAS defaults to submitting to the test server even if not using the test profile, to avoid accidentally pushing data to NCBI's Production server.

-Submit to BioSample first (because GenBank requires a BioSample accession):
-`nextflow run main.nf -profile test,singularity,bacteria --workflow biosample_and_sra --dry_run false --submission_config conf/submission_config.yaml`
+When you've completed testing and are ready to submit for production, add `--prod_submission` to your command line (or change `prod_submission` to `true` in `nextflow.config`).

-Open the updated Excel file from results/bacteria_test_metadata_1/final_submission_outputs/bacteria_test_metadata_1_updated.xlsx and add some fake SAMN IDs first.
-**The next command won't run without the fake SAMN IDs in biosample_accession column**.
-`nextflow run main.nf -profile test,singularity,bacteria --workflow genbank --dry_run false --submission_config conf/submission_config.yaml --annotation --download_bakta_db --bakta_db_light`
+## Typical example workflow

+We'll run test submissions to BioSample and SRA using the test MPOX data included in the repository.
+
+Submit to biosample and sra:
+`nextflow run main.nf -profile test,singularity,mpox --workflow biosample_and_sra --dry_run false --submission_config conf/submission_config.yaml --batch_size 5`
+**Remember** to add credentials to your submission_config.yaml file.
+
+Fetch the accessions if they weren’t assigned (this workflow creates an updated Metadata Excel file with the validated fields and the accession IDs):
+`nextflow run main.nf -profile test,singularity,mpox --workflow fetch_accessions --dry_run false --submission_config conf/submission_config.yaml`
+
+Submit an updated biosample submission (open the updated Excel file from results/mpxv_test_metadata/final_submission_outputs/mpxv_test_metadata_updated.xlsx and add some fake SAMN IDs first):
+`nextflow run main.nf -profile test,singularity --workflow update_submission --dry_run false --species mpxv --submission_config conf/submission_config.yaml --batch_size 5 --original_submission_outdir results/mpxv_test_metadata/submission_outputs --meta_path results/mpxv_test_metadata/final_submission_outputs/mpxv_test_metadata_updated.xlsx`
+**Remember** This won’t run without those fake SAMN IDs in the biosample_accession field.
+
+Now we'll run a test GenBank submission using the test bacteria data included in the repository.
+
+Submit to BioSample first (because GenBank requires a BioSample accession):
+`nextflow run main.nf -profile test,singularity,bacteria --workflow biosample_and_sra --dry_run false --submission_config conf/submission_config.yaml`
+
+Open the updated Excel file from results/bacteria_test_metadata_1/final_submission_outputs/bacteria_test_metadata_1_updated.xlsx and add some fake SAMN IDs first.
+**The next command won't run without the fake SAMN IDs in biosample_accession column**.
+`nextflow run main.nf -profile test,singularity,bacteria --workflow genbank --dry_run false --submission_config conf/submission_config.yaml --annotation --download_bakta_db --bakta_db_light`
+
 ## Submission config fields

 The fields and corresponding example values can be found here: [Submission Config](https://github.com/CDCgov/tostadas/raw/master/conf/submission_config.yaml).
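
The `--batch_size` note added in this commit describes splitting a 200-sample metadata file into submissions of 50 samples each. The Python sketch below only illustrates that grouping arithmetic. It is not TOSTADAS's actual implementation, and the `chunk_samples` helper and the `sample_NNN` names are hypothetical stand-ins for rows of a metadata Excel file.

```python
from typing import Iterator, List, Sequence


def chunk_samples(samples: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `batch_size` samples.

    Hypothetical helper for illustration only; TOSTADAS may batch differently.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(samples), batch_size):
        yield list(samples[start:start + batch_size])


# Hypothetical sample names standing in for 200 rows of a metadata file.
samples = [f"sample_{i:03d}" for i in range(1, 201)]

# Equivalent in spirit to passing `--batch_size 50` on the command line.
batches = list(chunk_samples(samples, batch_size=50))

print(len(batches))                        # number of batches
print([len(batch) for batch in batches])   # samples per batch
```

With 200 samples and a batch size of 50, this produces four groups of 50. When the sample count is not a multiple of the batch size, the last group in this sketch is simply smaller.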
