6 | 6 | - [Choosing an organism type and/or virus subtype](#choosing-an-organism-type-andor-virus-subtype) |
7 | 7 | - [Using specific profiles](#using-specific-profiles) |
8 | 8 | - [Other customizations](#other-customizations) |
| 9 | +- [Submitting to Production](#submitting-to-production) |
9 | 10 | - [Typical example workflow](#typical-example-workflow) |
10 | 11 | - [Submission config fields](#submission-config-fields) |
11 | 12 | - [Custom metadata validation and custom BioSample package](#custom-metadata-validation-and-custom-biosample-package) |
@@ -52,34 +53,42 @@ All the custom parameters for TOSTADAS are found in nextflow.config and the conf |
52 | 53 |
|
53 | 54 | For example, the default output directory is `results`, but you can override that and choose your own output directory using `--outdir path/to/my/output` in your command. |
54 | 55 |
|
55 | | -Another example: the `--dry_run` flag (which prepares files for submission but doesn't upload to the server) defaults to `true` for the test profile and `false` otherwise, but you can override it by specifying `--dry_run <true|false>` on the command line. |
56 | | - |
57 | | - |
58 | | - |
59 | | -## Typical example workflow |
60 | | - |
61 | | -We'll run test submissions to BioSample and SRA using the test MPOX data included in the repository. |
| 56 | +TOSTADAS can split large datasets into smaller batches for submission to NCBI's servers using the `--batch_size` flag. For example, if your metadata Excel file has 200 samples, adding `--batch_size 50` to your command submits them in batches of 50, grouping 50 samples at a time into one submission file for each data repository. NCBI strongly prefers this to submitting samples one at a time.
62 | 57 |
|
63 | | -Submit to biosample and sra: |
64 | | -`nextflow run main.nf -profile test,singularity,mpox --workflow biosample_and_sra --dry_run false --submission_config conf/submission_config.yaml --batch_size 5` |
65 | | -**Remember** to add credentials to your submission_config.yaml file. |
| 58 | +We **highly** recommend submitting in batches. We suggest 50 as a maximum batch size.
66 | 59 |
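
To make the batching arithmetic concrete, here is a minimal sketch using the 200-sample example above. It assumes that a leftover partial group is submitted as its own batch (hence the ceiling division), which this README does not state explicitly. The metadata path is a placeholder, and the command is only printed, not run.

```shell
# Numbers from the example above: 200 samples, batches of 50.
n_samples=200
batch_size=50

# Ceiling division. Assumption: a final partial group still becomes its own batch.
n_batches=$(( (n_samples + batch_size - 1) / batch_size ))
echo "expected batches per repository: $n_batches"

# Print (do not run) a command that would request this batch size.
# path/to/my/metadata.xlsx is a placeholder, not a file in the repository.
echo "nextflow run main.nf -profile singularity,mpox --workflow biosample_and_sra --meta_path path/to/my/metadata.xlsx --submission_config conf/submission_config.yaml --batch_size $batch_size"
```

With a sample count that does not divide evenly (for example 210), the same arithmetic gives one extra, smaller batch.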
|
67 | | -Fetch the accessions if they weren’t assigned (this workflow creates an updated Metadata Excel file with the validated fields and the accession IDs): |
68 | | -`nextflow run main.nf -profile test,singularity,mpox --workflow fetch_accessions --dry_run false --submission_config conf/submission_config.yaml` |
| 60 | +Another example: the `--dry_run` flag (which prepares files for submission but doesn't upload to the server) defaults to `true` for the test profile and `false` otherwise, but you can override it by specifying `--dry_run <true|false>` on the command line. |
69 | 61 |
|
70 | | -Submit an updated biosample submission (open the updated Excel file from results/mpxv_test_metadata/final_submission_outputs/mpxv_test_metadata_updated.xlsx and add some fake SAMN IDs first): |
71 | | -`nextflow run main.nf -profile test,singularity --workflow update_submission --dry_run false --species mpxv --submission_config conf/submission_config.yaml --batch_size 5 --original_submission_outdir results/mpxv_test_metadata/submission_outputs --meta_path results/mpxv_test_metadata/final_submission_outputs/mpxv_test_metadata_updated.xlsx` |
72 | | -**Remember** This won’t run without those fake SAMN IDs in the biosample_accession field. |
| 62 | +## Submitting to Production |
73 | 63 |
|
74 | | -Now we'll run a test GenBank submission using the test bacteria data included in the repository. |
| 64 | +By default, TOSTADAS submits to NCBI's test server, even when you are not using the test profile, so that data is not accidentally pushed to NCBI's Production server.
75 | 65 |
|
76 | | -Submit to BioSample first (because GenBank requires a BioSample accession): |
77 | | -`nextflow run main.nf -profile test,singularity,bacteria --workflow biosample_and_sra --dry_run false --submission_config conf/submission_config.yaml` |
| 66 | +When you've completed testing and are ready to submit to Production, add `--prod_submission` to your command line (or set `prod_submission` to `true` in `nextflow.config`).
78 | 67 |
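
As a rough illustration of the difference, the sketch below builds the same command twice, once for the default test server and once for Production. It assumes you are submitting your own MPOX data with the `singularity` and `mpox` profiles. The metadata path is a placeholder, and the commands are only printed so you can review them before running anything.

```shell
# Shared arguments. path/to/my/metadata.xlsx is a placeholder for your own file.
common="nextflow run main.nf -profile singularity,mpox --workflow biosample_and_sra --meta_path path/to/my/metadata.xlsx --submission_config conf/submission_config.yaml --batch_size 50"

# Default behavior: no extra flag, so the submission targets NCBI's test server.
test_cmd="$common"

# Production must be requested explicitly.
prod_cmd="$common --prod_submission"

echo "test:       $test_cmd"
echo "production: $prod_cmd"
```

Keeping the two commands identical apart from `--prod_submission` makes it easy to confirm that whatever you validated on the test server is exactly what goes to Production.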
|
79 | | -Open the updated Excel file from results/bacteria_test_metadata_1/final_submission_outputs/bacteria_test_metadata_1_updated.xlsx and add some fake SAMN IDs first. |
80 | | -**The next command won't run without the fake SAMN IDs in biosample_accession column**. |
81 | | -`nextflow run main.nf -profile test,singularity,bacteria --workflow genbank --dry_run false --submission_config conf/submission_config.yaml --annotation --download_bakta_db --bakta_db_light` |
| 68 | +## Typical example workflow |
82 | 69 |
|
| 70 | +We'll run test submissions to BioSample and SRA using the test MPOX data included in the repository. |
| 71 | + |
| 72 | +Submit to biosample and sra: |
| 73 | +`nextflow run main.nf -profile test,singularity,mpox --workflow biosample_and_sra --dry_run false --submission_config conf/submission_config.yaml --batch_size 5` |
| 74 | +**Remember** to add credentials to your submission_config.yaml file. |
| 75 | + |
| 76 | +Fetch the accessions if they weren’t assigned (this workflow creates an updated Metadata Excel file with the validated fields and the accession IDs): |
| 77 | +`nextflow run main.nf -profile test,singularity,mpox --workflow fetch_accessions --dry_run false --submission_config conf/submission_config.yaml` |
| 78 | + |
| 79 | +Submit an updated biosample submission (open the updated Excel file from results/mpxv_test_metadata/final_submission_outputs/mpxv_test_metadata_updated.xlsx and add some fake SAMN IDs first): |
| 80 | +`nextflow run main.nf -profile test,singularity --workflow update_submission --dry_run false --species mpxv --submission_config conf/submission_config.yaml --batch_size 5 --original_submission_outdir results/mpxv_test_metadata/submission_outputs --meta_path results/mpxv_test_metadata/final_submission_outputs/mpxv_test_metadata_updated.xlsx` |
| 81 | +**Remember** This won’t run without those fake SAMN IDs in the biosample_accession field. |
| 82 | + |
| 83 | +Now we'll run a test GenBank submission using the test bacteria data included in the repository. |
| 84 | + |
| 85 | +Submit to BioSample first (because GenBank requires a BioSample accession): |
| 86 | +`nextflow run main.nf -profile test,singularity,bacteria --workflow biosample_and_sra --dry_run false --submission_config conf/submission_config.yaml` |
| 87 | + |
| 88 | +Open the updated Excel file from results/bacteria_test_metadata_1/final_submission_outputs/bacteria_test_metadata_1_updated.xlsx and add some fake SAMN IDs first. |
| 89 | +**The next command won't run without the fake SAMN IDs in biosample_accession column**. |
| 90 | +`nextflow run main.nf -profile test,singularity,bacteria --workflow genbank --dry_run false --submission_config conf/submission_config.yaml --annotation --download_bakta_db --bakta_db_light` |
| 91 | + |
83 | 92 | ## Submission config fields |
84 | 93 |
|
85 | 94 | The fields and corresponding example values can be found here: [Submission Config](https://github.com/CDCgov/tostadas/raw/master/conf/submission_config.yaml). |
|