Dev -> master for nf-core/rnaseq 3.25.0 #1816
Merged
The SortMeRNA module in MultiQC extracts sample names from "Reads file:" lines in the log content. For paired-end data, SortMeRNA logs both input filenames and the parser picks the second (R2), giving a spurious _2 suffix. This causes table_sample_merge to place %rRNA under "Read 2" only instead of the parent sample row. Fix by adding `use_filename_as_sample_name: ["sortmerna"]` to the MultiQC config so it uses the log filename (which has no read suffix) and adding `.sortmerna` to `extra_fn_clean_exts` for proper cleanup. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
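The failure mode described in this commit can be sketched in a few lines of Python. This is not MultiQC's actual parser: `name_from_log_content`, `name_from_filename`, the log excerpt and the file name `SAMPLE.sortmerna.log` are all hypothetical, and the extension cleanup is modelled as truncating at the first occurrence of a listed extension, which is an assumption.

```python
import re

# Hypothetical two-line excerpt of a paired-end SortMeRNA log.
LOG_TEXT = (
    "    Reads file: SAMPLE_1.fastq.gz\n"
    "    Reads file: SAMPLE_2.fastq.gz\n"
)


def name_from_log_content(log_text):
    """Mimic a parser that keeps the last 'Reads file:' entry it sees."""
    reads = re.findall(r"Reads file:\s*(\S+)", log_text)
    # For paired-end input the last entry is the R2 file, hence the _2 suffix.
    return reads[-1].split(".fastq")[0]


def name_from_filename(log_filename, clean_exts=(".sortmerna",)):
    """Derive the sample name from the log filename, truncating at clean_exts."""
    name = log_filename
    for ext in clean_exts:
        name = name.split(ext)[0]
    return name


print(name_from_log_content(LOG_TEXT))             # SAMPLE_2
print(name_from_filename("SAMPLE.sortmerna.log"))  # SAMPLE
```

With the filename-based name, the %rRNA value lands on the parent sample row rather than on a read-specific row.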
Bump version to 3.25.0dev; fix SortMeRNA sample name in MultiQC
Replace the local GTF_FILTER module and bin/filter_gtf.py script with the nf-core shared custom/gtffilter module.

The nf-core module:
- Produces byte-identical filtering output to the old local module
- Supports ext.args (enabling --skip_transcript_id_check via config)
- Handles optional FASTA input (transcript_id-only filtering when no genome FASTA is available, matching old filter_gtf.py behavior)
- Uses topic-based version reporting
- Supports gzipped GTF/FASTA input

The old ext.args config for GTF_FILTER set --skip_transcript_id_check but was dead code (the old module never consumed task.ext.args). It has been removed to preserve identical filtering behavior. It can be re-enabled now that the new module properly consumes ext.args.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All pipeline and subworkflow test snapshots updated with:
- Version entry renamed from GTF_FILTER to CUSTOM_GTFFILTER (correct alphabetical position in the sorted version map)
- PREPARE_GENOME stub test: output filename corrected from genome.filtered.gtf to genes_with_empty_tid.filtered.gtf (the old stub incorrectly used fasta.baseName instead of gtf.baseName)
- No changes to pipeline output files, md5 hashes, or file counts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The old GTF_FILTER config set ext.args to '--skip_transcript_id_check' by default, but the old module never consumed task.ext.args, so the param had no effect. The actual behavior was: transcript_id filtering was always ON. The old config also had inverted logic (Elvis operator passed the flag when the param was false, not true).

Now that CUSTOM_GTFFILTER consumes ext.args, wire up the param with correct logic:
- skip_gtf_transcript_filter = false (default): no args, transcript_id filtering is ON (preserves current behavior)
- skip_gtf_transcript_filter = true: passes --skip_transcript_id_check

No snapshot changes needed since the default behavior is unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
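As a rough Python analogue of the inverted logic: Python's `or` behaves like Groovy's Elvis operator in that it returns the left operand when it is truthy and the right operand otherwise. The exact shape of the old config expression is an assumption here; only the described behaviour (flag passed when the param is false) comes from the commit message. `old_args` and `new_args` are hypothetical names.

```python
FLAG = "--skip_transcript_id_check"


def old_args(skip_gtf_transcript_filter):
    # Elvis-style fallback: yields FLAG exactly when the param is falsy,
    # which is the inverse of what the param name promises.
    return skip_gtf_transcript_filter or FLAG


def new_args(skip_gtf_transcript_filter):
    # Corrected wiring: pass the flag only when skipping is requested.
    return FLAG if skip_gtf_transcript_filter else ""


print(repr(old_args(False)))  # '--skip_transcript_id_check'
print(repr(new_args(False)))  # ''
print(repr(new_args(True)))   # '--skip_transcript_id_check'
```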
Adds a subworkflow-level test that sets skip_gtf_transcript_filter=true via a test-specific config. Verifies the filtered GTF contains more lines than the default (the test GTF has 1 line with empty transcript_id that is preserved when the check is skipped), confirming the ext.args plumbing works end-to-end. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
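The check this test performs can be pictured with a toy filter. The sketch below is not the custom/gtffilter implementation: `filter_gtf`, `has_transcript_id` and the three GTF rows are hypothetical, and it leaves out any FASTA-based filtering. It only mirrors the described behaviour of an argparse flag that, when set, keeps rows with an empty transcript_id.

```python
import argparse
import re


def parse_args(argv):
    parser = argparse.ArgumentParser(description="Toy GTF filter (illustrative only)")
    parser.add_argument("--skip_transcript_id_check", action="store_true")
    return parser.parse_args(argv)


def has_transcript_id(line):
    """True only when the attribute column carries a non-empty transcript_id."""
    match = re.search(r'transcript_id "([^"]*)"', line)
    return bool(match and match.group(1))


def filter_gtf(lines, skip_transcript_id_check=False):
    """Keep every row when the check is skipped, else only rows with a transcript_id."""
    return [ln for ln in lines if skip_transcript_id_check or has_transcript_id(ln)]


# Hypothetical GTF with one row whose transcript_id is empty.
GTF = [
    'chr1\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\tsrc\texon\t200\t300\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\tsrc\texon\t400\t500\t.\t+\t.\tgene_id "g2"; transcript_id "";',
]

default_rows = filter_gtf(GTF, parse_args([]).skip_transcript_id_check)
skipped_rows = filter_gtf(GTF, parse_args(["--skip_transcript_id_check"]).skip_transcript_id_check)
print(len(default_rows), len(skipped_rows))  # 2 3
```

The extra row in the skipped case is the same signal the subworkflow test relies on: more lines in the filtered GTF means the flag reached the process.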
refactor: delocalise GTF_FILTER to nf-core custom/gtffilter
… support

Reinstall the module from nf-core/modules master (f99b33c), which now includes the changes from nf-core/modules#11155:
- ext.args support via argparse (--skip_transcript_id_check)
- Optional FASTA input handling
- New tests for both features

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fix: wire up skip_gtf_transcript_filter to CUSTOM_GTFFILTER ext.args
Update all withName selectors, call sites, and snapshot references from GTF2BED to EAUTILS_GTF2BED. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace local gtf2bed with nf-core ea-utils/gtf2bed module
Replace the local bam_post_alignment_qc subworkflow and local multiqc_custom_biotype module with the nf-core/modules bam_qc_rnaseq subworkflow and custom/multiqccustombiotype module.
- Install bam_qc_rnaseq subworkflow and custom/multiqccustombiotype module via nf-core tools
- Apply unicode escape fix to multiqccustombiotype template (nf-core/modules#11165)
- Update dupradar to latest (adds topic-based version reporting)
- Add defineQcTools() utility function to build the tools list from pipeline skip params, called from the top-level workflow
- Update ARM container for CUSTOM_MULTIQCCUSTOMBIOTYPE to python 3.12
- Rename version entries in pipeline test snapshots (MULTIQC_CUSTOM_BIOTYPE -> CUSTOM_MULTIQCCUSTOMBIOTYPE)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move pipeline-specific process configuration from scattered nextflow.config files in module/subworkflow directories to a central conf/modules/ directory, following the pattern used by nf-core/sarek. This makes it clear which configs belong to the pipeline vs. upstream nf-core/modules. The 3 upstream nf-core subworkflow configs (quantify_rsem, quantify_pseudo_alignment, quant_tximport_summarizedexperiment) remain in their original locations. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Point modules.json to the merged nf-core/modules#11165 commit which includes the unicode escape fix, so the local template now matches upstream exactly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Module/subworkflow configs must load before the pipeline-level override configs, and the override configs must appear in their original order. Nextflow config precedence means later definitions win when multiple withName selectors match, so reordering changes behavior (e.g. ext.prefix, publishDir). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
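Why load order matters can be shown with a toy model of "later definitions win". This is a deliberate simplification, not Nextflow's actual selector resolution: `resolve`, the regex-based matching, and the process and setting values are all hypothetical.

```python
import re


def resolve(process_name, config_blocks):
    """Apply (selector_regex, settings) blocks in order; later matches override earlier ones."""
    resolved = {}
    for selector, settings in config_blocks:
        if re.fullmatch(selector, process_name):
            resolved.update(settings)
    return resolved


# Hypothetical module-level defaults and a pipeline-level override.
module_cfg = (".*:TOOL", {"ext.prefix": "module_default", "publishDir": "tool"})
override_cfg = (".*:SUBWORKFLOW:TOOL", {"ext.prefix": "pipeline_override"})

name = "WORKFLOW:SUBWORKFLOW:TOOL"
print(resolve(name, [module_cfg, override_cfg])["ext.prefix"])  # pipeline_override
print(resolve(name, [override_cfg, module_cfg])["ext.prefix"])  # module_default
```

Swapping the two blocks silently changes the effective `ext.prefix`, which is the kind of behaviour change the ordering constraint guards against.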
…th shipped changes
Co-authored-by: Matthias Hörtenhuber <mashehu@users.noreply.github.com>
Bump version to 3.25.0 ahead of release
FriederikeHanssen
approved these changes
Apr 24, 2026
ea-utils/gtf2bed only emits BED records for GTF rows with feature-type exon, and prokaryotic annotations describe genes as CDS rather than exon features. With #1805 now running RSeQC infer_experiment on the prokaryotic profile, the empty BED fed to RSeQC silently produced "Unknown Data type" for every sample, which propagated into the new MultiQC Strandedness checks section as all-zero read-composition rows. The bargraph then failed to render and (under rich >= 14.3) tripped the MultiQC module-broke handler's rich.panel bug, turning a quiet module degradation into a hard CI failure on the conda profile.

Switch the prokaryotic path to the gffread --bed output added in nf-core/modules#11298, keeping the eukaryotic EAUTILS_GTF2BED path unchanged. The resulting BED derives intervals from CDS / transcript features, so RSeQC now samples real reads (~8k per SALM_REP*) and the Strandedness checks section renders with meaningful fractions and a proper pass/fail status.
- Add GFFREAD_GENE_BED alias (gffread --bed) in PREPARE_GENOME
- Branch gene BED generation on params.prokaryotic
- Pass params.prokaryotic as a new PREPARE_GENOME subworkflow input
- Update prepare_genome subworkflow tests to pass prokaryotic=false
- Regenerate tests/prokaryotic.nf.test.snap to reflect the now-renderable strandedness section and the GFFREAD_GENE_BED software-versions entry

Note: this commit ships the nf-core/gffread module in its post-#11298 state; modules.json is intentionally unchanged and will be bumped once that module PR merges.
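The root cause can be reproduced with a small Python model. It does not reflect the real logic of ea-utils or gffread: `gtf_to_bed`, the BED6-style tuples and the two-row annotation are hypothetical. It only shows that filtering on `exon` yields nothing for a CDS-only annotation, while accepting CDS / transcript features yields intervals.

```python
def gtf_to_bed(gtf_lines, feature_types):
    """Convert GTF rows of the given feature types to simplified BED6-style tuples."""
    bed = []
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, _source, feature, start, end, _score, strand = line.split("\t")[:7]
        if feature in feature_types:
            # GTF is 1-based inclusive; BED is 0-based half-open.
            bed.append((chrom, int(start) - 1, int(end), feature, 0, strand))
    return bed


# Hypothetical prokaryotic-style annotation: gene and CDS rows, no exon rows.
PROKARYOTIC_GTF = [
    'chr\tsrc\tgene\t11\t100\t.\t+\t.\tgene_id "g1";',
    'chr\tsrc\tCDS\t11\t100\t.\t+\t0\tgene_id "g1"; transcript_id "t1";',
]

exon_only = gtf_to_bed(PROKARYOTIC_GTF, {"exon"})
cds_aware = gtf_to_bed(PROKARYOTIC_GTF, {"CDS", "transcript"})
print(exon_only)  # []
print(cds_aware)  # [('chr', 10, 100, 'CDS', 0, '+')]
```

An empty BED gives a downstream strandedness check nothing to overlap reads against, which matches the "Unknown Data type" symptom described above.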
jonasscheid
approved these changes
Apr 24, 2026
jonasscheid
left a comment
Nice PR! I only have non-blocking questions
Modules that now emit their version via the `versions` topic no longer produce a `versions.yml` output file, so the `saveAs` closures that filtered that filename are a no-op. Drop the `versions.yml` clause from the closures (or drop the closure entirely where the filter was its only purpose) for every affected process. Modules that still emit a `versions.yml` path keep their filter: CUSTOM_MULTIQCCUSTOMBIOTYPE, EAUTILS_GTF2BED, CUSTOM_CATADDITIONALFASTA, CUSTOM_GTFFILTER, CUSTOM_TX2GENE, TXIMETA_TXIMPORT, SE_* (from summarizedexperiment).
Nextflow strict syntax flags implicit `it` in single-arg closures. Rename to named parameters in local workflow / subworkflow files. Scope is intentionally limited to local files; the same pattern in subworkflows/nf-core/* is upstream-owned and should be fixed in nf-core/modules.
Co-authored-by: Maxime U Garcia <max.u.garcia@gmail.com>
fix(prokaryotic): derive gene BED via gffread --bed
Unblocks the latest-everything matrix on Nextflow 26.03.x-edge, whose
new parser rejects the Groovy spread operator used in nf-test 0.9.3's
generated harness (`${process}(*input)`). nf-test 0.9.4+ switched the
template to `${process}.run(input.toArray())`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Conflicts: # CHANGELOG.md
chore: drop redundant versions.yml filters from saveAs closures
bowtie2_salmon aligns directly to the transcriptome, so the genome BAM it feeds into BAM_QC_RNASEQ carries transcript IDs rather than chromosomes. RSeQC infer_experiment samples reads against a genomic BED, finds 0 overlap, and reports Unknown Data type - the all-zero fractions then trip MultiQC bargraph in the strand-check section (same symptom #1817 fixed for star_salmon, different root cause). Salmon lib_format_counts inference is the correct strand signal on transcriptome alignments and is already surfaced in the strand-check section, so drop infer_experiment from the RSeQC module list when the aligner is bowtie2_salmon.
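A minimal Python sketch of the selection rule, assuming the RSeQC module list is a plain list of names. `select_rseqc_modules` is a hypothetical helper and the three-entry list is an illustrative subset, not the pipeline's actual default.

```python
# Illustrative subset of RSeQC module names (assumption, not the pipeline default).
DEFAULT_RSEQC_MODULES = ("bam_stat", "infer_experiment", "read_distribution")


def select_rseqc_modules(aligner, modules=DEFAULT_RSEQC_MODULES):
    """Drop infer_experiment for transcriptome-aligned BAMs; keep the list otherwise."""
    if aligner == "bowtie2_salmon":
        return [m for m in modules if m != "infer_experiment"]
    return list(modules)


print(select_rseqc_modules("bowtie2_salmon"))  # ['bam_stat', 'read_distribution']
print(select_rseqc_modules("star_salmon"))     # ['bam_stat', 'infer_experiment', 'read_distribution']
```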
sentieon under conda talks to a license server and resolves its deps fresh each run. Small alignment-level drift ripples through classifyStrand threshold comparisons into a summary-table md5 drift vs the docker-captured snapshot, so the sentieon conda shard flakes. Docker keeps sentieon coverage with a pinned container.
…fer-experiment fix(prokaryotic): skip RSeQC infer_experiment on bowtie2_salmon + skip sentieon on conda
Release 3.25.0
176 commits, 213 files changed since 3.24.0.
This release restructures `--stringtie_ignore_gtf` as a proper three-stage de-novo workflow, adds GPU acceleration for ribodetector, ships a new MultiQC strandedness-checks section, updates the default SortMeRNA database to SILVA 138, fixes per-sample MultiQC stalls under `--skip_quantification_merge`, and includes several other fixes and refactors.

Review guide
Each component PR was individually reviewed before merging to dev.

New features
- … `--stringtie_ignore_gtf` into a three-stage assemble → merge → quantify workflow via the nf-core `bam_stringtie_merge` subworkflow, publishing `stringtie_merge.gtf` and per-sample `<sample>.denovo.transcripts.gtf`
- … `--use_gpu_ribodetector` for GPU-accelerated rRNA removal; generalize GPU CI skip from `SKIP_PARABRICKS` to `SKIP_GPU`
- … `<sample>.bigWig`. Breaking: per-strand bigWigs are no longer emitted for unstranded libraries, where a `-strand +/-` split carries no biological meaning (#1275)

Default changes
- … `prokaryotic` profile, where reference-guided transcript assembly is not informative for bacterial/archaeal annotations
- … `-k` from 1 to 200 for `--aligner bowtie2_salmon` so Salmon's EM has enough multi-mapping evidence to quantify small transcriptomes correctly
- … `smr_v4.3_default_db` (SILVA 138) (#1354)

Bug fixes
- … `%rRNA` appearing only under "Read 2" in MultiQC General Stats by using log filename for sample names instead of parsing paired-end read paths
- … `table_sample_merge` config to samplesheet paired-end IDs so samples with `_1`/`_2` suffixes (e.g. `foo_1`, `foo_2`) are no longer collapsed into a single `foo` row in General Statistics; factor MultiQC wiring into a new local `MULTIQC_RNASEQ` subworkflow
- … `custom/multiqccustombiotype` to fail loudly when featureCounts output exceeds `--max_biotypes` (default 100), catching misconfigured `--featurecounts_group_type` values that previously hung MultiQC (#424)
- … `--skip_quantification_merge` by building the MultiQC input as a per-sample bundle, so each sample's report fires as soon as its own contributors arrive (#1797)
- … `cleanup` directive on `$CI` so pipeline-test work directories are retained on local reruns and only pruned in CI (#1813)
- … `gffread --bed` on the prokaryotic path so RSeQC `infer_experiment` gets a usable reference; `ea-utils/gtf2bed` only emits BED rows for `exon` features and produced an empty BED from CDS-only prokaryotic annotations
- … `infer_experiment` for `aligner == 'bowtie2_salmon'` (transcriptome-aligned BAMs can't be inferred against a genomic BED); widen `SKIP_SENTIEON` to also skip on the conda profile since its license-server-driven output drifts across conda solves

Refactors and maintenance
- … `gtf2bed` module with nf-core `ea-utils/gtf2bed` module
- … `bam_post_alignment_qc` subworkflow and `multiqc_custom_biotype` module with nf-core `bam_qc_rnaseq` subworkflow and `custom/multiqccustombiotype` module; update `dupradar` to topic-based version reporting
- … `conf/modules/` following the nf-core/sarek pattern
- … `deseq2_qc` module to topic-based version reporting, and retire `ch_versions` plumbing now that all modules emit versions via topic
- … `versions.yml` clauses from `saveAs` closures on processes that emit versions via the topic; replace implicit `it` with explicit parameter names in local workflow / subworkflow closures; bump nf-test `0.9.3` → `0.9.5` so the `latest-everything` CI matrix parses cleanly on Nextflow 26.03.x-edge

Documentation and tests
- … `min_mapped_reads` into `skip_qc`; prune duplicate pseudo-alignment cases) without losing coverage

New parameters
- `--use_gpu_ribodetector`

Test plan
🤖 Generated with Claude Code