Dev -> master for nf-core/rnaseq 3.25.0 #1816

Merged
pinin4fjords merged 180 commits into master from dev on Apr 24, 2026

pinin4fjords commented Apr 23, 2026

Release 3.25.0

176 commits, 213 files changed since 3.24.0.

This release restructures --stringtie_ignore_gtf as a proper three-stage de-novo workflow, adds GPU acceleration for ribodetector, ships a new MultiQC strandedness-checks section, updates the default SortMeRNA database to SILVA 138, fixes per-sample MultiQC stalls under --skip_quantification_merge, and includes several other fixes and refactors.

Review guide

Each component PR was individually reviewed before merging to dev.

New features

| PR | Summary |
| --- | --- |
| #1755 | Restructure --stringtie_ignore_gtf into a three-stage assemble → merge → quantify workflow via the nf-core bam_stringtie_merge subworkflow, publishing stringtie_merge.gtf and per-sample <sample>.denovo.transcripts.gtf |
| #1790 | Add --use_gpu_ribodetector for GPU-accelerated rRNA removal; generalize GPU CI skip from SKIP_PARABRICKS to SKIP_GPU |
| #1792 | Always emit a strand-agnostic <sample>.bigWig. Breaking: per-strand bigWigs are no longer emitted for unstranded libraries, where a -strand +/- split carries no biological meaning (#1275) |
| #1805 | New MultiQC "Strandedness checks" section whose table rows reflect which strandedness analyses actually ran for each sample; narrow the prokaryotic RSeQC skip to prokaryote-unsafe modules only |

Default changes

| PR | Summary |
| --- | --- |
| #1804 | Skip StringTie by default in the prokaryotic profile, where reference-guided transcript assembly is not informative for bacterial/archaeal annotations |
| #1806 | Raise Bowtie2 default -k from 1 to 200 for --aligner bowtie2_salmon so Salmon's EM has enough multi-mapping evidence to quantify small transcriptomes correctly |
| #1811 | Update the default SortMeRNA rRNA database to smr_v4.3_default_db (SILVA 138) (#1354) |

Bug fixes

| PR | Summary |
| --- | --- |
| #1781 | Bump version to 3.25.0dev; fix SortMeRNA %rRNA appearing only under "Read 2" in MultiQC General Stats by using log filename for sample names instead of parsing paired-end read paths |
| #1793 | Scope MultiQC's table_sample_merge config to samplesheet paired-end IDs so samples with _1/_2 suffixes (e.g. foo_1, foo_2) are no longer collapsed into a single foo row in General Statistics; factor MultiQC wiring into a new local MULTIQC_RNASEQ subworkflow |
| #1795 | Bump custom/multiqccustombiotype to fail loudly when featureCounts output exceeds --max_biotypes (default 100), catching misconfigured --featurecounts_group_type values that previously hung MultiQC (#424) |
| #1803 | Fix per-sample MultiQC hanging under --skip_quantification_merge by building the MultiQC input as a per-sample bundle, so each sample's report fires as soon as its own contributors arrive (#1797) |
| #1815 | Gate the nf-test cleanup directive on $CI so pipeline-test work directories are retained on local reruns and only pruned in CI (#1813) |
| #1817 | Derive the gene BED via gffread --bed on the prokaryotic path so RSeQC infer_experiment gets a usable reference; ea-utils/gtf2bed only emits BED rows for exon features and produced an empty BED from CDS-only prokaryotic annotations |
| #1819 | Drop RSeQC infer_experiment for aligner == 'bowtie2_salmon' (transcriptome-aligned BAMs can't be inferred against a genomic BED); widen SKIP_SENTIEON to also skip on the conda profile since its license-server-driven output drifts across conda solves |
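
The sample-collapsing problem described for #1793 can be illustrated with a minimal Python sketch. It does not reproduce MultiQC's `table_sample_merge` implementation: the helper names, the `_1`/`_2` regular expression, and the sample IDs are all assumptions chosen for illustration.

```python
import re

# Assumed read-suffix pattern; the real config may use different patterns.
SUFFIX = re.compile(r"_[12]$")


def naive_parent(sample):
    """Strip a trailing _1/_2 from any name, whether or not it is a read suffix."""
    return SUFFIX.sub("", sample)


def scoped_parent(sample, paired_end_ids):
    """Strip the suffix only when the remainder is a known paired-end sample ID."""
    base = SUFFIX.sub("", sample)
    return base if base in paired_end_ids else sample


# Hypothetical inputs: foo_1 and foo_2 are distinct samples, bar is paired-end.
rows = ["foo_1", "foo_2", "bar_1", "bar_2"]
paired_end_ids = {"bar"}

print([naive_parent(r) for r in rows])
print([scoped_parent(r, paired_end_ids) for r in rows])
```

With the naive rule, `foo_1` and `foo_2` both map to `foo`; with the scoped rule they stay separate while the two reads of `bar` still merge.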

Refactors and maintenance

| PR | Summary |
| --- | --- |
| #1784 | Replace local gtf2bed module with nf-core ea-utils/gtf2bed module |
| #1786 | Replace local bam_post_alignment_qc subworkflow and multiqc_custom_biotype module with nf-core bam_qc_rnaseq subworkflow and custom/multiqccustombiotype module; update dupradar to topic-based version reporting |
| #1788 | Centralize pipeline-specific module configs in conf/modules/ following the nf-core/sarek pattern |
| #1814 | Sync nf-core components to the latest versions, migrate the remaining local deseq2_qc module to topic-based version reporting, and retire ch_versions plumbing now that all modules emit versions via topic |
| #1818 | Drop redundant versions.yml clauses from saveAs closures on processes that emit versions via the topic; replace implicit it with explicit parameter names in local workflow / subworkflow closures; bump nf-test 0.9.3 → 0.9.5 so the latest-everything CI matrix parses cleanly on Nextflow 26.03.x-edge |

Documentation and tests

| PR | Summary |
| --- | --- |
| #1796 | Clarify prokaryotic profile docs: transcripts are extracted from all transcript-like features (CDS, tRNA, rRNA, tmRNA, ncRNA, etc.), not only CDS |
| #1799 | Bump version to 3.25.0 ahead of release |
| #1812 | Dedupe redundant pipeline-level nf-test cases (fold min_mapped_reads into skip_qc; prune duplicate pseudo-alignment cases) without losing coverage |

New parameters

| Parameter | Description |
| --- | --- |
| --use_gpu_ribodetector | Enable GPU acceleration for ribodetector rRNA removal |

Test plan

  • CI nf-test passes
  • Linting passes

🤖 Generated with Claude Code

Justin Payeur and others added 30 commits April 3, 2026 09:51
The SortMeRNA module in MultiQC extracts sample names from "Reads file:"
lines in the log content. For paired-end data, SortMeRNA logs both input
filenames and the parser picks the second (R2), giving a spurious _2
suffix. This causes table_sample_merge to place %rRNA under "Read 2"
only instead of the parent sample row.

Fix by adding `use_filename_as_sample_name: ["sortmerna"]` to the
MultiQC config so it uses the log filename (which has no read suffix)
and adding `.sortmerna` to `extra_fn_clean_exts` for proper cleanup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
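
The naming problem described in the commit message above can be illustrated with a small Python sketch. It is not MultiQC's actual parser: the log excerpt, the `sampleA` names, the `.sortmerna.log` filename, and both helper functions are assumptions made for illustration.

```python
import os

# Hypothetical excerpt of a paired-end SortMeRNA log (assumed layout).
LOG_TEXT = """\
 Reads file: /data/sampleA_1.fastq.gz
 Reads file: /data/sampleA_2.fastq.gz
"""


def name_from_log_content(log_text):
    """Model a parser that keeps the last 'Reads file:' entry it sees."""
    name = None
    for line in log_text.splitlines():
        if "Reads file:" in line:
            path = line.split("Reads file:", 1)[1].strip()
            name = os.path.basename(path).replace(".fastq.gz", "")
    return name


def name_from_filename(log_path, clean_exts=(".sortmerna", ".log")):
    """Model filename-based naming: truncate at the first known suffix."""
    name = os.path.basename(log_path)
    for ext in clean_exts:
        idx = name.find(ext)
        if idx > 0:
            name = name[:idx]
    return name


print(name_from_log_content(LOG_TEXT))                   # sampleA_2
print(name_from_filename("logs/sampleA.sortmerna.log"))  # sampleA
```

The content-based name carries the R2 suffix, which is what sends %rRNA to the "Read 2" row; the filename-based name matches the parent sample.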
Bump version to 3.25.0dev; fix SortMeRNA sample name in MultiQC
Replace the local GTF_FILTER module and bin/filter_gtf.py script with
the nf-core shared custom/gtffilter module. The nf-core module:

- Produces byte-identical filtering output to the old local module
- Supports ext.args (enabling --skip_transcript_id_check via config)
- Handles optional FASTA input (transcript_id-only filtering when no
  genome FASTA is available, matching old filter_gtf.py behavior)
- Uses topic-based version reporting
- Supports gzipped GTF/FASTA input

The old ext.args config for GTF_FILTER set --skip_transcript_id_check
but was dead code (the old module never consumed task.ext.args). It has
been removed to preserve identical filtering behavior. It can be
re-enabled now that the new module properly consumes ext.args.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All pipeline and subworkflow test snapshots updated with:
- Version entry renamed from GTF_FILTER to CUSTOM_GTFFILTER (correct
  alphabetical position in the sorted version map)
- PREPARE_GENOME stub test: output filename corrected from
  genome.filtered.gtf to genes_with_empty_tid.filtered.gtf (the old
  stub incorrectly used fasta.baseName instead of gtf.baseName)
- No changes to pipeline output files, md5 hashes, or file counts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The old GTF_FILTER config set ext.args to '--skip_transcript_id_check'
by default, but the old module never consumed task.ext.args, so the
param had no effect. The actual behavior was: transcript_id filtering
was always ON.

The old config also had inverted logic (Elvis operator passed the flag
when the param was false, not true).

Now that CUSTOM_GTFFILTER consumes ext.args, wire up the param with
correct logic:
- skip_gtf_transcript_filter = false (default): no args, transcript_id
  filtering is ON (preserves current behavior)
- skip_gtf_transcript_filter = true: passes --skip_transcript_id_check

No snapshot changes needed since the default behavior is unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
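
The corrected parameter wiring described above can be sketched in Python. This models the logic only, not the pipeline's Groovy config: the function names are hypothetical, and Python's `or` stands in for Groovy's Elvis operator.

```python
FLAG = "--skip_transcript_id_check"


def old_ext_args(skip_gtf_transcript_filter):
    # Python's `or` mirrors Groovy's Elvis operator (?:) here: a falsy
    # parameter falls through to the flag, which is the inverted behaviour.
    return skip_gtf_transcript_filter or FLAG


def new_ext_args(skip_gtf_transcript_filter):
    # Corrected logic: pass the flag only when the parameter is true.
    return FLAG if skip_gtf_transcript_filter else ""


for value in (False, True):
    print(value, repr(old_ext_args(value)), repr(new_ext_args(value)))
```

With the default of `False`, the corrected function yields no arguments, so transcript_id filtering stays on.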
Adds a subworkflow-level test that sets skip_gtf_transcript_filter=true
via a test-specific config. Verifies the filtered GTF contains more
lines than the default (the test GTF has 1 line with empty
transcript_id that is preserved when the check is skipped), confirming
the ext.args plumbing works end-to-end.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
refactor: delocalise GTF_FILTER to nf-core custom/gtffilter
… support

Reinstall the module from nf-core/modules master (f99b33c) which now
includes the changes from nf-core/modules#11155:
- ext.args support via argparse (--skip_transcript_id_check)
- Optional FASTA input handling
- New tests for both features

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fix: wire up skip_gtf_transcript_filter to CUSTOM_GTFFILTER ext.args
Update all withName selectors, call sites, and snapshot references
from GTF2BED to EAUTILS_GTF2BED.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace local gtf2bed with nf-core ea-utils/gtf2bed module
Replace the local bam_post_alignment_qc subworkflow and local
multiqc_custom_biotype module with the nf-core/modules bam_qc_rnaseq
subworkflow and custom/multiqccustombiotype module.

- Install bam_qc_rnaseq subworkflow and custom/multiqccustombiotype
  module via nf-core tools
- Apply unicode escape fix to multiqccustombiotype template
  (nf-core/modules#11165)
- Update dupradar to latest (adds topic-based version reporting)
- Add defineQcTools() utility function to build the tools list from
  pipeline skip params, called from the top-level workflow
- Update ARM container for CUSTOM_MULTIQCCUSTOMBIOTYPE to python 3.12
- Rename version entries in pipeline test snapshots
  (MULTIQC_CUSTOM_BIOTYPE -> CUSTOM_MULTIQCCUSTOMBIOTYPE)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move pipeline-specific process configuration from scattered
nextflow.config files in module/subworkflow directories to a
central conf/modules/ directory, following the pattern used
by nf-core/sarek. This makes it clear which configs belong to
the pipeline vs. upstream nf-core/modules.

The 3 upstream nf-core subworkflow configs (quantify_rsem,
quantify_pseudo_alignment, quant_tximport_summarizedexperiment)
remain in their original locations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Point modules.json to the merged nf-core/modules#11165 commit which
includes the unicode escape fix, so the local template now matches
upstream exactly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Module/subworkflow configs must load before the pipeline-level
override configs, and the override configs must appear in their
original order. Nextflow config precedence means later definitions
win when multiple withName selectors match, so reordering changes
behavior (e.g. ext.prefix, publishDir).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
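
The ordering constraint described above can be illustrated with a simplified Python model of "later definitions win". It handles exact-name selectors only, and the process name and prefix values are hypothetical.

```python
def resolve(process_name, layers):
    """Merge settings from matching layers in order; later layers win."""
    resolved = {}
    for selector, settings in layers:
        if selector == process_name:  # exact match only, for brevity
            resolved.update(settings)
    return resolved


# Hypothetical layers: a module-level default and a pipeline-level override.
module_layer = ("EXAMPLE_PROCESS", {"ext.prefix": "module_default"})
pipeline_layer = ("EXAMPLE_PROCESS", {"ext.prefix": "pipeline_override"})

intended = resolve("EXAMPLE_PROCESS", [module_layer, pipeline_layer])
reordered = resolve("EXAMPLE_PROCESS", [pipeline_layer, module_layer])
print(intended["ext.prefix"])   # pipeline_override
print(reordered["ext.prefix"])  # module_default
```

Swapping the load order silently changes the effective value, which is why the include order has to be preserved when configs are moved.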
github-actions Bot commented Apr 23, 2026

nf-core pipelines lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 747d5e9

  • ✅ 204 tests passed
  • ❔ 12 tests were ignored
  • ❗ 8 tests had warnings

❗ Test warnings:

  • files_exist - File not found: assets/multiqc_config.yml
  • pipeline_todos - TODO string in awsfulltest.yml: You can customise AWS full pipeline tests as required
  • pipeline_todos - TODO string in nextflow.config: Specify any additional parameters here
  • pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
  • pipeline_todos - TODO string in base.config: Check the defaults for all processes
  • pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline

❔ Tests ignored:

✅ Tests passed:

Run details

  • nf-core/tools version 3.5.2
  • Run at 2026-04-24 14:39:48

Comment thread subworkflows/local/align_star/tests/main.sentieon.nf.test
ea-utils/gtf2bed only emits BED records for GTF rows with feature-type
exon, and prokaryotic annotations describe genes as CDS rather than
exon features. With #1805 now running RSeQC infer_experiment on the
prokaryotic profile, the empty BED fed to RSeQC silently produced
"Unknown Data type" for every sample, which propagated into the new
MultiQC Strandedness checks section as all-zero read-composition rows;
the bargraph then failed to render and (under rich >= 14.3) tripped the
MultiQC module-broke handler's rich.panel bug, turning a quiet module
degradation into a hard CI failure on the conda profile.

Switch the prokaryotic path to the gffread --bed output added in
nf-core/modules#11298, keeping the eukaryotic EAUTILS_GTF2BED path
unchanged. The resulting BED derives intervals from CDS / transcript
features, so RSeQC now samples real reads (~8k per SALM_REP*) and the
Strandedness checks section renders with meaningful fractions and a
proper pass/fail status.

- Add GFFREAD_GENE_BED alias (gffread --bed) in PREPARE_GENOME
- Branch gene BED generation on params.prokaryotic
- Pass params.prokaryotic as a new PREPARE_GENOME subworkflow input
- Update prepare_genome subworkflow tests to pass prokaryotic=false
- Regenerate tests/prokaryotic.nf.test.snap to reflect the
  now-renderable strandedness section and the GFFREAD_GENE_BED
  software-versions entry

Note: this commit ships the nf-core/gffread module in its post-#11298
state; modules.json is intentionally unchanged and will be bumped once
that module PR merges.
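
The empty-BED failure mode described above can be reproduced with a minimal Python sketch. It models only the feature-type selection, not the real ea-utils or gffread implementations (gffread's actual BED output is richer); the function name and the two annotation rows are hypothetical.

```python
def gtf_to_bed(gtf_lines, features):
    """Return BED-like tuples for GTF rows whose feature type is in `features`."""
    rows = []
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        seqname, feature, start, end, strand = (
            fields[0], fields[2], int(fields[3]), int(fields[4]), fields[6]
        )
        if feature in features:
            # GTF is 1-based inclusive; BED is 0-based half-open.
            rows.append((seqname, start - 1, end, strand))
    return rows


# Hypothetical CDS-only annotation, as described for prokaryotic inputs.
GTF = [
    'chr1\tsrc\tCDS\t101\t400\t.\t+\t0\tgene_id "g1"; transcript_id "t1";',
    'chr1\tsrc\tCDS\t501\t900\t.\t-\t0\tgene_id "g2"; transcript_id "t2";',
]

print(gtf_to_bed(GTF, {"exon"}))  # exon-only selection finds nothing
print(gtf_to_bed(GTF, {"CDS"}))   # CDS-aware selection yields intervals
```

An exon-only converter returns no rows for this input, so any downstream tool sampling reads against the BED has nothing to overlap.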

jonasscheid left a comment

Nice PR! I only have non-blocking questions

Comment thread conf/modules/align_bowtie2.config
Comment thread conf/modules/prepare_genome.config
Comment thread subworkflows/local/align_star/tests/main.extra_args.nf.test
Comment thread subworkflows/local/multiqc_rnaseq/tests/main.function.nf.test
Comment thread subworkflows/local/utils_nfcore_rnaseq_pipeline/main.nf
Comment thread workflows/rnaseq/main.nf Outdated
Modules that now emit their version via the `versions` topic no longer
produce a `versions.yml` output file, so the `saveAs` closures that
filtered that filename are a no-op. Drop the `versions.yml` clause
from the closures (or drop the closure entirely where the filter was
its only purpose) for every affected process.

Modules that still emit a `versions.yml` path keep their filter:
CUSTOM_MULTIQCCUSTOMBIOTYPE, EAUTILS_GTF2BED, CUSTOM_CATADDITIONALFASTA,
CUSTOM_GTFFILTER, CUSTOM_TX2GENE, TXIMETA_TXIMPORT, SE_* (from
summarizedexperiment).
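
Why the filter becomes a no-op can be shown with a small Python model of a `saveAs`-style closure, where returning `None` means "do not publish". The function names and filenames are hypothetical; this is a sketch of the idea, not Nextflow's publishing code.

```python
def drop_versions_yml(filename):
    """Model of a saveAs closure: returning None means 'do not publish'."""
    return None if filename == "versions.yml" else filename


def published(outputs, save_as=None):
    """Apply an optional saveAs-style filter to a list of output filenames."""
    if save_as is None:
        return list(outputs)
    return [name for name in map(save_as, outputs) if name is not None]


legacy_outputs = ["sample.bam", "versions.yml"]  # module still writes the file
topic_outputs = ["sample.bam"]                   # version reported via topic

print(published(legacy_outputs, drop_versions_yml))
print(published(topic_outputs, drop_versions_yml) == published(topic_outputs))
```

For modules that no longer write `versions.yml`, publishing with and without the filter gives the same result, so the clause can be dropped safely; modules that still write the file need it.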
pinin4fjords and others added 8 commits April 24, 2026 12:31
Nextflow strict syntax flags implicit `it` in single-arg closures.
Rename to named parameters in local workflow / subworkflow files.
Scope is intentionally limited to local files; the same pattern in
subworkflows/nf-core/* is upstream-owned and should be fixed in
nf-core/modules.
Co-authored-by: Maxime U Garcia <max.u.garcia@gmail.com>
fix(prokaryotic): derive gene BED via gffread --bed
Unblocks the latest-everything matrix on Nextflow 26.03.x-edge, whose
new parser rejects the Groovy spread operator used in nf-test 0.9.3's
generated harness (`${process}(*input)`). nf-test 0.9.4+ switched the
template to `${process}.run(input.toArray())`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
chore: drop redundant versions.yml filters from saveAs closures
bowtie2_salmon aligns directly to the transcriptome, so the genome
BAM it feeds into BAM_QC_RNASEQ carries transcript IDs rather than
chromosomes. RSeQC infer_experiment samples reads against a genomic
BED, finds no overlap, and reports "Unknown Data type"; the all-zero
fractions then trip the MultiQC bargraph in the strand-check section
(same symptom #1817 fixed for star_salmon, different root cause).
Salmon lib_format_counts inference is the correct strand signal on
transcriptome alignments and is already surfaced in the strand-check
section, so drop infer_experiment from the RSeQC module list when
the aligner is bowtie2_salmon.
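
The mismatch and the resulting gate can be sketched in Python. The reference names, the module list, and both helpers are hypothetical; the pipeline's own logic lives in its Nextflow code and may be structured differently.

```python
def shared_references(bam_refs, bed_chroms):
    """Reference names present in both the BAM header and the BED file."""
    return set(bam_refs) & set(bed_chroms)


def rseqc_modules(aligner, modules):
    """Drop infer_experiment when alignments are against the transcriptome."""
    if aligner == "bowtie2_salmon":
        return [m for m in modules if m != "infer_experiment"]
    return list(modules)


# Hypothetical names: a transcriptome BAM lists transcripts, the BED lists chromosomes.
transcriptome_bam_refs = ["tx1", "tx2", "tx3"]
genomic_bed_chroms = ["chr1", "chr2"]
requested = ["bam_stat", "infer_experiment", "read_distribution"]

print(shared_references(transcriptome_bam_refs, genomic_bed_chroms))  # set()
print(rseqc_modules("bowtie2_salmon", requested))
print(rseqc_modules("star_salmon", requested))
```

Because the two name sets never intersect, no read can be assigned a strand against the BED, so removing the module for this aligner avoids an uninformative result rather than hiding a real one.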
sentieon under conda talks to a license server and resolves its deps
fresh each run. Small alignment-level drift ripples through
classifyStrand threshold comparisons into a summary-table md5 drift
vs the docker-captured snapshot, so the sentieon conda shard flakes.
Docker keeps sentieon coverage with a pinned container.
…fer-experiment

fix(prokaryotic): skip RSeQC infer_experiment on bowtie2_salmon + skip sentieon on conda
@pinin4fjords pinin4fjords merged commit 891468c into master Apr 24, 2026
292 of 295 checks passed
5 participants