Improve ASR models' invariance to padding/batch size by pzelasko · Pull Request #13827 · NVIDIA-NeMo/NeMo

pzelasko · 2025-06-04T15:37:04Z

What does this PR do?

Adds tests and fixes inconsistencies in the ASR feature extractor and subsampling module when the same input is processed with and without padding. Specifically, it:

- decreases the feature extractor's sequence-length tensor by 1, since the previously computed value included an extra padding frame in the majority of cases;
- removes a Dirac-delta-like spike caused by the lack of masking in pre-emphasis (only present when audio_length < audio.shape["time"], i.e., for padded inputs);
- replaces "reflect" padding with zero-padding, so the features do not depend on the padding length;
- replaces convolution with masked convolution in subsampling, so frames outside of the sequence length are discarded.

As a result, WER varies much less with batch size, although results are still not 100% identical across batch sizes. For example, for parakeet-tdt-0.6b-v2, parakeet-rnnt-1.1b, and canary-180m-flash, the absolute WER difference between batch sizes 128 and 512 was 0.01%.
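The three padding-related fixes listed above (pre-emphasis masking, zero-padding instead of "reflect", and masked subsampling convolution) can be illustrated with a minimal pure-Python sketch. It does not use NeMo or PyTorch. All function names are hypothetical, and the pre-emphasis coefficient 0.97, the all-ones kernel, and the bias of 1.0 are illustrative assumptions. The sketch processes the same short input alone and with trailing zero padding, and shows where each unmasked operation lets the padding change the result. It is a simplified 1-D stand-in, not NeMo's feature extractor or subsampling module, which work on 2-D spectrogram features and place activations between conv layers.

```python
# Hypothetical, simplified 1-D sketch; not NeMo's implementation.

def preemphasis(x, alpha=0.97):
    """y[0] = x[0]; y[t] = x[t] - alpha * x[t - 1]."""
    return [x[0]] + [x[t] - alpha * x[t - 1] for t in range(1, len(x))]


def mask(x, length):
    """Zero every element at index >= length (the padded region)."""
    return [v if t < length else 0.0 for t, v in enumerate(x)]


def center_pad(x, n, mode):
    """Pad n samples on both sides, as a centered STFT does ("reflect" or "constant" zeros)."""
    if mode == "reflect":
        return x[1:n + 1][::-1] + x + x[-n - 1:-1][::-1]
    return [0.0] * n + x + [0.0] * n


def conv1d(x, kernel, bias, stride=2, pad=1):
    """Zero-padded, strided 1-D cross-correlation with a scalar bias."""
    xp = [0.0] * pad + x + [0.0] * pad
    k = len(kernel)
    return [
        sum(w * xp[i + j] for j, w in enumerate(kernel)) + bias
        for i in range(0, len(xp) - k + 1, stride)
    ]


def subsample(x, length, masked, kernel=(1.0, 1.0, 1.0), bias=1.0):
    """Two stride-2 conv layers; if masked, frames at index >= length are zeroed before each."""
    for _ in range(2):
        if masked:
            x = mask(x, length)
        x = conv1d(x, kernel, bias)
        length = (length + 2 * 1 - len(kernel)) // 2 + 1  # output length for pad=1, stride=2
    return x[:length]


# Same utterance, alone vs. right-padded inside a batch with a longer one.
audio = [0.5, 1.0, -0.5, 0.25]
padded = audio + [0.0] * 4

# 1) Unmasked pre-emphasis writes -alpha * x[L-1] into the first padding sample.
spike = preemphasis(padded)[len(audio)]
print(spike)  # ≈ -0.2425 instead of 0.0
print(mask(preemphasis(padded), len(audio)) == preemphasis(audio) + [0.0] * 4)  # True

# 2) Reflect padding mirrors padding zeros instead of real audio at the right edge.
n = 2
print(center_pad(padded, n, "reflect")[:len(audio) + 2 * n] == center_pad(audio, n, "reflect"))    # False
print(center_pad(padded, n, "constant")[:len(audio) + 2 * n] == center_pad(audio, n, "constant"))  # True

# 3) Bias values in padded frames leak into the last valid frame of the next conv layer.
feats = [1.0, 2.0, 3.0, 4.0, 5.0]
feats_padded = feats + [0.0] * 3
print(subsample(feats, len(feats), masked=False))         # [15.0, 21.0]
print(subsample(feats_padded, len(feats), masked=False))  # [15.0, 22.0]
print(subsample(feats_padded, len(feats), masked=True))   # [15.0, 21.0]
```

Masking right before each operation that can mix neighbouring frames is what makes the padded and unpadded results agree. The bias term is the reason zero-valued padding alone is not enough after the first conv layer.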

Comparison of all NVIDIA NeMo ASR models on the Open ASR Leaderboard (offline models only):

Model	Current WER (%)	New WER (%)	Relative diff
CTC
nvidia/parakeet-ctc-1.1b	7.4	7.39	-0.14%
nvidia/parakeet-ctc-0.6b	7.69	7.65	-0.52%
nvidia/stt_en_fastconformer_ctc_large	8.96	8.94	-0.22%
nvidia/stt_en_conformer_ctc_large	8.32	8.5	2.16%
nvidia/stt_en_conformer_ctc_small	11.16	11.16	0.00%
RNNT
nvidia/parakeet-tdt-0.6b-v2	6.05	6.06	0.17%
nvidia/parakeet-tdt-1.1b	7.01	6.92	-1.28%
nvidia/parakeet-rnnt-1.1b	7.12	7.04	-1.12%
nvidia/parakeet-rnnt-0.6b	7.5	7.42	-1.07%
nvidia/stt_en_fastconformer_transducer_large	9.06	8.57	-5.41%
stt_en_conformer_transducer_small	10.26	9.75	-4.97%
nvidia/parakeet-tdt_ctc-110m	7.49	7.49	0.00%
AED
nvidia/canary-1b-flash	6.35	6.31	-0.63%
nvidia/canary-180m-flash	7.12	7.08	-0.56%
nvidia/canary-1b	6.5	6.47	-0.46%
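The "Relative diff" column is consistent with (new WER - current WER) / current WER × 100, rounded to two decimals. Negative values therefore mean the new code lowers WER. The following small sketch recomputes the column from the WER values in the table. The helper name `relative_diff` is illustrative only.

```python
def relative_diff(current_wer, new_wer):
    """Relative WER change in percent; negative means the new WER is lower."""
    return round((new_wer - current_wer) / current_wer * 100, 2)


# (current WER, new WER, reported relative diff) copied from the table
ROWS = {
    "nvidia/parakeet-ctc-1.1b": (7.4, 7.39, -0.14),
    "nvidia/parakeet-ctc-0.6b": (7.69, 7.65, -0.52),
    "nvidia/stt_en_fastconformer_ctc_large": (8.96, 8.94, -0.22),
    "nvidia/stt_en_conformer_ctc_large": (8.32, 8.5, 2.16),
    "nvidia/stt_en_conformer_ctc_small": (11.16, 11.16, 0.0),
    "nvidia/parakeet-tdt-0.6b-v2": (6.05, 6.06, 0.17),
    "nvidia/parakeet-tdt-1.1b": (7.01, 6.92, -1.28),
    "nvidia/parakeet-rnnt-1.1b": (7.12, 7.04, -1.12),
    "nvidia/parakeet-rnnt-0.6b": (7.5, 7.42, -1.07),
    "nvidia/stt_en_fastconformer_transducer_large": (9.06, 8.57, -5.41),
    "stt_en_conformer_transducer_small": (10.26, 9.75, -4.97),
    "nvidia/parakeet-tdt_ctc-110m": (7.49, 7.49, 0.0),
    "nvidia/canary-1b-flash": (6.35, 6.31, -0.63),
    "nvidia/canary-180m-flash": (7.12, 7.08, -0.56),
    "nvidia/canary-1b": (6.5, 6.47, -0.46),
}

for model, (current, new, reported) in ROWS.items():
    print(f"{model:45s} {relative_diff(current, new):+.2f}% (reported {reported:+.2f}%)")
```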

I also checked the results on NVTalks for one cache-aware streaming model:

Model	Current WER (%)	New WER (%)	Relative diff
stt_en_fastconformer_hybrid_large_streaming_1040ms	14.73	14.75	0.13%

Collection: ASR


GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you add or update any necessary documentation?
Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
- Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

New Feature
Bugfix
Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

Related to # (issue)

Signed-off-by: Piotr Żelasko <petezor@gmail.com>

tests/collections/asr/test_padding_and_batch_size_invariance.py

Signed-off-by: Piotr Żelasko <petezor@gmail.com>

nithinraok

LGTM. Checked with parakeet models as well.

nithinraok · 2025-06-04T18:07:36Z

tests/collections/asr/test_padding_and_batch_size_invariance.py

+@pytest.mark.skip(reason="Used only for debugging.")
+@pytest.mark.parametrize("length", [16000])
+def test_canary_invariant_to_padding(deterministic_rng, length):
+    model = ASRModel.from_pretrained("nvidia/canary-180m-flash").eval()


no pretrained :)

Signed-off-by: Piotr Żelasko <petezor@gmail.com>

…hub.com/nvidia/nemo into fix-pad-inconsistency-feature-extractor

nemo/collections/asr/parts/submodules/conformer_modules.py

nemo/collections/asr/parts/submodules/subsampling.py

tests/collections/asr/test_padding_and_batch_size_invariance.py

Signed-off-by: Piotr Żelasko <petezor@gmail.com>

Signed-off-by: tango4j <tango4j@users.noreply.github.com>

Signed-off-by: Piotr Żelasko <petezor@gmail.com>

tango4j · 2025-06-27T18:40:04Z

Just commenting for future reference.
To make the code future-proof for speaker diarization (Sortformer), I imported the featurizer's parameters and used the same formula to calculate the total feature frame count.

For Sortformer, Lhotse-based inference is supported, but training is not supported yet.
I will update this together with the Streaming Sortformer changes (feature frame calculation, etc.).
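The following minimal sketch shows a shared frame-count helper of the kind described above. Several parts are assumptions for illustration: the name `num_feature_frames`, the defaults `n_fft=512` and `hop_length=160` (a 10 ms hop at 16 kHz), and the centered-STFT length formula. The only part taken from this PR is that the corrected count is one frame smaller than the previous one. NeMo's featurizer may compute the length differently.

```python
def num_feature_frames(num_samples, n_fft=512, hop_length=160, drop_padding_frame=True):
    """Hypothetical helper: feature-frame count for `num_samples` audio samples.

    Assumes a centered STFT padded by n_fft // 2 samples on each side, giving
    (num_samples + 2 * (n_fft // 2) - n_fft) // hop_length + 1 frames; the
    PR description states that the corrected count is one frame smaller.
    """
    pad_amount = 2 * (n_fft // 2)
    frames = (num_samples + pad_amount - n_fft) // hop_length + 1
    return frames - 1 if drop_padding_frame else frames


# 1 s of 16 kHz audio with a 10 ms hop (160 samples):
print(num_feature_frames(16000, drop_padding_frame=False))  # 101 (previous count)
print(num_feature_frames(16000))                            # 100 (assumed count after this PR)
```

The count is derived from the featurizer's own parameters rather than hard-coded. This keeps a dataloader's frame count in sync if the featurizer's length computation changes again, as it does in this PR.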

Signed-off-by: taejinp <tango4j@gmail.com>

Signed-off-by: Piotr Żelasko <petezor@gmail.com>

github-actions · 2025-07-01T20:49:41Z

[🤖]: Hi @pzelasko 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully.

So it might be time to merge this PR or get some approvals.

//cc @chtruong814 @ko3n1g @pablo-garay @thomasdhc

tango4j · 2025-07-01T21:11:20Z

@pzelasko I checked the diarization unit tests. As long as the PR passes all unit tests and CI tests, I think the change causes no issues for Sortformer diarization.

nithinraok

Great work!

* Fix feature extractor to be invariant to padding
  Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* fix ci
  Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* preliminary conformer inference parity with/without padding
  Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* fix
  Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* fix tests
  Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* fixes
  Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* fix CI check
  Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* fix to cache-aware models
  Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* fix a bunch of tests
  Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Fix failing CI tests
  Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Fix failing CI tests part 2
  Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Unit test fixes for too short feature extractor inputs
  Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Resolved feature frame length issue in E2E diarization dataloader
  Signed-off-by: taejinp <tango4j@gmail.com>
* Apply isort and black reformatting
  Signed-off-by: tango4j <tango4j@users.noreply.github.com>
* fix ci
  Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* removed test_ds from YAML file since it is not used
  Signed-off-by: taejinp <tango4j@gmail.com>
* fix diarization unit tests after recent changes
  Signed-off-by: Piotr Żelasko <petezor@gmail.com>

---------

Signed-off-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: taejinp <tango4j@gmail.com>
Signed-off-by: tango4j <tango4j@users.noreply.github.com>
Co-authored-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: Charlie Truong <chtruong@nvidia.com>
Co-authored-by: taejinp <tango4j@gmail.com>
Co-authored-by: tango4j <tango4j@users.noreply.github.com>
Signed-off-by: Amir Hussein <amhussein@nvidia.com>


Fix feature extractor to be invariant to padding

517a297

Signed-off-by: Piotr Żelasko <petezor@gmail.com>

pzelasko requested review from VahidooX, nithinraok and titu1994 June 4, 2025 15:37

github-actions bot added the ASR label Jun 4, 2025

github-advanced-security bot found potential problems Jun 4, 2025


tests/collections/asr/test_padding_and_batch_size_invariance.py Fixed

pzelasko added the Run CICD label Jun 4, 2025

Merge branch 'main' into fix-pad-inconsistency-feature-extractor

d27b873

ko3n1g added Run CICD and removed Run CICD labels Jun 4, 2025

ko3n1g had a problem deploying to test June 4, 2025 16:22 — with GitHub Actions Error

fix ci

8059734

Signed-off-by: Piotr Żelasko <petezor@gmail.com>

pzelasko added Run CICD and removed Run CICD labels Jun 4, 2025

ko3n1g added Run CICD and removed Run CICD labels Jun 4, 2025

ko3n1g temporarily deployed to test June 4, 2025 16:48 — with GitHub Actions Inactive

nithinraok previously approved these changes Jun 4, 2025


github-actions bot removed the Run CICD label Jun 4, 2025

Merge branch 'main' into fix-pad-inconsistency-feature-extractor

c8e467a

chtruong814 added the Run CICD label Jun 4, 2025

ko3n1g added Run CICD and removed Run CICD labels Jun 4, 2025

ko3n1g temporarily deployed to test June 4, 2025 23:48 — with GitHub Actions Inactive

github-actions bot removed the Run CICD label Jun 5, 2025

pzelasko added 2 commits June 5, 2025 19:29

preliminary conformer inference parity with/without padding

5cd3bd7

Signed-off-by: Piotr Żelasko <petezor@gmail.com>

Merge branch 'fix-pad-inconsistency-feature-extractor' of https://github.com/nvidia/nemo into fix-pad-inconsistency-feature-extractor

c314f43

pzelasko dismissed nithinraok’s stale review via c314f43 June 5, 2025 23:29

github-advanced-security bot found potential problems Jun 5, 2025


fix

5cc82c9

Signed-off-by: Piotr Żelasko <petezor@gmail.com>

github-actions bot added the Speaker Tasks label Jun 27, 2025

ko3n1g added Run CICD and removed Run CICD labels Jun 27, 2025

Apply isort and black reformatting

9b417e2

Signed-off-by: tango4j <tango4j@users.noreply.github.com>

ko3n1g added Run CICD and removed Run CICD labels Jun 27, 2025

github-actions bot removed the Run CICD label Jun 27, 2025

fix ci

9f3ac5c

Signed-off-by: Piotr Żelasko <petezor@gmail.com>

pzelasko added the Run CICD label Jun 27, 2025

pzelasko had a problem deploying to test June 27, 2025 17:59 — with GitHub Actions Error

removed test_ds from YAML file since it is not used

211eea8

Signed-off-by: taejinp <tango4j@gmail.com>

ko3n1g added Run CICD and removed Run CICD labels Jun 27, 2025

ko3n1g temporarily deployed to test June 27, 2025 18:55 — with GitHub Actions Inactive

github-actions bot removed the Run CICD label Jun 28, 2025

fix diarization unit tests after recent changes

50d4fa7

Signed-off-by: Piotr Żelasko <petezor@gmail.com>

pzelasko added the Run CICD label Jul 1, 2025

pzelasko temporarily deployed to test July 1, 2025 13:57 — with GitHub Actions Inactive

pzelasko enabled auto-merge (squash) July 1, 2025 19:43

github-actions bot removed the Run CICD label Jul 1, 2025

nithinraok approved these changes Jul 2, 2025


pzelasko merged commit 0fd4de5 into main Jul 2, 2025
248 checks passed

pzelasko deleted the fix-pad-inconsistency-feature-extractor branch July 2, 2025 12:37

pzelasko merged 22 commits into main from fix-pad-inconsistency-feature-extractor

5 participants
