
Removing the dependency on Pyannote for Diarization and VAD#15632

Open
tango4j wants to merge 9 commits into main from add_py_md_eval
Conversation

@tango4j
Collaborator

@tango4j tango4j commented Apr 21, 2026

Important

The Update branch button must only be pressed on very rare occasions.
An outdated branch is never blocking the merge of a PR.
Please reach out to the automation team before pressing that button.

What does this PR do ?

Removes the pyannote.core and pyannote.metrics dependencies from NeMo's
speaker-diarization stack and replaces them with an in-tree, NIST
md-eval-22.pl-faithful Python engine plus lhotse.SupervisionSegment-based
annotation objects. The public API of nemo.collections.asr.metrics.der is
preserved, including byte-for-byte numerical parity with historical NeMo
diarization results (no shift in published DER numbers).

Wherever possible, Pyannote classes were replaced with Lhotse's classes to
minimize the amount of new code added to the repo when removing the Pyannote
imports. Apart from the RTTM-writing functions, they proved mostly drop-in
replaceable.

Collection: ASR (speaker tasks / diarization, VAD)

Changelog

New: in-tree DER engine (nemo/collections/asr/metrics/md_eval.py)

  • New module: a Python port of NIST md-eval-22.pl, written in NeMo style
    (Apache header, type hints, Google-style docstrings, __all__,
    nemo.utils.logging, no CLI). Drives all DER computation.
  • New DiarizationErrorResult result object exposing the dict-like interface
    used throughout NeMo (abs(result), result['total' | 'confusion' | 'false alarm' | 'missed detection'], result.results_,
    result.optimal_mapping(...), result.report()).
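The dict-like access patterns listed above can be sketched as follows. This is a hypothetical illustration of the interface shape only (the key names and `results_` attribute come from this PR description; the internals shown here are invented for clarity and are not the actual implementation):

```python
# Minimal sketch of the dict-like interface DiarizationErrorResult exposes.
# Field names follow the PR description; the internals are hypothetical and
# only illustrate the access patterns downstream NeMo code relies on.

class DiarizationErrorResultSketch:
    def __init__(self, total, confusion, false_alarm, missed):
        # Durations in seconds, as accumulated by the md-eval engine.
        self.results_ = {
            'total': total,
            'confusion': confusion,
            'false alarm': false_alarm,
            'missed detection': missed,
        }

    def __getitem__(self, key):
        # result['total' | 'confusion' | 'false alarm' | 'missed detection']
        return self.results_[key]

    def __abs__(self):
        # abs(result) returns the overall DER as a fraction.
        r = self.results_
        return (r['confusion'] + r['false alarm'] + r['missed detection']) / r['total']

res = DiarizationErrorResultSketch(total=1092.36, confusion=2.05,
                                   false_alarm=9.38, missed=34.32)
print(f"DER = {abs(res):.4f}")  # DER = 0.0419
```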

nemo/collections/asr/metrics/der.py (DER public API)

  • score_labels, evaluate_der, score_labels_from_rttm_labels,
    get_partial_ref_labels, get_online_DER_stats, calculate_session_cpWER,
    calculate_session_cpWER_bruteforce, concat_perm_word_error_rate are
    all preserved with their original names, signatures, and return shapes.
    No breaking changes for downstream callers.
  • New lhotse-backed annotation helpers (replacements for the previous
    pyannote.core types):
    • make_diar_segment(start, end, speaker, ...) -> SupervisionSegment
    • make_diar_annotation(labels, uniq_name=...) -> list[SupervisionSegment]
    • make_uem_timeline(uem_lines, uniq_id=...) -> list[SupervisionSegment]
      (UEM regions carried as supervisions with speaker="UEM")
    • unique_speakers(annotation) -> list[str]
    • write_supervisions_to_rttm(annotation, file_handle, ...)
  • New score_labels_from_rttm_labels(...) convenience entry point that takes
    raw "start end speaker" label strings (no annotation object construction
    required by the caller).
  • New _default_uem_from_ref_sys(ref_data, sys_data) helper. When a caller
    does not supply a UEM, the high-level wrappers now auto-derive
    [min(ref ∪ sys TBEG), max(ref ∪ sys TEND)] per (file_id, channel) and
    pass it to evaluate(). This matches the historical no-UEM scoring map
    used by the previous external engine and prevents any over-shoot of the
    hypothesis past the last reference segment from being silently dropped.
    md_eval.evaluate() itself remains a faithful NIST port (ref-extent only)
    for power users that call it directly.
  • Docstring on collar argument in both score_labels and
    score_labels_from_rttm_labels clarifies the NIST half-width semantics
    (total no-score zone = 2 * collar) and gives the cross-engine conversion
    rule (NeMo collar=X <==> external libs that define collar as total width
    collar=2X).
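The no-UEM fallback described above reduces to a min/max union over segment boundaries. A hypothetical sketch (the real _default_uem_from_ref_sys operates on parsed RTTM data per (file_id, channel); here segments are plain (start, end) tuples):

```python
# Hypothetical sketch of the auto-UEM derivation: when the caller supplies
# no UEM, score over [min(ref U sys TBEG), max(ref U sys TEND)].
# The real helper (_default_uem_from_ref_sys) works on parsed RTTM data;
# this stand-in uses simple (start, end) tuples.

def default_uem_from_ref_sys(ref_segments, sys_segments):
    """Return (uem_start, uem_end) spanning both reference and hypothesis."""
    starts = [s for s, _ in ref_segments] + [s for s, _ in sys_segments]
    ends = [e for _, e in ref_segments] + [e for _, e in sys_segments]
    return min(starts), max(ends)

# A hypothesis that overshoots the last reference segment stays in scope,
# so its tail is scored as false alarm rather than silently dropped.
ref = [(0.0, 5.0), (6.0, 10.0)]
hyp = [(0.5, 5.2), (6.1, 12.0)]   # overshoots the reference by 2.0 s
print(default_uem_from_ref_sys(ref, hyp))  # (0.0, 12.0)

# Collar reminder: NeMo's collar is the NIST half-width, so collar=0.25
# means a total no-score zone of 2 * 0.25 = 0.5 s around each boundary;
# libraries that define collar as the total width need collar=0.5 for parity.
```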

Source code rename / scrub (no behaviour change)

  • nemo/collections/asr/parts/utils/speaker_utils.py:
    • labels_to_pyannote_object -> labels_to_supervisions
    • timestamps_to_pyannote_object -> timestamps_to_supervisions
    • now returns list[SupervisionSegment]
  • nemo/collections/asr/parts/utils/vad_utils.py:
    • vad_construct_pyannote_object_per_file -> vad_construct_supervisions_per_file
    • frame_vad_construct_pyannote_object_per_file -> frame_vad_construct_supervisions_per_file
    • read_rttm_as_pyannote_object -> read_rttm_as_supervisions
    • new internal _DetectionErrorRateAccumulator class replaces
      pyannote.metrics.detection.DetectionErrorRate, backed by md_eval. It
      preserves the metric(reference, hypothesis) accumulation +
      metric.report(display=False) API and returns a pandas DataFrame with
      the same ('detection error rate', '%'), ('false alarm', '%'),
      ('miss', '%') columns that downstream code consumes.
  • scripts/speaker_tasks/eval_diar_with_asr.py:
    • get_pyannote_objs_from_rttms -> get_supervisions_from_rttms
  • examples/speaker_tasks/diarization/neural_diarizer/e2e_diarize_speech.py:
    • call sites updated to the new timestamps_to_supervisions name
  • All docstrings, comments, and reference URLs that mentioned the third-party
    package by name have been rewritten (or replaced with neutral wording such
    as "External Annotation Library") so a git grep -i pyannote over the
    branch returns zero matches.
  • Two tutorial notebooks (tutorials/speaker_tasks/End_to_End_Diarization_*.ipynb,
    tutorials/tools/Multispeaker_Simulator.ipynb) and the inference notebook
    updated to use the new names and score_labels_from_rttm_labels.
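The accumulation-then-report pattern that _DetectionErrorRateAccumulator preserves can be sketched as below. Only the column names and the metric(...) / report() call shape are taken from this PR; the class name suffix and internals here are hypothetical (in particular, the real class is called as metric(reference, hypothesis), derives durations via md_eval, and returns a pandas DataFrame rather than a dict):

```python
# Hedged sketch of the accumulator pattern behind
# _DetectionErrorRateAccumulator. Per-file durations are accumulated on
# each call; report() turns the running totals into the percentage columns
# that downstream VAD evaluation code reads.

class DetectionErrorRateAccumulatorSketch:
    def __init__(self):
        self.total = 0.0        # scored speech duration (s)
        self.false_alarm = 0.0  # non-speech scored as speech (s)
        self.miss = 0.0         # speech scored as non-speech (s)

    def __call__(self, total, false_alarm, miss):
        # The real class takes (reference, hypothesis) annotations and
        # derives these durations via md_eval; we pass them directly.
        self.total += total
        self.false_alarm += false_alarm
        self.miss += miss

    def report(self):
        # The real report(display=False) returns a pandas DataFrame with
        # ('detection error rate', '%'), ('false alarm', '%'), ('miss', '%').
        return {
            'detection error rate': 100.0 * (self.false_alarm + self.miss) / self.total,
            'false alarm': 100.0 * self.false_alarm / self.total,
            'miss': 100.0 * self.miss / self.total,
        }

acc = DetectionErrorRateAccumulatorSketch()
acc(total=100.0, false_alarm=2.0, miss=3.0)
acc(total=200.0, false_alarm=1.0, miss=6.0)
print(acc.report())
# {'detection error rate': 4.0, 'false alarm': 1.0, 'miss': 3.0}
```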

Dependencies removed

  • requirements/requirements_asr.txt: removed pyannote.core and pyannote.metrics.
  • examples/voice_agent/environment.yaml: removed pyannote-core==5.0.0,
    pyannote-database==5.1.3, pyannote-metrics==3.2.1.
  • uv.lock: removed the three corresponding [[package]] blocks and every
    transitive { name = "pyannote-..." } entry. TOML structure validated
    after edit.

Tests

  • New tests/collections/speaker_tasks/utils/test_der.py (119 unit tests)
    covering:
    • md-eval engine: basic, collar, overlap, speaker count, UEM
    • score_labels_from_rttm_labels (string-label public API)
    • Multi-file aggregation
    • 21 hardcoded values verified independently against the previous external
      engine implementation (class TestExternalEngineVerifiedValues)
    • Lhotse-backed annotation pipeline end-to-end + bit-exact equivalence
      with the string-label path
    • 7-test TestNoUemAutoUnion regression class pinning the auto-UEM
      behaviour and the NIST collar semantics with hand-derived expected
      values from the diarization tutorial sample
    • Negative test asserting pyannote.core / pyannote.metrics submodules
      are never imported when der / md_eval are imported
  • tests/collections/{asr,speaker_tasks}/utils/test_vad_utils_*.py updated
    to use lhotse-based assertions via a new _annotation_equals(annotation, expected_segments) helper.
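The kind of comparison _annotation_equals performs can be sketched like this. The real helper compares lhotse SupervisionSegment lists; this hypothetical stand-in compares (start, end, speaker) tuples with a small time tolerance and ignores segment order:

```python
# Hypothetical sketch of an _annotation_equals-style test helper: the real
# one asserts on lhotse SupervisionSegment lists; this version compares
# (start, end, speaker) tuples with a tolerance, order-insensitively.

def annotation_equals(annotation, expected_segments, tol=1e-3):
    if len(annotation) != len(expected_segments):
        return False
    key = lambda seg: (seg[0], seg[1], seg[2])
    for got, exp in zip(sorted(annotation, key=key),
                        sorted(expected_segments, key=key)):
        if (abs(got[0] - exp[0]) > tol or abs(got[1] - exp[1]) > tol
                or got[2] != exp[2]):
            return False
    return True

# Segments may be listed in any order; tiny boundary jitter is tolerated.
assert annotation_equals([(0.0, 1.5, "spk0"), (1.5, 3.0, "spk1")],
                         [(1.5, 3.0, "spk1"), (0.0, 1.5001, "spk0")])
assert not annotation_equals([(0.0, 1.5, "spk0")], [(0.0, 2.5, "spk0")])
```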

Usage

The public API is unchanged, so existing user code continues to work. New
shorthand for users who already have RTTM-style label strings:

from nemo.collections.asr.metrics.der import score_labels_from_rttm_labels
from nemo.collections.asr.parts.utils.speaker_utils import rttm_to_labels
ref_labels = rttm_to_labels("ground_truth.rttm")
hyp_labels = rttm_to_labels("system.rttm")
der_metric, mapping, (DER, CER, FA, MISS) = score_labels_from_rttm_labels(
    ref_labels_list=[("session_001", ref_labels)],
    hyp_labels_list=[("session_001", hyp_labels)],
    collar=0.25,           # NIST half-width: total no-score zone = 0.50s
    ignore_overlap=False,
    verbose=False,
)
print(f"DER = {abs(der_metric):.4f}")

The lhotse-based path (drop-in for the previous external-library annotations):

from nemo.collections.asr.metrics.der import score_labels, make_diar_annotation
ref = make_diar_annotation(ref_labels, uniq_name="session_001")
hyp = make_diar_annotation(hyp_labels, uniq_name="session_001")
metric, mapping, errs = score_labels(
    AUDIO_RTTM_MAP={"session_001": {}},
    all_reference=[("session_001", ref)],
    all_hypothesis=[("session_001", hyp)],
    collar=0.25,
    ignore_overlap=False,
    verbose=False,
)

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI, remove the label and add it again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (e.g., Numba, Pynini, Apex)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines contain specific people who can review PRs to various areas.

Additional Information

  • Removes a maintenance liability: the previous external diarization metric
    packages saw only infrequent updates on pip and pulled in a large transitive
    closure (pyannote-database, pyannote-pipeline, ...). After this PR, NeMo's
    DER pipeline depends only on numpy, scipy, lhotse, and editdistance -- all
    already required.
  • Backward-compatibility audit: git grep -i pyannote over the branch returns
    zero matches across Python sources, notebooks, configs, the lockfile, docs,
    and shell scripts. import nemo followed by inspecting sys.modules shows no
    pyannote.* entries.
  • Numerical-parity audit: 21 DER values verified against the previous engine
    are hardcoded in TestExternalEngineVerifiedValues, plus 7 regression tests
    pinning the auto-UEM and collar semantics with hand-derived expected values
    from the diarization tutorial sample.

Signed-off-by: taejinp <tango4j@gmail.com>
@copy-pr-bot

copy-pr-bot Bot commented Apr 21, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Comment thread nemo/collections/asr/metrics/der.py Fixed
tango4j and others added 3 commits April 21, 2026 15:33
@tango4j
Collaborator Author

tango4j commented Apr 21, 2026

@pzelasko
Can you just scan uv.lock and requirements.txt to see if there are any issues?

@github-actions
Contributor

[🤖]: Hi @tango4j 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully.

So it might be time to merge this PR or get some approvals.

Comment thread uv.lock
@@ -4726,8 +4726,6 @@ all = [
{ name = "peft" },
Collaborator


@tango4j Did you remove these manually or regenerate using uv lock? I'd have expected this file to change more, with more transitive dependencies being dropped.

Collaborator Author


I think there could at least be some version changes (pinned ones) for other dependencies. Let me run some checks to see which dependencies are affected by this.

Collaborator Author


@pzelasko I double-checked the other dependencies, but unfortunately there are none to remove other than pyannote itself. Kind of disappointing. Thus, the newly generated uv.lock showed 0 lines of diff from the current one.

I will wait for @ipmedenn to do the final functional test. If @ipmedenn greenlights it, I can merge.

Collaborator


thanks. looks good from my side

Collaborator

@stevehuang52 stevehuang52 left a comment


Looks good from my end

@tango4j
Collaborator Author

tango4j commented Apr 22, 2026

Before

[NeMo I 2026-04-22 08:55:39 e2e_diarize_speech:444] 
            diarization error rate    total  correct            false alarm           missed detection           confusion          
                                 %                            %                     %                          %                   %
    item                                                                                                                            
    en_4074               1.732310   421.98   418.99  99.291436        4.32  1.023745             2.93  0.694346      0.06  0.014219
    en_0638               1.856287   250.50   247.30  98.722555        1.45  0.578842             3.20  1.277445      0.00  0.000000
    en_4065               8.047537   419.88   389.70  92.812232        3.61  0.859769            28.19  6.713823      1.99  0.473945
    TOTAL                 4.188180  1092.36  1055.99  96.670512        9.38  0.858691            34.32  3.141821      2.05  0.187667
[NeMo I 2026-04-22 08:55:39 e2e_diarize_speech:444] Cumulative Results for collar 0.25 sec and ignore_overlap False: 
    | FA: 0.0086 | MISS: 0.0314 | CER: 0.0019 | DER: 0.0419 | Spk. Count Acc. 0.6667
    
PostProcessingParams: {'onset': 0.5, 'offset': 0.5, 'pad_onset': 0.0, 'pad_offset': 0.0, 'min_duration_on': 0.0, 'min_duration_off': 0.0}

After removing the Pyannote backend

[NeMo I 2026-04-22 08:54:41 e2e_diarize_speech:444] 
    file                                          total  confusion  false alarm     missed      DER
    -----------------------------------------------------------------------------------------------
    en_0638                                      250.50       0.00         1.45       3.20    1.86%
    en_4065                                      419.88       1.99         3.61      28.19    8.05%
    en_4074                                      421.98       0.06         4.32       2.93    1.73%
    -----------------------------------------------------------------------------------------------
    TOTAL                                       1092.36       2.05         9.38      34.32    4.19%
[NeMo I 2026-04-22 08:54:41 e2e_diarize_speech:444] Cumulative Results for collar 0.25 sec and ignore_overlap False: 
    | FA: 0.0086 | MISS: 0.0314 | CER: 0.0019 | DER: 0.0419 | Spk. Count Acc. 0.6667
    
PostProcessingParams: {'onset': 0.5, 'offset': 0.5, 'pad_onset': 0.0, 'pad_offset': 0.0, 'min_duration_on': 0.0, 'min_duration_off': 0.0}

Comparison of DER stats output.
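As a quick sanity check on the parity claim, DER is (confusion + false alarm + missed) / total scored time; recomputing it from the per-file columns in either table above reproduces the printed DER values. The numbers below are copied from the output tables; the script itself is just illustrative arithmetic:

```python
# Recompute per-file DER from the column values reported by both engines.
# DER = (confusion + false alarm + missed) / total, expressed in percent.

rows = {
    # file: (total, confusion, false_alarm, missed) -- from the tables above
    "en_4074": (421.98, 0.06, 4.32, 2.93),
    "en_0638": (250.50, 0.00, 1.45, 3.20),
    "en_4065": (419.88, 1.99, 3.61, 28.19),
    "TOTAL":   (1092.36, 2.05, 9.38, 34.32),
}
for name, (total, conf, fa, miss) in rows.items():
    der = 100.0 * (conf + fa + miss) / total
    print(f"{name}: {der:.2f}%")  # 1.73%, 1.86%, 8.05%, 4.19%
```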

@tango4j
Collaborator Author

tango4j commented Apr 27, 2026

@ipmedenn is working on some score mismatch issues of this PR. We will merge it after we clear this up.
