Removing the dependency on Pyannote for Diarization and VAD#15632
Signed-off-by: taejinp <tango4j@gmail.com>
Signed-off-by: tango4j <tango4j@users.noreply.github.com>
@pzelasko
[🤖]: Hi @tango4j 👋, We wanted to let you know that a CICD pipeline for this PR just finished successfully. So it might be time to merge this PR or get some approvals.
@@ -4726,8 +4726,6 @@ all = [
    { name = "peft" },
@tango4j Did you remove these manually or regenerate using uv lock? I'd have expected this file to change more, with more transitive dependencies being dropped.
I think there could at least be some version changes (pinned ones) for other dependencies. Let me run some specific checks to see which dependencies are affected by this.
@pzelasko I double checked the other dependencies but unfortunately there are no dependencies to remove other than pyannote itself. Kind of disappointing. Thus, the new uv.lock generated showed 0 lines of diff from the current one.
I will wait for @ipmedenn to do the final functional test. If @ipmedenn greenlights it, maybe I can merge.
thanks. looks good from my side
stevehuang52
left a comment
Looks good from my end
Comparison of DER stats output, before vs. after removing the Pyannote backend.
@ipmedenn is working on some score mismatch issues in this PR. We will merge it after we clear this up.
Important
The Update branch button must only be pressed on very rare occasions. An outdated branch is never blocking the merge of a PR.
Please reach out to the automation team before pressing that button.
What does this PR do ?
Removes the `pyannote.core` and `pyannote.metrics` dependencies from NeMo's
speaker-diarization stack and replaces them with an in-tree, NIST
`md-eval-22.pl`-faithful Python engine plus `lhotse.SupervisionSegment`-based
annotation objects. The public API of `nemo.collections.asr.metrics.der` is
preserved, including byte-for-byte numerical parity with historical NeMo
diarization results (no shift in published DER numbers).
Where possible, Pyannote classes were replaced with Lhotse classes, to minimize
the code added to the repo when removing the Pyannote imports. Apart from the
RTTM writing functions, they were mostly replaceable.
Collection: ASR (speaker tasks / diarization, VAD)
Changelog
New: in-tree DER engine (`nemo/collections/asr/metrics/md_eval.py`)
- Python engine faithful to NIST `md-eval-22.pl`, written in NeMo style (Apache header, type hints, Google-style docstrings, `__all__`, `nemo.utils.logging`, no CLI). Drives all DER computation.
- `DiarizationErrorResult` result object exposing the dict-like interface used throughout NeMo (`abs(result)`, `result['total' | 'confusion' | 'false alarm' | 'missed detection']`, `result.results_`, `result.optimal_mapping(...)`, `result.report()`).

`nemo/collections/asr/metrics/der.py` (DER public API)
- `score_labels`, `evaluate_der`, `score_labels_from_rttm_labels`, `get_partial_ref_labels`, `get_online_DER_stats`, `calculate_session_cpWER`, `calculate_session_cpWER_bruteforce`, `concat_perm_word_error_rate` are all preserved with their original names, signatures, and return shapes. No breaking changes for downstream callers.
- Helpers used in place of the `pyannote.core` types:
  - `make_diar_segment(start, end, speaker, ...)` -> `SupervisionSegment`
  - `make_diar_annotation(labels, uniq_name=...)` -> `list[SupervisionSegment]`
  - `make_uem_timeline(uem_lines, uniq_id=...)` -> `list[SupervisionSegment]` (UEM regions carried as supervisions with `speaker="UEM"`)
  - `unique_speakers(annotation)` -> `list[str]`
  - `write_supervisions_to_rttm(annotation, file_handle, ...)`
- `score_labels_from_rttm_labels(...)` convenience entry point that takes raw `"start end speaker"` label strings (no annotation object construction required by the caller).
- `_default_uem_from_ref_sys(ref_data, sys_data)` helper. When a caller does not supply a UEM, the high-level wrappers now auto-derive `[min(ref ∪ sys TBEG), max(ref ∪ sys TEND)]` per `(file_id, channel)` and pass it to `evaluate()`. This matches the historical no-UEM scoring map used by the previous external engine and prevents any over-shoot of the hypothesis past the last reference segment from being silently dropped. `md_eval.evaluate()` itself remains a faithful NIST port (ref-extent only) for power users that call it directly.
- The `collar` argument documentation in both `score_labels` and `score_labels_from_rttm_labels` clarifies the NIST half-width semantics (total no-score zone = `2 * collar`) and gives the cross-engine conversion rule (NeMo `collar=X` <==> external libs that define collar as total width `collar=2X`).

Source code rename / scrub (no behaviour change)
- `nemo/collections/asr/parts/utils/speaker_utils.py`:
  - `labels_to_pyannote_object` -> `labels_to_supervisions`
  - `timestamps_to_pyannote_object` -> `timestamps_to_supervisions`
  - return type: `list[SupervisionSegment]`
- `nemo/collections/asr/parts/utils/vad_utils.py`:
  - `vad_construct_pyannote_object_per_file` -> `vad_construct_supervisions_per_file`
  - `frame_vad_construct_pyannote_object_per_file` -> `frame_vad_construct_supervisions_per_file`
  - `read_rttm_as_pyannote_object` -> `read_rttm_as_supervisions`
  - `_DetectionErrorRateAccumulator` class replaces `pyannote.metrics.detection.DetectionErrorRate`, backed by `md_eval`. It preserves the `metric(reference, hypothesis)` accumulation + `metric.report(display=False)` API and returns a pandas DataFrame with the same `('detection error rate', '%')`, `('false alarm', '%')`, `('miss', '%')` columns that downstream code consumes.
- `scripts/speaker_tasks/eval_diar_with_asr.py`: `get_pyannote_objs_from_rttms` -> `get_supervisions_from_rttms`
- `examples/speaker_tasks/diarization/neural_diarizer/e2e_diarize_speech.py`: updated to the `timestamps_to_supervisions` name
- Mentions of the package by name have been rewritten (or replaced with neutral wording such as "External Annotation Library") so a `git grep -i pyannote` over the branch returns zero matches.
- Tutorial notebooks (`tutorials/speaker_tasks/End_to_End_Diarization_*.ipynb`, `tutorials/tools/Multispeaker_Simulator.ipynb`) and the inference notebook updated to use the new names and `score_labels_from_rttm_labels`.

Dependencies removed
- `requirements/requirements_asr.txt`: removed `pyannote.core` and `pyannote.metrics`.
- `examples/voice_agent/environment.yaml`: removed `pyannote-core==5.0.0`, `pyannote-database==5.1.3`, `pyannote-metrics==3.2.1`.
- `uv.lock`: removed the three corresponding `[[package]]` blocks and every transitive `{ name = "pyannote-..." }` entry. TOML structure validated after edit.
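The auto-UEM and collar rules described in the changelog can be illustrated with a small standalone sketch. This is not the code in `md_eval.py` or `der.py`: the `Segment` tuple layout and the function names `default_uem` and `no_score_zones` are hypothetical, chosen only for illustration, and the real engine may treat edge cases (for example, boundaries at the start of a file) differently.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout for this sketch: (file_id, channel, tbeg, tend).
Segment = Tuple[str, int, float, float]
Key = Tuple[str, int]


def default_uem(ref: Iterable[Segment], hyp: Iterable[Segment]) -> Dict[Key, Tuple[float, float]]:
    """One scoring region per (file_id, channel): [min TBEG, max TEND] over ref and sys."""
    starts = defaultdict(list)
    ends = defaultdict(list)
    for file_id, channel, tbeg, tend in list(ref) + list(hyp):
        starts[(file_id, channel)].append(tbeg)
        ends[(file_id, channel)].append(tend)
    return {key: (min(starts[key]), max(ends[key])) for key in starts}


def no_score_zones(ref: Iterable[Segment], collar: float) -> List[Tuple[float, float]]:
    """Half-width collar: every reference boundary t is excluded over [t - collar, t + collar]."""
    zones = []
    for _file_id, _channel, tbeg, tend in ref:
        for boundary in (tbeg, tend):
            zones.append((boundary - collar, boundary + collar))
    return zones


ref = [("session1", 1, 0.0, 5.0)]
hyp = [("session1", 1, 0.5, 7.0)]  # hypothesis runs 2 s past the last reference segment

# The derived region ends at 7.0, so the hypothesis over-shoot stays inside the scored area.
print(default_uem(ref, hyp))

# With collar=0.25 each zone is 0.5 s wide, i.e. 2 * collar.
print(no_score_zones(ref, collar=0.25))
```

Under the conversion rule quoted above, a library that defines the collar as the total width would need `collar=0.5` to exclude the same 0.5 s around each boundary.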
Tests
- `tests/collections/speaker_tasks/utils/test_der.py` (119 unit tests) covering:
  - `score_labels_from_rttm_labels` (string-label public API)
  - DER values verified against the previous engine implementation (class `TestExternalEngineVerifiedValues`)
  - with the string-label path
  - `TestNoUemAutoUnion` regression class pinning the auto-UEM behaviour and the NIST collar semantics with hand-derived expected values from the diarization tutorial sample
  - `pyannote.core` / `pyannote.metrics` submodules are never imported when `der` / `md_eval` are imported
- `tests/collections/{asr,speaker_tasks}/utils/test_vad_utils_*.py` updated to use lhotse-based assertions via a new `_annotation_equals(annotation, expected_segments)` helper.

Usage
The public API is unchanged, so existing user code continues to work. New
shorthand for users that already have RTTM-style label strings:
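The label strings referred to here are whitespace-separated `start end speaker` triples. The sketch below only illustrates that input format, assuming the times are in seconds; `LabelSegment` and `parse_label` are hypothetical names that are not part of NeMo or lhotse, and the arguments of `score_labels_from_rttm_labels` are not shown.

```python
from typing import List, NamedTuple


class LabelSegment(NamedTuple):
    """Hypothetical stand-in for one parsed label; not a NeMo or lhotse type."""

    start: float
    end: float
    speaker: str


def parse_label(label: str) -> LabelSegment:
    """Split one 'start end speaker' string into typed fields."""
    start, end, speaker = label.split(maxsplit=2)
    return LabelSegment(float(start), float(end), speaker)


# Example reference labels in the 'start end speaker' format (values made up for illustration).
ref_labels: List[str] = ["0.0 2.5 speaker_0", "2.5 6.0 speaker_1"]
segments = [parse_label(label) for label in ref_labels]

for seg in segments:
    print(f"{seg.speaker}: {seg.start:.2f}s to {seg.end:.2f}s")
```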
GitHub Actions CI
The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.
The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items you can still open "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.
Additional Information
Removes a maintenance liability: the previous external diarization metric packages have been on pip with infrequent updates and have pulled in a large transitive closure (pyannote-database, pyannote-pipeline, ...). After this PR, NeMo's DER pipeline depends only on numpy, scipy, lhotse, and editdistance -- all already required.
Backward-compatibility audit: git grep -i pyannote over the branch returns zero matches across Python sources, notebooks, configs, lockfile, docs, and shell scripts. import nemo followed by inspecting sys.modules shows no pyannote.* entries.
Numerical-parity audit: 21 verified-against-the-previous-engine DER values hardcoded in TestExternalEngineVerifiedValues, plus 7 regression tests pinning the auto-UEM and collar semantics with hand-derived expected values from the diarization tutorial sample.
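The `sys.modules` inspection mentioned in the backward-compatibility audit could be done along the following lines. This is a minimal sketch rather than the check used in the PR's tests; `loaded_modules_with_prefix` is a hypothetical helper name, and the `import nemo` step is left as a comment because it assumes NeMo is installed.

```python
import json  # imported only so the demo below has a package with submodules to find
import sys
from typing import List


def loaded_modules_with_prefix(prefix: str) -> List[str]:
    """Return already-imported module names equal to `prefix` or nested under it."""
    return sorted(
        name for name in sys.modules if name == prefix or name.startswith(prefix + ".")
    )


# Demo on a standard-library package: 'json' and its submodules are listed.
print(loaded_modules_with_prefix("json"))

# For the audit described above, the equivalent check would be (assumes NeMo is installed):
# import nemo
# assert loaded_modules_with_prefix("pyannote") == []
```

Matching on `prefix + "."` rather than a bare `startswith(prefix)` avoids false positives from unrelated packages whose names merely begin with the same letters.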