5 changes: 5 additions & 0 deletions docs/source/audio/api.rst
@@ -38,6 +38,11 @@ Processing Models
:members:
:exclude-members: setup_training_data, setup_validation_data, training_step, on_validation_epoch_end, validation_step, setup_test_data, on_train_epoch_start

.. autoclass:: nemo.collections.audio.models.BNR2
:show-inheritance:
:members:
:exclude-members: setup_training_data, setup_validation_data, training_step, on_validation_epoch_end, validation_step, setup_test_data, on_train_epoch_start


Modules
-------
5 changes: 5 additions & 0 deletions docs/source/audio/models.rst
@@ -37,6 +37,11 @@ Flow Matching Model
Flow matching model is a generative model using a noise-to-data process to transform the input (degraded) audio signal into the target (clean) audio signal :cite:`audio-models-ku2024generative`. The model consists of an encoder and decoder, a neural estimator, a flow model and a sampler.


Background Noise Removal (BNR) Model
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Background Noise Removal (BNR) 2.0 is a single-channel speech denoising model that uses the SEASR architecture. It combines convolutional feature extraction with GRU-based temporal modeling and a learnable masking mechanism operating in a learned transform domain. See `SEASR: A Speech Enhancement Model Using the SAEV Representation <https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10837982>`_ for details.


References
----------

2 changes: 2 additions & 0 deletions nemo/collections/audio/models/__init__.py
@@ -20,9 +20,11 @@
SchroedingerBridgeAudioToAudioModel,
ScoreBasedGenerativeAudioToAudioModel,
)
from nemo.collections.audio.models.maxine.bnr import BNR2

__all__ = [
"AudioToAudioModel",
"BNR2",
"EncMaskDecAudioToAudioModel",
"FlowMatchingAudioToAudioModel",
"PredictiveAudioToAudioModel",
15 changes: 14 additions & 1 deletion nemo/collections/audio/models/maxine/bnr.py
@@ -154,7 +154,20 @@ def _remove_weight_norm(m):


class BNR2(AudioToAudioModel):
"""Implementation of the BNR 2 model"""
"""Maxine Background Noise Removal (BNR) 2.0 model.

BNR 2.0 is a single-channel speech denoising model that removes background
noise from audio to improve speech intelligibility and downstream ASR accuracy.
It uses the SEASR architecture, which combines convolutional feature extraction
with GRU-based temporal modeling and a learnable masking mechanism operating
in a learned transform domain.

The model operates at a 16 kHz sample rate.

Reference:
`SEASR: A Speech Enhancement Model Using the SAEV Representation
<https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10837982>`_
"""

def __init__(self, cfg: DictConfig, trainer: Trainer = None):
self.world_size = 1
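The new BNR2 docstring describes a single-channel model that operates at 16 kHz. As a rough illustration of what that implies for callers, here is a minimal sketch, not part of this PR, that checks a WAV file against those two properties before it is handed to the model. It uses only the Python standard library; the helper names `check_wav_for_bnr` and `make_silent_wav` are hypothetical and do not exist in NeMo, and the sketch does not load or run BNR2 itself.

```python
import io
import wave

# Taken from the BNR2 docstring in this PR: "operates at 16 kHz", "single-channel".
EXPECTED_SAMPLE_RATE = 16000
EXPECTED_CHANNELS = 1


def check_wav_for_bnr(source, expected_rate=EXPECTED_SAMPLE_RATE):
    """Hypothetical helper: inspect a WAV path or binary file-like object.

    Returns (ok, sample_rate, num_channels), where ok is True only when the
    audio is mono and sampled at expected_rate.
    """
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    ok = rate == expected_rate and channels == EXPECTED_CHANNELS
    return ok, rate, channels


def make_silent_wav(rate, channels=1, num_frames=160):
    """Hypothetical helper: build an in-memory 16-bit PCM WAV of silence."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * num_frames * channels)
    buffer.seek(0)  # rewind so the buffer can be read back
    return buffer


if __name__ == "__main__":
    for rate, channels in [(16000, 1), (44100, 1), (16000, 2)]:
        ok, found_rate, found_channels = check_wav_for_bnr(make_silent_wav(rate, channels))
        status = "ok" if ok else "needs resampling or downmixing"
        print(f"{found_rate} Hz, {found_channels} channel(s): {status}")
```

A check like this only reports a mismatch. Resampling or downmixing audio that fails it is left to whatever audio tooling the caller already uses.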