
Remove dependencies to make more light weight#134

Open
BBC-Esq wants to merge 47 commits into resemble-ai:master from BBC-Esq:master

Conversation

@BBC-Esq BBC-Esq commented Jun 7, 2025

This PR removes perth, librosa, omegaconf, and conformer as dependencies.

If I did it correctly, it supersedes these PRs:

#128
#129
#130
#133

I had a fifth PR that removed diffusers, but it was too difficult to implement. A lot of people like me try to minimize the number of dependencies, so hopefully this helps.

Dependencies like librosa pull in a lot of dependencies of their own; for example, I think resampy is no longer required either. So overall this PR probably removes at least 10 dependencies in total. I tested it and it still works fine.
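The transitive-dependency claim is easy to check locally with the standard library; a small sketch (the package names passed in are just examples, and the function name is mine):

```python
from importlib.metadata import requires, PackageNotFoundError

def direct_requirements(pkg):
    """Return the direct requirements recorded for an installed package,
    or None if the package is not installed."""
    try:
        # requires() returns raw Requires-Dist strings (or None if there are
        # none); drop any environment markers after ';' for readability.
        return [r.split(";")[0].strip() for r in (requires(pkg) or [])]
    except PackageNotFoundError:
        return None

# Prints librosa's direct requirements when it is installed, None otherwise.
print(direct_requirements("librosa"))
```

Walking this recursively over each entry gives the full transitive footprint that removing librosa would eliminate.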


Datel777 commented Nov 8, 2025

Reducing dependencies is a very good idea, especially since torchaudio provides nearly the same methods.
These need to be added to the pyproject.toml dependencies:

"soundfile",
"gradio",
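In context, that means the `[project]` dependencies array gains two entries; everything besides the two marked lines below is a hypothetical placeholder, not taken from the actual pyproject.toml:

```toml
[project]
dependencies = [
    # ... existing entries (placeholder, not the real list) ...
    "soundfile",  # added: audio file I/O previously covered via librosa
    "gradio",     # added: required by gradio_tts_app.py
]
```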

Unfortunately, running TTS conversion with the default reference produces errors:

E:\AI\chatterbox-light\venv\lib\site-packages\torchaudio\functional\functional.py:584: UserWarning: At least one mel filterbank has all zero values. The value for `n_mels` (128) may be set too high. Or, the value for `n_freqs` (201) may be set too low.
  warnings.warn(
E:\AI\chatterbox-light\venv\lib\site-packages\diffusers\models\lora.py:393: FutureWarning: `LoRACompatibleLinear` is deprecated and will be removed in version 1.0.0. Use of `LoRACompatibleLinear` is deprecated. Please switch to PEFT backend by installing PEFT: `pip install peft`.
  deprecate("LoRACompatibleLinear", "1.0.0", deprecation_message)
D:\Python\Python310\lib\contextlib.py:103: FutureWarning: `torch.backends.cuda.sdp_kernel()` is deprecated. In the future, this context manager will be removed. Please see `torch.nn.attention.sdpa_kernel()` for the new context manager, with updated signature.
  self.gen = func(*args, **kwds)
E:\AI\chatterbox-light\venv\lib\site-packages\transformers\utils\generic.py:1006: UserWarning: `output_attentions=True` is not supported with `attn_implementation` other than ['eager', 'eager_paged', 'flex_attention']. Please use `model.set_attn_implementation('eager')` to enable capturing attention outputs.
  warnings.warn(
`sdpa` attention does not support `output_attentions=True` or `head_mask`. Please set your attention to `eager` if you want any of these features.
Traceback (most recent call last):
  File "E:\AI\chatterbox-light\venv\lib\site-packages\gradio\queueing.py", line 759, in process_events
    response = await route_utils.call_process_api(
  File "E:\AI\chatterbox-light\venv\lib\site-packages\gradio\route_utils.py", line 354, in call_process_api
    output = await app.get_blocks().process_api(
  File "E:\AI\chatterbox-light\venv\lib\site-packages\gradio\blocks.py", line 2116, in process_api
    result = await self.call_function(
  File "E:\AI\chatterbox-light\venv\lib\site-packages\gradio\blocks.py", line 1623, in call_function
    prediction = await anyio.to_thread.run_sync(  # type: ignore
  File "E:\AI\chatterbox-light\venv\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "E:\AI\chatterbox-light\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 2485, in run_sync_in_worker_thread
    return await future
  File "E:\AI\chatterbox-light\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 976, in run
    result = context.run(func, *args)
  File "E:\AI\chatterbox-light\venv\lib\site-packages\gradio\utils.py", line 915, in wrapper
    response = f(*args, **kwargs)
  File "E:\AI\chatterbox-light\gradio_tts_app.py", line 31, in generate
    wav = model.generate(
  File "E:\AI\chatterbox-light\src\chatterbox\tts.py", line 245, in generate
    speech_tokens = self.t3.inference(
  File "E:\AI\chatterbox-light\venv\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "E:\AI\chatterbox-light\src\chatterbox\models\t3\t3.py", line 318, in inference
    output = self.patched_model(
  File "E:\AI\chatterbox-light\venv\lib\site-packages\torch\nn\modules\module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "E:\AI\chatterbox-light\venv\lib\site-packages\torch\nn\modules\module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "E:\AI\chatterbox-light\venv\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "E:\AI\chatterbox-light\src\chatterbox\models\t3\inference\t3_hf_backend.py", line 95, in forward
    tfmr_out = self.model(
  File "E:\AI\chatterbox-light\venv\lib\site-packages\torch\nn\modules\module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "E:\AI\chatterbox-light\venv\lib\site-packages\torch\nn\modules\module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "E:\AI\chatterbox-light\venv\lib\site-packages\transformers\utils\generic.py", line 1064, in wrapper
    outputs = func(self, *args, **kwargs)
  File "E:\AI\chatterbox-light\venv\lib\site-packages\transformers\models\llama\modeling_llama.py", line 395, in forward
    hidden_states = decoder_layer(
  File "E:\AI\chatterbox-light\venv\lib\site-packages\transformers\modeling_layers.py", line 94, in __call__
    return super().__call__(*args, **kwargs)
  File "E:\AI\chatterbox-light\venv\lib\site-packages\torch\nn\modules\module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "E:\AI\chatterbox-light\venv\lib\site-packages\torch\nn\modules\module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "E:\AI\chatterbox-light\venv\lib\site-packages\transformers\utils\generic.py", line 1023, in wrapped_forward
    output = orig_forward(*args, **kwargs)
  File "E:\AI\chatterbox-light\venv\lib\site-packages\transformers\utils\deprecation.py", line 172, in wrapped_func
    return func(*args, **kwargs)
  File "E:\AI\chatterbox-light\venv\lib\site-packages\transformers\models\llama\modeling_llama.py", line 294, in forward
    hidden_states, _ = self.self_attn(
  File "E:\AI\chatterbox-light\venv\lib\site-packages\torch\nn\modules\module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "E:\AI\chatterbox-light\venv\lib\site-packages\torch\nn\modules\module.py", line 1857, in _call_impl
    return inner()
  File "E:\AI\chatterbox-light\venv\lib\site-packages\torch\nn\modules\module.py", line 1818, in inner
    hook_result = hook(self, args, result)
  File "E:\AI\chatterbox-light\src\chatterbox\models\t3\inference\alignment_stream_analyzer.py", line 74, in attention_forward_hook
    step_attention = output[1].cpu() # (B, 16, N, N)
AttributeError: 'NoneType' object has no attribute 'cpu'
E:\AI\chatterbox-light\venv\lib\site-packages\torchaudio\functional\functional.py:584: UserWarning: At least one mel filterbank has all zero values. The value for `n_mels` (128) may be set too high. Or, the value for `n_freqs` (201) may be set too low.
  warnings.warn(
E:\AI\chatterbox-light\venv\lib\site-packages\diffusers\models\lora.py:393: FutureWarning: `LoRACompatibleLinear` is deprecated and will be removed in version 1.0.0. Use of `LoRACompatibleLinear` is deprecated. Please switch to PEFT backend by installing PEFT: `pip install peft`.
  deprecate("LoRACompatibleLinear", "1.0.0", deprecation_message)
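The AttributeError at the end of this traceback comes from the hook assuming `output[1]` holds attention weights, which the `sdpa` implementation never materializes (hence the transformers warning above it). A minimal guarded sketch of the hook — the real fix may instead be forcing `model.set_attn_implementation("eager")` so the analyzer gets actual weights:

```python
def attention_forward_hook(module, args, output):
    # Sketch only: mirrors the hook in alignment_stream_analyzer.py. `output`
    # follows LlamaAttention.forward's (hidden_states, attn_weights) shape;
    # attn_weights is None under the default `sdpa` implementation.
    attn_weights = output[1] if isinstance(output, tuple) and len(output) > 1 else None
    if attn_weights is None:
        # Nothing to record; switching the model to eager attention
        # (model.set_attn_implementation("eager")) restores the weights.
        return None
    return attn_weights.cpu()  # (B, heads, N, N) in the original code
```

Returning the moved tensor here is for illustration; the original hook stores it on the analyzer instead.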

I'm not sure what to do with this warning:

UserWarning: At least one mel filterbank has all zero values. The value for `n_mels` (128) may be set too high. Or, the value for `n_freqs` (201) may be set too low.

I tried changing n_fft, num_mels, and sampling_rate for the ta.functional.melscale_fbanks function, but with no luck.
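For what it's worth, the warning itself is reproducible with plain numpy: with 201 frequency bins (n_fft=400) the bins sit ~40 Hz apart (assuming a 16 kHz sample rate, which the S3_SR reference suggests), while 128 mel triangles make the lowest triangles narrower than one bin, so they cover no bin at all. A rough HTK-style construction to illustrate the geometry (torchaudio also supports a slaney variant and normalization, both ignored here):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_freqs, n_mels, sample_rate):
    # FFT bin centers from 0 Hz to Nyquist, (sample_rate/2)/(n_freqs-1) apart
    freqs = np.linspace(0.0, sample_rate / 2, n_freqs)
    # n_mels triangles need n_mels + 2 corner points, evenly spaced in mel
    hz_pts = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2), n_mels + 2))
    fb = np.zeros((n_mels, n_freqs))
    for i in range(n_mels):
        lo, ctr, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (freqs - lo) / (ctr - lo)
        falling = (hi - freqs) / (hi - ctr)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

# The warning's configuration: n_mels=128, n_freqs=201
fb = mel_filterbank(n_freqs=201, n_mels=128, sample_rate=16000)
empty = int((fb.max(axis=1) == 0).sum())
print(f"{empty} of 128 filters are all zero")  # several low-frequency filters are empty
```

So the warning is a direct consequence of the n_mels/n_fft combination, not of the librosa-to-torchaudio swap itself; it would fire for any backend building that filterbank.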

Edit: After some refactoring I figured out it was a version difference, but I still get a different error when replacing librosa with torchaudio:

E:\AI\chatterbox-custom\venv\Lib\site-packages\torchaudio\functional\functional.py:584: UserWarning: At least one mel filterbank has all zero values. The value for `n_mels` (128) may be set too high. Or, the value for `n_freqs` (201) may be set too low.
  warnings.warn(
E:\AI\chatterbox-custom\venv\Lib\site-packages\diffusers\models\lora.py:393: FutureWarning: `LoRACompatibleLinear` is deprecated and will be removed in version 1.0.0. Use of `LoRACompatibleLinear` is deprecated. Please switch to PEFT backend by installing PEFT: `pip install peft`.
  deprecate("LoRACompatibleLinear", "1.0.0", deprecation_message)
WARNING:root:Reference mel length is not equal to 2 * reference token length.

Traceback (most recent call last):
  File "E:\AI\chatterbox-custom\fast_test.py", line 13, in <module>
    model.prepare_conditionals(wav_fpath=AUDIO_PROMPT_PATH)  # Use custom voice from audio sample
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\AI\chatterbox-custom\src\chatterbox\tts.py", line 207, in prepare_conditionals
    ve_embed = torch.from_numpy(self.ve.embeds_from_wavs([ref_16k_wav], sample_rate=S3_SR))
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\AI\chatterbox-custom\src\chatterbox\models\voice_encoder\voice_encoder.py", line 286, in embeds_from_wavs
    mels = [melspectrogram(w, self.hp).T for w in wavs]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\AI\chatterbox-custom\src\chatterbox\models\voice_encoder\voice_encoder.py", line 286, in <listcomp>
    mels = [melspectrogram(w, self.hp).T for w in wavs]
            ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\AI\chatterbox-custom\src\chatterbox\models\voice_encoder\melspec.py", line 44, in melspectrogram
    spec_complex = _stft(wav, hp, pad=pad)
                   ^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\AI\chatterbox-custom\src\chatterbox\models\voice_encoder\melspec.py", line 77, in _stft
    return torch.stft(
           ^^^^^^^^^^^
  File "E:\AI\chatterbox-custom\venv\Lib\site-packages\torch\functional.py", line 728, in stft
    input = F.pad(input.view(extended_shape), [pad, pad], pad_mode)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\AI\chatterbox-custom\venv\Lib\site-packages\torch\nn\functional.py", line 5209, in pad
    return torch._C._nn.pad(input, pad, mode, value)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Expected 2D or 3D (batch mode) tensor with possibly 0 batch size and other non-zero dimensions for input, but got: [1, 1, 0]
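The final RuntimeError says torch.stft received a [1, 1, 0] tensor, i.e. a waveform with zero samples, so the failure is upstream of the STFT: the reference audio apparently came back empty from loading or resampling. A hypothetical guard (helper name and message are mine) that would surface the real problem earlier:

```python
import numpy as np

def check_reference_wav(wav):
    # Hypothetical guard to place before the `_stft` call in melspec.py.
    # "got: [1, 1, 0]" in the traceback means the last dimension — the
    # sample count — was zero, so loading/resampling produced no audio.
    if wav.shape[-1] == 0:
        raise ValueError(
            "Reference waveform is empty; check that the audio file loaded "
            "correctly and that resampling did not drop all samples."
        )
    return wav

check_reference_wav(np.ones(160))  # a 160-sample waveform passes through unchanged
```

The same shape check works on a torch tensor; numpy is used here only to keep the sketch self-contained.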

Author

BBC-Esq commented Dec 25, 2025

@Datel777 I will check that out and let you know.
