
Remove dependencies to make more light weight#134

Open
BBC-Esq wants to merge 47 commits into resemble-ai:master from BBC-Esq:master

Conversation

@BBC-Esq BBC-Esq commented Jun 7, 2025

This PR removes perth, librosa, omegaconf, and conformer as dependencies.

If I did it correctly, it supersedes these PRs:

#128
#129
#130
#133

I had a fifth PR that removed diffusers, but it was too difficult to implement. A lot of people like me try to minimize the number of dependencies, so hopefully this helps.

Dependencies like librosa pull in a lot of dependencies of their own; for example, I think resampy is no longer required either. So overall this PR probably removes at least 10 dependencies in total. I tested it and it still works fine.
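The transitive-dependency claim is easy to check locally with the standard library; a small sketch (the package names passed in are just examples, and the function name is mine):

```python
from importlib.metadata import requires, PackageNotFoundError

def direct_requirements(pkg):
    """Return the direct requirements recorded for an installed package,
    or None if the package is not installed."""
    try:
        # requires() returns raw Requires-Dist strings (or None if there are
        # none); drop any environment markers after ';' for readability.
        return [r.split(";")[0].strip() for r in (requires(pkg) or [])]
    except PackageNotFoundError:
        return None

# Prints librosa's direct requirements when it is installed, None otherwise.
print(direct_requirements("librosa"))
```

Walking this recursively over each entry gives the full transitive footprint that removing librosa would eliminate.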


Datel777 commented Nov 8, 2025

Reducing dependencies is a very good idea, especially since torchaudio provides nearly the same methods.
These need to be added to the pyproject.toml dependencies:

"soundfile",
"gradio",
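In context, that means the `[project]` dependencies array gains two entries; everything besides the two marked lines below is a hypothetical placeholder, not taken from the actual pyproject.toml:

```toml
[project]
dependencies = [
    # ... existing entries (placeholder, not the real list) ...
    "soundfile",  # added: audio file I/O previously covered via librosa
    "gradio",     # added: required by gradio_tts_app.py
]
```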

Unfortunately, running TTS conversion with the default reference produces errors:

E:\AI\chatterbox-light\venv\lib\site-packages\torchaudio\functional\functional.py:584: UserWarning: At least one mel filterbank has all zero values. The value for `n_mels` (128) may be set too high. Or, the value for `n_freqs` (201) may be set too low.
  warnings.warn(
E:\AI\chatterbox-light\venv\lib\site-packages\diffusers\models\lora.py:393: FutureWarning: `LoRACompatibleLinear` is deprecated and will be removed in version 1.0.0. Use of `LoRACompatibleLinear` is deprecated. Please switch to PEFT backend by installing PEFT: `pip install peft`.
  deprecate("LoRACompatibleLinear", "1.0.0", deprecation_message)
D:\Python\Python310\lib\contextlib.py:103: FutureWarning: `torch.backends.cuda.sdp_kernel()` is deprecated. In the future, this context manager will be removed. Please see `torch.nn.attention.sdpa_kernel()` for the new context manager, with updated signature.
  self.gen = func(*args, **kwds)
E:\AI\chatterbox-light\venv\lib\site-packages\transformers\utils\generic.py:1006: UserWarning: `output_attentions=True` is not supported with `attn_implementation` other than ['eager', 'eager_paged', 'flex_attention']. Please use `model.set_attn_implementation('eager')` to enable capturing attention outputs.
  warnings.warn(
`sdpa` attention does not support `output_attentions=True` or `head_mask`. Please set your attention to `eager` if you want any of these features.
Traceback (most recent call last):
  File "E:\AI\chatterbox-light\venv\lib\site-packages\gradio\queueing.py", line 759, in process_events
    response = await route_utils.call_process_api(
  File "E:\AI\chatterbox-light\venv\lib\site-packages\gradio\route_utils.py", line 354, in call_process_api
    output = await app.get_blocks().process_api(
  File "E:\AI\chatterbox-light\venv\lib\site-packages\gradio\blocks.py", line 2116, in process_api
    result = await self.call_function(
  File "E:\AI\chatterbox-light\venv\lib\site-packages\gradio\blocks.py", line 1623, in call_function
    prediction = await anyio.to_thread.run_sync(  # type: ignore
  File "E:\AI\chatterbox-light\venv\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "E:\AI\chatterbox-light\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 2485, in run_sync_in_worker_thread
    return await future
  File "E:\AI\chatterbox-light\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 976, in run
    result = context.run(func, *args)
  File "E:\AI\chatterbox-light\venv\lib\site-packages\gradio\utils.py", line 915, in wrapper
    response = f(*args, **kwargs)
  File "E:\AI\chatterbox-light\gradio_tts_app.py", line 31, in generate
    wav = model.generate(
  File "E:\AI\chatterbox-light\src\chatterbox\tts.py", line 245, in generate
    speech_tokens = self.t3.inference(
  File "E:\AI\chatterbox-light\venv\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "E:\AI\chatterbox-light\src\chatterbox\models\t3\t3.py", line 318, in inference
    output = self.patched_model(
  File "E:\AI\chatterbox-light\venv\lib\site-packages\torch\nn\modules\module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "E:\AI\chatterbox-light\venv\lib\site-packages\torch\nn\modules\module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "E:\AI\chatterbox-light\venv\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "E:\AI\chatterbox-light\src\chatterbox\models\t3\inference\t3_hf_backend.py", line 95, in forward
    tfmr_out = self.model(
  File "E:\AI\chatterbox-light\venv\lib\site-packages\torch\nn\modules\module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "E:\AI\chatterbox-light\venv\lib\site-packages\torch\nn\modules\module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "E:\AI\chatterbox-light\venv\lib\site-packages\transformers\utils\generic.py", line 1064, in wrapper
    outputs = func(self, *args, **kwargs)
  File "E:\AI\chatterbox-light\venv\lib\site-packages\transformers\models\llama\modeling_llama.py", line 395, in forward
    hidden_states = decoder_layer(
  File "E:\AI\chatterbox-light\venv\lib\site-packages\transformers\modeling_layers.py", line 94, in __call__
    return super().__call__(*args, **kwargs)
  File "E:\AI\chatterbox-light\venv\lib\site-packages\torch\nn\modules\module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "E:\AI\chatterbox-light\venv\lib\site-packages\torch\nn\modules\module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "E:\AI\chatterbox-light\venv\lib\site-packages\transformers\utils\generic.py", line 1023, in wrapped_forward
    output = orig_forward(*args, **kwargs)
  File "E:\AI\chatterbox-light\venv\lib\site-packages\transformers\utils\deprecation.py", line 172, in wrapped_func
    return func(*args, **kwargs)
  File "E:\AI\chatterbox-light\venv\lib\site-packages\transformers\models\llama\modeling_llama.py", line 294, in forward
    hidden_states, _ = self.self_attn(
  File "E:\AI\chatterbox-light\venv\lib\site-packages\torch\nn\modules\module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "E:\AI\chatterbox-light\venv\lib\site-packages\torch\nn\modules\module.py", line 1857, in _call_impl
    return inner()
  File "E:\AI\chatterbox-light\venv\lib\site-packages\torch\nn\modules\module.py", line 1818, in inner
    hook_result = hook(self, args, result)
  File "E:\AI\chatterbox-light\src\chatterbox\models\t3\inference\alignment_stream_analyzer.py", line 74, in attention_forward_hook
    step_attention = output[1].cpu() # (B, 16, N, N)
AttributeError: 'NoneType' object has no attribute 'cpu'
E:\AI\chatterbox-light\venv\lib\site-packages\torchaudio\functional\functional.py:584: UserWarning: At least one mel filterbank has all zero values. The value for `n_mels` (128) may be set too high. Or, the value for `n_freqs` (201) may be set too low.
  warnings.warn(
E:\AI\chatterbox-light\venv\lib\site-packages\diffusers\models\lora.py:393: FutureWarning: `LoRACompatibleLinear` is deprecated and will be removed in version 1.0.0. Use of `LoRACompatibleLinear` is deprecated. Please switch to PEFT backend by installing PEFT: `pip install peft`.
  deprecate("LoRACompatibleLinear", "1.0.0", deprecation_message)
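The AttributeError at the end of this traceback comes from the hook assuming `output[1]` holds attention weights, which the `sdpa` implementation never materializes (hence the transformers warning above it). A minimal guarded sketch of the hook — the real fix may instead be forcing `model.set_attn_implementation("eager")` so the analyzer gets actual weights:

```python
def attention_forward_hook(module, args, output):
    # Sketch only: mirrors the hook in alignment_stream_analyzer.py. `output`
    # follows LlamaAttention.forward's (hidden_states, attn_weights) shape;
    # attn_weights is None under the default `sdpa` implementation.
    attn_weights = output[1] if isinstance(output, tuple) and len(output) > 1 else None
    if attn_weights is None:
        # Nothing to record; switching the model to eager attention
        # (model.set_attn_implementation("eager")) restores the weights.
        return None
    return attn_weights.cpu()  # (B, heads, N, N) in the original code
```

Returning the moved tensor here is for illustration; the original hook stores it on the analyzer instead.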

I'm not sure what to do with this warning:

UserWarning: At least one mel filterbank has all zero values. The value for `n_mels` (128) may be set too high. Or, the value for `n_freqs` (201) may be set too low.

I tried changing n_fft, num_mels, and sampling_rate for the ta.functional.melscale_fbanks function, but with no luck.
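For what it's worth, the warning itself is reproducible with plain numpy: with 201 frequency bins (n_fft=400) the bins sit ~40 Hz apart (assuming a 16 kHz sample rate, which the S3_SR reference suggests), while 128 mel triangles make the lowest triangles narrower than one bin, so they cover no bin at all. A rough HTK-style construction to illustrate the geometry (torchaudio also supports a slaney variant and normalization, both ignored here):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_freqs, n_mels, sample_rate):
    # FFT bin centers from 0 Hz to Nyquist, (sample_rate/2)/(n_freqs-1) apart
    freqs = np.linspace(0.0, sample_rate / 2, n_freqs)
    # n_mels triangles need n_mels + 2 corner points, evenly spaced in mel
    hz_pts = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2), n_mels + 2))
    fb = np.zeros((n_mels, n_freqs))
    for i in range(n_mels):
        lo, ctr, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (freqs - lo) / (ctr - lo)
        falling = (hi - freqs) / (hi - ctr)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

# The warning's configuration: n_mels=128, n_freqs=201
fb = mel_filterbank(n_freqs=201, n_mels=128, sample_rate=16000)
empty = int((fb.max(axis=1) == 0).sum())
print(f"{empty} of 128 filters are all zero")  # several low-frequency filters are empty
```

So the warning is a direct consequence of the n_mels/n_fft combination, not of the librosa-to-torchaudio swap itself; it would fire for any backend building that filterbank.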

Edit: After some refactoring I figured out it was a version difference, but I still get a different error when replacing librosa with torchaudio:

E:\AI\chatterbox-custom\venv\Lib\site-packages\torchaudio\functional\functional.py:584: UserWarning: At least one mel filterbank has all zero values. The value for `n_mels` (128) may be set too high. Or, the value for `n_freqs` (201) may be set too low.
  warnings.warn(
E:\AI\chatterbox-custom\venv\Lib\site-packages\diffusers\models\lora.py:393: FutureWarning: `LoRACompatibleLinear` is deprecated and will be removed in version 1.0.0. Use of `LoRACompatibleLinear` is deprecated. Please switch to PEFT backend by installing PEFT: `pip install peft`.
  deprecate("LoRACompatibleLinear", "1.0.0", deprecation_message)
WARNING:root:Reference mel length is not equal to 2 * reference token length.

Traceback (most recent call last):
  File "E:\AI\chatterbox-custom\fast_test.py", line 13, in <module>
    model.prepare_conditionals(wav_fpath=AUDIO_PROMPT_PATH)  # Use custom voice from audio sample
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\AI\chatterbox-custom\src\chatterbox\tts.py", line 207, in prepare_conditionals
    ve_embed = torch.from_numpy(self.ve.embeds_from_wavs([ref_16k_wav], sample_rate=S3_SR))
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\AI\chatterbox-custom\src\chatterbox\models\voice_encoder\voice_encoder.py", line 286, in embeds_from_wavs
    mels = [melspectrogram(w, self.hp).T for w in wavs]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\AI\chatterbox-custom\src\chatterbox\models\voice_encoder\voice_encoder.py", line 286, in <listcomp>
    mels = [melspectrogram(w, self.hp).T for w in wavs]
            ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\AI\chatterbox-custom\src\chatterbox\models\voice_encoder\melspec.py", line 44, in melspectrogram
    spec_complex = _stft(wav, hp, pad=pad)
                   ^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\AI\chatterbox-custom\src\chatterbox\models\voice_encoder\melspec.py", line 77, in _stft
    return torch.stft(
           ^^^^^^^^^^^
  File "E:\AI\chatterbox-custom\venv\Lib\site-packages\torch\functional.py", line 728, in stft
    input = F.pad(input.view(extended_shape), [pad, pad], pad_mode)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\AI\chatterbox-custom\venv\Lib\site-packages\torch\nn\functional.py", line 5209, in pad
    return torch._C._nn.pad(input, pad, mode, value)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Expected 2D or 3D (batch mode) tensor with possibly 0 batch size and other non-zero dimensions for input, but got: [1, 1, 0]
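The final RuntimeError says torch.stft received a [1, 1, 0] tensor, i.e. a waveform with zero samples, so the failure is upstream of the STFT: the reference audio apparently came back empty from loading or resampling. A hypothetical guard (helper name and message are mine) that would surface the real problem earlier:

```python
import numpy as np

def check_reference_wav(wav):
    # Hypothetical guard to place before the `_stft` call in melspec.py.
    # "got: [1, 1, 0]" in the traceback means the last dimension — the
    # sample count — was zero, so loading/resampling produced no audio.
    if wav.shape[-1] == 0:
        raise ValueError(
            "Reference waveform is empty; check that the audio file loaded "
            "correctly and that resampling did not drop all samples."
        )
    return wav

check_reference_wav(np.ones(160))  # a 160-sample waveform passes through unchanged
```

The same shape check works on a torch tensor; numpy is used here only to keep the sketch self-contained.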

Author

BBC-Esq commented Dec 25, 2025

@Datel777 I will check that out and let you know.
