Issue Description
When using the Vulkan backend with an AMD Radeon 780M integrated GPU and the ggml-large-v2.bin model, enabling VAD (voice activity detection) and setting --beam-size to 8 causes a segmentation fault after processing some speech segments.
If VAD is disabled (i.e., processing the whole audio as one segment), --beam-size 8 runs stably.
If --beam-size is reduced to 5 or lower, the program runs stably regardless of whether VAD is enabled.
--beam-size 7 may also crash under certain parameter combinations, but the occurrence is inconsistent.
Environment
- OS: Windows 10 LTSC IoT Enterprise 64-bit (19044.6937)
- Hardware: AMD Ryzen 8845HS (Radeon 780M), 16 GB shared memory
- Driver: AMD Vulkan driver (amdvlk64.dll) installed via Vulkan SDK 1.4.341
- Build environment: Visual Studio 2022 (MSVC), Vulkan SDK configured
- whisper.cpp version: master branch, commit 7668414 (approximately January 2025)
- VAD models tested: both ONNX format (silero_vad.onnx) and GGML format (for-tests-silero-v6.2.0-ggml.bin) – both trigger the same crash.
Steps to Reproduce
- Run whisper-cli.exe (Debug build) with the following parameters:
  whisper-cli.exe -m models/ggml-large-v2.bin -f "audio.wav" -l zh -t 8 --beam-size 8 --max-context 128 --max-len 150 --suppress-nst --no-flash-attn --vad -vm models/for-tests-silero-v6.2.0-ggml.bin --vad-min-speech-duration-ms 500 --vad-min-silence-duration-ms 300 --vad-speech-pad-ms 100
- The program starts processing, and VAD splits the audio into ~200+ speech segments.
- After transcribing the first ~10–15 segments, a segmentation fault occurs.
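To narrow down which beam sizes crash, the reproduction can be scripted. The following is a minimal sketch and not part of whisper.cpp: it builds the command line from the flags listed above and records whether each run ended with the Windows access-violation status 0xC0000005. The helper names (build_command, is_access_violation, sweep) and the default paths are assumptions for illustration only.

```python
import subprocess

# Windows NTSTATUS for an access violation, as shown in the debugger output.
ACCESS_VIOLATION = 0xC0000005


def build_command(beam_size, use_vad, exe="whisper-cli.exe", audio="audio.wav"):
    """Build the argument list from the flags used in this report."""
    cmd = [
        exe, "-m", "models/ggml-large-v2.bin", "-f", audio,
        "-l", "zh", "-t", "8",
        "--beam-size", str(beam_size),
        "--max-context", "128", "--max-len", "150",
        "--suppress-nst", "--no-flash-attn",
    ]
    if use_vad:
        cmd += [
            "--vad", "-vm", "models/for-tests-silero-v6.2.0-ggml.bin",
            "--vad-min-speech-duration-ms", "500",
            "--vad-min-silence-duration-ms", "300",
            "--vad-speech-pad-ms", "100",
        ]
    return cmd


def is_access_violation(returncode):
    """True if the exit code is 0xC0000005, reported signed or unsigned."""
    return (returncode & 0xFFFFFFFF) == ACCESS_VIOLATION


def sweep(beam_sizes=(5, 6, 7, 8), run=subprocess.run):
    """Run every beam size with and without VAD and collect the outcome.

    `run` is injectable (hypothetical parameter) so the loop can be
    exercised without a GPU or the whisper-cli binary.
    """
    results = {}
    for use_vad in (False, True):
        for beam in beam_sizes:
            proc = run(build_command(beam, use_vad))
            if is_access_violation(proc.returncode):
                outcome = "crash"
            elif proc.returncode == 0:
                outcome = "ok"
            else:
                outcome = f"exit {proc.returncode}"
            results[(beam, use_vad)] = outcome
    return results
```

Calling sweep() from the directory that contains whisper-cli.exe returns a dictionary keyed by (beam size, VAD enabled). Because the crash with --beam-size 7 is intermittent, several passes may be needed before a pattern shows up.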
Observed Behavior & Debug Stack
When attached with Visual Studio 2022 in Debug mode, the exception is:
Exception thrown at 0x00007FFD8340B734 (amdvlk64.dll) in whisper-cli.exe: 0xC0000005: Access violation reading location 0x0000000000000010.
The call stack points to a failure during vkCreateComputePipelines inside the ggml-vulkan component:
amdvlk64.dll!00007ffd8340b734()
...
ggml-vulkan.dll!vk::Device::createComputePipeline<...>()
ggml-vulkan.dll!ggml_vk_create_pipeline_func(...)
ggml-vulkan.dll!ggml_vk_load_shaders(...)
ggml-vulkan.dll!ggml_vk_mul_mat_vec_q_f16(...)
...
Full stack trace is attached.
Key Observations
- Without VAD (single‑segment processing), --beam-size 8 works reliably.
- When VAD is enabled, even increasing --vad-min-speech-duration-ms (e.g., to 1000 ms) to reduce the number of segments does not prevent the crash with --beam-size 8.
- --beam-size 7 is also unstable in some tests, though the crash is less frequent.
- --beam-size 5 or lower is stable in all tests.
- The crash occurs regardless of whether the ONNX or GGML VAD model is used.
Speculated Cause
- The AMD Vulkan driver contains a bug that manifests when creating compute pipelines of a certain shape. This shape appears to be triggered by the combination of --beam-size 8 and the multiple pipeline creations caused by VAD segmentation.
- The repeated pipeline creation may expose a driver‑side issue leading to an invalid memory access.
- It is suspected that the tensor shapes or workgroup configurations required for --beam-size 8 cause the driver to access uninitialized or out‑of‑bounds memory.
Workaround
Set --beam-size to 6 or lower (this works for me) while keeping VAD and the other optimization parameters. For example:
whisper-cli.exe -m models/ggml-large-v2.bin -f "audio.wav" -l zh -t 8 --beam-size 6 --max-context 128 --max-len 150 --suppress-nst --vad -vm models/for-tests-silero-v6.2.0-ggml.bin
This configuration runs stably with good transcription quality.
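For unattended runs, the workaround can be automated. The sketch below assumes the crash surfaces as exit status 0xC0000005: it tries the preferred beam size first and steps down only when the process dies with an access violation. transcribe_with_fallback and its parameters are hypothetical names, not part of whisper.cpp.

```python
import subprocess

ACCESS_VIOLATION = 0xC0000005  # exception code from the debugger output


def transcribe_with_fallback(base_args, beam_sizes=(8, 6, 5), run=subprocess.run):
    """Try each beam size in order; return the first one that exits cleanly.

    base_args: whisper-cli arguments without --beam-size.
    run: injectable runner (defaults to subprocess.run) for testing.
    """
    for beam in beam_sizes:
        proc = run(list(base_args) + ["--beam-size", str(beam)])
        if proc.returncode == 0:
            return beam
        if (proc.returncode & 0xFFFFFFFF) != ACCESS_VIOLATION:
            # Some other failure; lowering the beam size is unlikely to help.
            raise RuntimeError(f"whisper-cli exited with {proc.returncode}")
    raise RuntimeError("all beam sizes crashed with an access violation")
```

Each retry re-processes the whole file, so this trades run time for stability and does not address the underlying driver problem.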
Request
I hope the developers can look into the compatibility issue with the AMD Vulkan driver under these specific conditions. Perhaps the Vulkan backend could be adjusted to avoid the problematic pipeline creation pattern, or a driver‑specific workaround (e.g., limiting --beam-size or changing memory layout) could be introduced.