
[Vulkan] Beam size 8 crashes with AMD Radeon 780M when VAD is enabled (large-v2) #3723

@controlledentropy

Description

Issue Description
When using the Vulkan backend with an AMD Radeon 780M integrated GPU and the ggml-large-v2.bin model, enabling VAD (voice activity detection) together with --beam-size 8 causes a crash (an access violation, the Windows equivalent of a segmentation fault) after some speech segments have been processed.

If VAD is disabled (i.e., processing the whole audio as one segment), --beam-size 8 runs stably.
If --beam-size is reduced to 5 or lower, the program runs stably regardless of whether VAD is enabled.
--beam-size 7 may also crash under certain parameter combinations, but the occurrence is inconsistent.


Environment

  • OS: Windows 10 LTSC IoT Enterprise 64-bit (19044.6937)
  • Hardware: AMD Ryzen 8845HS (Radeon 780M), 16 GB shared memory
  • Driver: AMD Vulkan driver (amdvlk64.dll), with Vulkan SDK 1.4.341 installed
  • Build environment: Visual Studio 2022 (MSVC), Vulkan SDK configured
  • whisper.cpp version: master branch, commit 7668414 (approximately January 2025)
  • VAD models tested: both ONNX format (silero_vad.onnx) and GGML format (for-tests-silero-v6.2.0-ggml.bin) – both trigger the same crash.

Steps to Reproduce

  1. Run whisper-cli.exe (Debug build) with the following parameters:
    whisper-cli.exe -m models/ggml-large-v2.bin -f "audio.wav" -l zh -t 8 --beam-size 8 --max-context 128 --max-len 150 --suppress-nst --no-flash-attn --vad -vm models/for-tests-silero-v6.2.0-ggml.bin --vad-min-speech-duration-ms 500 --vad-min-silence-duration-ms 300 --vad-speech-pad-ms 100
  2. The program starts processing, and VAD splits the audio into 200+ speech segments.
  3. After transcribing the first ~10–15 segments, a segmentation fault occurs.
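To narrow down which combinations fail, the comparison described in the steps and observations (beam sizes 5 to 8, with and without VAD) could be scripted. The Python sketch below is a hypothetical helper, not part of whisper.cpp: the function names `build_command`, `classify` and `sweep`, the default of three runs per configuration, and the relative paths are assumptions; the flags are copied from the reproduction command above. It treats NTSTATUS 0xC0000005 (the access violation reported by the debugger) as a crash, whether a tool reports it signed or unsigned.

```python
import signal
import subprocess
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

# NTSTATUS code for an access violation, as shown in the debugger output.
ACCESS_VIOLATION = 0xC0000005

# Flags taken from the reproduction command; paths are assumed to be relative.
BASE_ARGS = [
    "whisper-cli.exe", "-m", "models/ggml-large-v2.bin", "-f", "audio.wav",
    "-l", "zh", "-t", "8", "--max-context", "128", "--max-len", "150",
    "--suppress-nst", "--no-flash-attn",
]
VAD_ARGS = [
    "--vad", "-vm", "models/for-tests-silero-v6.2.0-ggml.bin",
    "--vad-min-speech-duration-ms", "500",
    "--vad-min-silence-duration-ms", "300",
    "--vad-speech-pad-ms", "100",
]


def build_command(beam_size: int, use_vad: bool) -> List[str]:
    """Build one whisper-cli invocation (hypothetical helper)."""
    cmd = BASE_ARGS + ["--beam-size", str(beam_size)]  # new list, BASE_ARGS untouched
    if use_vad:
        cmd += VAD_ARGS
    return cmd


def run_once(cmd: Sequence[str]) -> int:
    """Run the command once, discarding its output, and return the exit code."""
    return subprocess.run(list(cmd), stdout=subprocess.DEVNULL,
                          stderr=subprocess.DEVNULL, check=False).returncode


def classify(returncode: int) -> str:
    """Map a process exit code to a coarse outcome label."""
    if returncode == 0:
        return "ok"
    # Windows exit codes may be shown signed or unsigned; compare the low 32 bits.
    if (returncode & 0xFFFFFFFF) == ACCESS_VIOLATION:
        return "access violation"
    # On POSIX, subprocess reports death by signal as a negative signal number.
    if returncode == -signal.SIGSEGV:
        return "segmentation fault"
    return "error"


def sweep(beam_sizes: Iterable[int], runs_per_config: int = 3,
          runner: Callable[[Sequence[str]], int] = run_once
          ) -> Dict[Tuple[int, bool], List[str]]:
    """Run every (beam size, VAD on/off) pair several times and record outcomes."""
    results: Dict[Tuple[int, bool], List[str]] = {}
    for beam in beam_sizes:
        for use_vad in (False, True):
            cmd = build_command(beam, use_vad)
            results[(beam, use_vad)] = [classify(runner(cmd))
                                        for _ in range(runs_per_config)]
    return results
```

For example, `sweep(range(5, 9))` would run each beam size from 5 to 8 three times with and without VAD. With large-v2 on a long recording this can take a long time, so a shorter clip that still yields more than ~15 VAD segments may be more practical. The `runner` parameter exists so the logic can be exercised without launching whisper-cli.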

Observed Behavior & Debug Stack
When the Debug build is run under the Visual Studio 2022 debugger, the exception is:

Exception thrown at 0x00007FFD8340B734 (amdvlk64.dll) in whisper-cli.exe: 0xC0000005: Access violation reading location 0x0000000000000010.

The call stack points to a failure during vkCreateComputePipelines inside the ggml-vulkan component:

amdvlk64.dll!00007ffd8340b734()
...
ggml-vulkan.dll!vk::Device::createComputePipeline<...>()
ggml-vulkan.dll!ggml_vk_create_pipeline_func(...)
ggml-vulkan.dll!ggml_vk_load_shaders(...)
ggml-vulkan.dll!ggml_vk_mul_mat_vec_q_f16(...)
...

Full stack trace is attached.


Key Observations

  • Without VAD (single‑segment processing), --beam-size 8 works reliably.
  • When VAD is enabled, even increasing --vad-min-speech-duration-ms (e.g., to 1000 ms) to reduce the number of segments does not prevent the crash with --beam-size 8.
  • --beam-size 7 is also unstable in some tests, though the crash is less frequent.
  • --beam-size 5 or lower is stable in all tests.
  • The crash occurs regardless of whether the ONNX or GGML VAD model is used.

Speculated Cause

  • The AMD Vulkan driver may contain a bug that manifests when creating compute pipelines of a certain shape. That shape appears to be triggered by the combination of --beam-size 8 and the repeated pipeline creation caused by VAD segmentation.
  • The repeated pipeline creation may expose a driver‑side issue leading to an invalid memory access.
  • It is suspected that the tensor shapes or workgroup configurations required for --beam-size 8 cause the driver to access uninitialized or out‑of‑bounds memory.

Workaround
Set --beam-size to 6 or lower (this is the limit that worked on my system) while keeping VAD and the other optimization parameters. For example:

whisper-cli.exe -m models/ggml-large-v2.bin -f "audio.wav" -l zh -t 8 --beam-size 6 --max-context 128 --max-len 150 --suppress-nst --vad -vm models/for-tests-silero-v6.2.0-ggml.bin

This configuration runs stably with good transcription quality.


Request
I hope the developers can look into the compatibility issue with the AMD Vulkan driver under these specific conditions. Perhaps the Vulkan backend could be adjusted to avoid the problematic pipeline creation pattern, or a driver‑specific workaround (e.g., limiting --beam-size or changing memory layout) could be introduced.
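A driver-specific workaround of the "limiting --beam-size" kind could look roughly like the Python sketch below. It is purely illustrative: `effective_beam_size`, the device-name matching, and the `affected_markers` tuple are hypothetical and do not correspond to any existing whisper.cpp or ggml-vulkan API. The limit of 6 comes from the workaround above, and the clamp applies only when VAD is enabled, since beam size 8 was reported stable without VAD.

```python
from typing import Tuple


def effective_beam_size(requested: int, device_name: str, vad_enabled: bool,
                        max_safe_beam: int = 6,
                        affected_markers: Tuple[str, ...] = ("Radeon 780M",)) -> int:
    """Clamp the beam size only for the configuration reported as crashing.

    Hypothetical policy: the device list and the limit of 6 are taken from
    this report, not from any official compatibility table.
    """
    if requested < 1:
        raise ValueError("beam size must be at least 1")
    affected = any(marker in device_name for marker in affected_markers)
    if vad_enabled and affected:
        return min(requested, max_safe_beam)
    return requested
```

A clamp like this only hides the symptom; a fix in the Vulkan backend or the driver would still be preferable, because the exact trigger (including the intermittent failures at beam size 7) is not yet understood.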

DxDiag.txt

stack.txt

whisper-cli-mini.dmp

Metadata

  • Assignees: No one assigned
  • Labels: No labels
  • Type: No type
  • Projects: No projects
  • Milestone: No milestone
  • Relationships: None yet
  • Development: No branches or pull requests