[Vulkan] Beam size 8 crashes with AMD Radeon 780M when VAD is enabled (large-v2)

**Issue Description**  
When using the Vulkan backend with an AMD Radeon 780M integrated GPU and the `ggml-large-v2.bin` model, enabling VAD (voice activity detection) and setting `--beam-size` to 8 causes a segmentation fault after processing some speech segments.  

If VAD is disabled (i.e., processing the whole audio as one segment), `--beam-size 8` runs stably.  
If `--beam-size` is reduced to 5 or lower, the program runs stably regardless of whether VAD is enabled.  
`--beam-size 7` may also crash under certain parameter combinations, but the occurrence is inconsistent.

---

**Environment**  
- OS: Windows 10 LTSC IoT Enterprise 64-bit (19044.6937)  
- Hardware: AMD Ryzen 8845HS (Radeon 780M), 16 GB shared memory  
- Driver: AMD Vulkan driver (amdvlk64.dll) installed via Vulkan SDK 1.4.341  
- Build environment: Visual Studio 2022 (MSVC), Vulkan SDK configured  
- whisper.cpp version: master branch, commit 76684141 (approximately January 2025)  
- VAD models tested: both ONNX format (`silero_vad.onnx`) and GGML format (`for-tests-silero-v6.2.0-ggml.bin`) – both trigger the same crash.

---

**Steps to Reproduce**  
1. Run `whisper-cli.exe` (Debug build) with the following parameters:  
   ```bash
   whisper-cli.exe -m models/ggml-large-v2.bin -f "audio.wav" -l zh -t 8 --beam-size 8 --max-context 128 --max-len 150 --suppress-nst --no-flash-attn --vad -vm models/for-tests-silero-v6.2.0-ggml.bin --vad-min-speech-duration-ms 500 --vad-min-silence-duration-ms 300 --vad-speech-pad-ms 100
   ```
2. The program starts processing, and VAD splits the audio into roughly 200 or more speech segments.  
3. After transcribing the first ~10–15 segments, a segmentation fault occurs.

---

**Observed Behavior & Debug Stack**  
When attached with Visual Studio 2022 in Debug mode, the exception is:  
```
Exception thrown at 0x00007FFD8340B734 (amdvlk64.dll) in whisper-cli.exe: 0xC0000005: Access violation reading location 0x0000000000000010.
```
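The faulting read address (`0x10`) is consistent with a null-pointer dereference at a small offset. The call stack shows the access violation occurring inside `amdvlk64.dll`, during a compute pipeline creation (`vkCreateComputePipelines`) issued by the `ggml-vulkan` component:  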
```
amdvlk64.dll!00007ffd8340b734()
...
ggml-vulkan.dll!vk::Device::createComputePipeline<...>()
ggml-vulkan.dll!ggml_vk_create_pipeline_func(...)
ggml-vulkan.dll!ggml_vk_load_shaders(...)
ggml-vulkan.dll!ggml_vk_mul_mat_vec_q_f16(...)
...
```
Full stack trace is attached.

---

**Key Observations**  
- Without VAD (single‑segment processing), `--beam-size 8` works reliably.  
- When VAD is enabled, even increasing `--vad-min-speech-duration-ms` (e.g., to 1000 ms) to reduce the number of segments does not prevent the crash with `--beam-size 8`.  
- `--beam-size 7` is also unstable in some tests, though the crash is less frequent.  
- `--beam-size 5` or lower is stable in all tests.  
- The crash occurs regardless of whether the ONNX or GGML VAD model is used.
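
The observations above could be checked systematically with a small sweep over beam sizes, with and without VAD. The following is a minimal sketch, assuming a POSIX-compatible shell (for example Git Bash) and that `whisper-cli.exe`, the models, and `audio.wav` sit at the paths used in the reproduction command. The function names `run_one` and `sweep`, the variables `WHISPER_CLI`, `AUDIO`, and `DRY_RUN`, and the log file names are hypothetical; the whisper-cli flags are copied from the reproduction steps.

```bash
# Hypothetical sweep harness; paths and variable names are assumptions.
WHISPER_CLI="${WHISPER_CLI:-./whisper-cli.exe}"
AUDIO="${AUDIO:-audio.wav}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = actually run them

# Run (or print) one configuration: $1 = beam size, $2 = "on" or "off" for VAD.
run_one() {
  beam_size=$1
  vad_mode=$2
  # Base flags, copied from the reproduction command.
  set -- -m models/ggml-large-v2.bin -f "$AUDIO" -l zh -t 8 \
    --beam-size "$beam_size" --max-context 128 --max-len 150 \
    --suppress-nst --no-flash-attn
  if [ "$vad_mode" = "on" ]; then
    # VAD flags, also copied from the reproduction command.
    set -- "$@" --vad -vm models/for-tests-silero-v6.2.0-ggml.bin \
      --vad-min-speech-duration-ms 500 \
      --vad-min-silence-duration-ms 300 \
      --vad-speech-pad-ms 100
  fi
  echo "+ $WHISPER_CLI $*"
  if [ "$DRY_RUN" = "1" ]; then
    return 0
  fi
  # Keep each run's output in its own log and report the exit status.
  "$WHISPER_CLI" "$@" > "beam${beam_size}_vad-${vad_mode}.log" 2>&1
  echo "beam=$beam_size vad=$vad_mode exit=$?"
}

# Try beam sizes 5 through 8, first without VAD, then with VAD.
sweep() {
  for v in off on; do
    for b in 5 6 7 8; do
      run_one "$b" "$v"
    done
  done
}

sweep
```

By default the script only prints the eight command lines. Setting `DRY_RUN=0` runs them and records one exit status per configuration, which would make it easier to see whether the crash tracks the beam size, VAD, or both.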

---

**Speculated Cause**  
- The AMD Vulkan driver may contain a bug that manifests when creating compute pipelines of a certain shape. That shape appears to be reached only with the combination of `--beam-size 8` and the many pipeline creations caused by VAD segmentation.  
- The repeated pipeline creation across segments may be what exposes the driver-side issue, since a single-segment run with the same beam size does not crash.  
- The tensor shapes or workgroup configurations required for `--beam-size 8` may cause the driver to access uninitialized or out-of-bounds memory.
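
One way to test the driver hypothesis would be to rerun the failing command with the Khronos validation layer enabled, to see whether the Vulkan API is being misused before the driver faults. The wrapper below is a minimal sketch, assuming the validation layer shipped with the Vulkan SDK is installed and discoverable by the Vulkan loader. `VK_INSTANCE_LAYERS` is the loader's environment variable for enabling layers; the helper name `with_validation` is hypothetical.

```bash
# Hypothetical helper: run any command with the Khronos validation layer
# enabled for that process only (assumes the layer is installed via the SDK).
with_validation() {
  env VK_INSTANCE_LAYERS=VK_LAYER_KHRONOS_validation "$@"
}

# Example (not executed here):
#   with_validation ./whisper-cli.exe -m models/ggml-large-v2.bin -f audio.wav \
#     -l zh --beam-size 8 --vad -vm models/for-tests-silero-v6.2.0-ggml.bin
```

Validation messages reported before the crash would point toward API misuse on the application side. A clean run that still crashes would be consistent with a driver fault, though it would not prove one.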

---

**Workaround**  
Set `--beam-size` to 6 or lower while keeping VAD and the other optimization parameters. On my machine 6 has been stable with the command below, and 5 or lower was stable in all tests. For example:  
```bash
whisper-cli.exe -m models/ggml-large-v2.bin -f "audio.wav" -l zh -t 8 --beam-size 6 --max-context 128 --max-len 150 --suppress-nst --vad -vm models/for-tests-silero-v6.2.0-ggml.bin
```
This configuration runs stably with good transcription quality.

---

**Request**  
I hope the developers can look into the compatibility issue with the AMD Vulkan driver under these specific conditions. Perhaps the Vulkan backend could be adjusted to avoid the problematic pipeline creation pattern, or a driver‑specific workaround (e.g., limiting `--beam-size` or changing memory layout) could be introduced.

[DxDiag.txt](https://github.com/user-attachments/files/26264857/DxDiag.txt)

[stack.txt](https://github.com/user-attachments/files/26264871/stack.txt)

[whisper-cli-mini.dmp](https://github.com/user-attachments/files/26264872/whisper-cli-mini.dmp)
