Description
Problem
The abort_callback field in whisper_full_params does not actually interrupt an in-progress transcription. It only fires after each encode/decode step completes, which makes it ineffective for real-time cancellation and makes it impossible to build a responsive stop feature in applications that use whisper_full(). It can also leave ghost processes behind in certain applications that rely purely on the operating system's garbage collection (e.g. on macOS).
Root Cause
There are two overloads of the internal `ggml_graph_compute_helper`:
- The `ggml_cgraph *` version (line 169): Correctly accepts `abort_callback` and wires it into the ggml backend.
- The `ggml_backend_sched_t` version (line 191): Has no `abort_callback` parameter at all.
All actual encoder and decoder compute calls inside `whisper_encode_internal` and `whisper_decode_internal` use the second (sched) overload exclusively. As a result, `abort_callback` is never passed to the ggml backend during computation. The only places it fires are the post-hoc checks at the very end of those functions (lines 2447 and 2977), after all the work is already done.
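The difference between the two code paths can be illustrated with a small standalone model. This is only a sketch: `abort_cb_t`, `abort_after`, `compute_polled`, and `compute_unpolled` are hypothetical names invented for the illustration, not whisper.cpp or ggml symbols, and the "graph" is just a loop over a node count.

```cpp
// Hypothetical stand-in for an abort callback: return true to request an abort.
using abort_cb_t = bool (*)(void * user_data);

// Test helper (assumption): reports "abort" once it has been polled
// `polls_before_abort` times.
struct abort_after {
    int polls_before_abort;
    int polls = 0;
};

static bool abort_after_cb(void * user_data) {
    auto * state = static_cast<abort_after *>(user_data);
    return state->polls++ >= state->polls_before_abort;
}

// Models a compute path that is given the callback: it is polled before
// every node, so the graph can stop part-way through.
static int compute_polled(int n_nodes, abort_cb_t cb, void * user_data) {
    int nodes_done = 0;
    for (int i = 0; i < n_nodes; ++i) {
        if (cb && cb(user_data)) {
            break; // abort requested: skip the remaining nodes
        }
        ++nodes_done; // stands in for evaluating one graph node
    }
    return nodes_done;
}

// Models a compute path with no callback parameter: every node always runs,
// and the caller can only check for an abort after the fact.
static int compute_unpolled(int n_nodes) {
    int nodes_done = 0;
    for (int i = 0; i < n_nodes; ++i) {
        ++nodes_done;
    }
    return nodes_done;
}
```

In this model, a post-hoc check after `compute_unpolled` can still report the abort, but only once all of the work has been paid for, which is the behavior described above.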
Additionally, the main token sampling loop (whisper_full_with_state, line 6783) has no abort check at all. It runs up to n_text_ctx / 2 iterations with no opportunity to exit early.
encoder_begin_callback does work correctly. It fires before each audio chunk but this only helps with multi-chunk audio. For short clips processed as a single chunk, by the time a user requests a stop, the single chunk is already being processed and encoder_begin_callback will not fire again.
Proposed Fix
I propose 3 changes to whisper.cpp with no API changes:
- Add `abort_callback` support to the `sched` overload of `ggml_graph_compute_helper`, using the same `ggml_backend_set_abort_callback` pattern already present in the non-sched overload.
- Pass `abort_callback` and `abort_callback_user_data` through the `ggml_graph_compute_helper(sched, ...)` calls inside `whisper_encode_internal` (lines 2406, 2431, 2447) and inside `whisper_decode_internal` (line 2944). Note that `whisper_decode_internal` is called from 4 external sites, but lines 3940 and 8847 already pass `nullptr` and are unrelated to user-initiated abort.
- Add an `abort_callback` check at the top of the token sampling loop in `whisper_full_with_state` so it can exit between token generations.
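As a rough illustration of the third change, the loop-level check could look like the standalone sketch below. It is not the actual `whisper_full_with_state` code: `sample_loop`, `stop_requested`, and the `on_token` hook are hypothetical names, and the decode-and-sample step is reduced to a callback so the example compiles on its own.

```cpp
#include <atomic>
#include <functional>

// Hypothetical stand-in for an abort callback: return true to request an abort.
using abort_cb_t = bool (*)(void * user_data);

// Example callback (assumption): the application flips an atomic flag from
// its UI thread, and the callback simply reads it.
static bool stop_requested(void * user_data) {
    return static_cast<std::atomic<bool> *>(user_data)->load();
}

// Simplified model of a token sampling loop. `on_token` stands in for one
// decode + sample step. Returns the number of tokens produced.
static int sample_loop(int n_max,
                       abort_cb_t abort_callback,
                       void * abort_callback_user_data,
                       const std::function<void(int)> & on_token) {
    int n_tokens = 0;
    for (int i = 0; i < n_max; ++i) {
        // Check at the top of each iteration, so a stop request takes
        // effect between token generations instead of after the loop.
        if (abort_callback && abort_callback(abort_callback_user_data)) {
            break;
        }
        on_token(i);
        ++n_tokens;
    }
    return n_tokens;
}
```

With a check like this, the worst-case latency of a stop request in the loop is one decode step rather than the whole loop; combining it with the first two changes would also bound the latency inside each step.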
Discussion
Before implementing these, I wanted to check whether this is the right layer for the fix, or whether you would prefer the abort mechanism to live deeper in ggml. Also, do you have any concerns with the proposed approach?