fix(voice): prevent scheduling deadlock when SpeechHandle._markDone re-enters#1422
Closed
toubatbrian wants to merge 1 commit intomainfrom
Closed
fix(voice): prevent scheduling deadlock when SpeechHandle._markDone re-enters#1422toubatbrian wants to merge 1 commit intomainfrom
toubatbrian wants to merge 1 commit intomainfrom
Conversation
Move the `_markGenerationDone()` call in `SpeechHandle._markDone()` outside the `if (!doneFut.done)` guard so a pending generation future is always resolved, even when `doneFut` was already settled. Previously, a second `_markDone()` would short-circuit and leave the generation unresolved, causing `mainTask` to hang on `_waitForGeneration()` and starve subsequent speech handles. Adds a regression test (`SpeechHandle._markDone - generation resolution after early done`) that fails on the pre-fix code (timeout) and passes on the fix. Ports livekit/agents#5678. https://claude.ai/code/session_01NQyfoU5f21EryyRbE98AE5
🦋 Changeset detectedLatest commit: fe0d991 The changes in this PR will be included in the next version bump. This PR includes changesets to release 31 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
|
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Automated port of livekit/agents#5678 (
fix(voice): prevent scheduling deadlock when pipeline task crashes) intoagents-js.Note
This is an automated Claude Code Routine created by @toubatbrian. Right now it is in experimentation stage.
cc @toubatbrian @livekit/agent-devs for review.
What was broken
SpeechHandle._markDone()short-circuits whendoneFutis already resolved, but the call to_markGenerationDone()was nested inside the sameif (!doneFut.done)guard:This meant that if
_markDone()ran a second time afterdoneFutwas already settled — e.g. the force-interrupt shutdown path inAgentActivity.interrupt()runs first, and the pipeline reply task later tries to clean up — and a generation had been authorized in the interim, that generation future was never resolved.mainTaskthen hangs forever on_waitForGeneration(), starving every subsequent speech handle (a real production failure mode that already had a workaround comment atagent_activity.ts:1709-1712: "the generation future created by_authorizeGenerationwould never resolve since_markDoneis a no-op oncedoneFutis already settled").This is the JS twin of the Python bug: in Python the equivalent issue surfaced through
contextlib.suppress(asyncio.InvalidStateError)swallowing_done_fut.set_result(None)and aborting the whole block; here it's the explicitif (!doneFut.done)guard. Same root cause, different idiom.Fix
Move
_markGenerationDone()out of theif (!doneFut.done)guard so it always runs when a generation is pending:_markGenerationDone()is already idempotent (it only resolves the last generation if!lastGeneration.done), so calling it on a re-entry where the generation was already resolved is a no-op.Implementation nuances vs. Python
The Python diff (6 / -3 in
speech_handle.py) also re-orders the_interrupt_timeout_handle.cancel()call relative to the suppressed block. The JSSpeechHandledoesn't carry an_interruptTimeoutHandlefield at all (no equivalent timeout-driven cancellation logic exists yet in agents-js), so that part of the Python change has no JS counterpart and is intentionally not ported. The behavioral fix — always run_markGenerationDone()when there's a pending generation — is identical in both languages.The Python PR did not include a regression test. Because the JS bug pattern (re-entering
_markDoneafterdoneFutis settled, then authorizing/finishing a generation) is easy to reproduce deterministically with the existing public-internal API, this PR adds a single regression test inagents/src/voice/speech_handle.test.ts:Verified: this test times out on the pre-fix code (
Expected: "resolved", Received: "timeout") and passes on the fix.Files changed
agents/src/voice/speech_handle.ts— move_markGenerationDone()out of theif (!doneFut.done)guard.agents/src/voice/speech_handle.test.ts— addSpeechHandle._markDone - generation resolution after early doneregression suite..changeset/fix-speech-handle-mark-done-deadlock.md—patchfor@livekit/agents.Test plan
pnpm --filter @livekit/agents build— passes.pnpm exec prettier --checkon changed files — passes.pnpm --filter @livekit/agents lint— no new errors / warnings on touched files.pnpm exec vitest run agents/src/voice/speech_handle.test.ts— 10/10 pass (9 existing + 1 new).pnpm exec vitest run agents/src/voice/— 169/169 pass (full voice suite green).speech_handle.tschange makes the new test fail with a 500 ms timeout, then re-applying restores it to green.Changeset
patchfor@livekit/agents(per the routine's standing instructions).Generated by Claude Code