fix(voice): prevent scheduling deadlock when SpeechHandle._markDone re-enters #1422

Closed
toubatbrian wants to merge 1 commit into main from claude/quirky-galileo-viXBX
Conversation

@toubatbrian
Contributor

Summary

Automated port of livekit/agents#5678 (fix(voice): prevent scheduling deadlock when pipeline task crashes) into agents-js.

Note

This is an automated Claude Code Routine created by @toubatbrian. It is currently at an experimental stage.

cc @toubatbrian @livekit/agent-devs for review.

What was broken

SpeechHandle._markDone() short-circuits when doneFut is already resolved, but the call to _markGenerationDone() was nested inside the same if (!doneFut.done) guard:

// Pre-fix (agents/src/voice/speech_handle.ts)
_markDone(): void {
  if (!this.doneFut.done) {
    this.doneFut.resolve();
    if (this.generations.length > 0) {
      this._markGenerationDone(); // preemptive generation could be cancelled before being scheduled
    }
  }
}

As a result, a second _markDone() call made after doneFut had already settled left any generation authorized in the interim unresolved. This happens, for example, when the force-interrupt shutdown path in AgentActivity.interrupt() runs first and the pipeline reply task later tries to clean up. mainTask then hangs forever on _waitForGeneration(), starving every subsequent speech handle. This is a real production failure mode that already had a workaround comment at agent_activity.ts:1709-1712: "the generation future created by _authorizeGeneration would never resolve since _markDone is a no-op once doneFut is already settled".
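The failure sequence can be reproduced in isolation with a simplified model. The sketch below is not the real SpeechHandle: `SketchFuture` and `PreFixHandleSketch` are hypothetical stand-ins that keep only the `done` flag and the pre-fix guard structure, which is enough to show the generation being left pending.

```typescript
// Hypothetical, minimal stand-in for the Future used by SpeechHandle.
// Only the `done` flag matters for this illustration.
class SketchFuture {
  done = false;
  resolve(): void {
    this.done = true;
  }
}

// Simplified model of the pre-fix SpeechHandle (assumed shape, not the real class).
class PreFixHandleSketch {
  readonly doneFut = new SketchFuture();
  readonly generations: SketchFuture[] = [];

  authorizeGeneration(): void {
    this.generations.push(new SketchFuture());
  }

  markGenerationDone(): void {
    const last = this.generations[this.generations.length - 1];
    if (last !== undefined && !last.done) last.resolve();
  }

  // Pre-fix shape: generation cleanup is nested inside the doneFut guard.
  markDone(): void {
    if (!this.doneFut.done) {
      this.doneFut.resolve();
      if (this.generations.length > 0) this.markGenerationDone();
    }
  }
}

const preFix = new PreFixHandleSketch();
preFix.markDone();            // e.g. the force-interrupt path settles doneFut first
preFix.authorizeGeneration(); // a generation is authorized in the interim
preFix.markDone();            // later cleanup: the guard is false, so nothing runs

// The generation is still pending, so anything awaiting it would never settle.
const generationStillPending = preFix.generations.some((g) => !g.done);
console.log(generationStillPending); // true
```

In the real code the pending future is what `_waitForGeneration()` awaits, so the flag staying `false` here corresponds to the hang described above.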

This is the JS twin of the Python bug: in Python the equivalent issue surfaced through contextlib.suppress(asyncio.InvalidStateError) swallowing _done_fut.set_result(None) and aborting the whole block; here it's the explicit if (!doneFut.done) guard. Same root cause, different idiom.

Fix

Move _markGenerationDone() out of the if (!doneFut.done) guard so it always runs when a generation is pending:

_markDone(): void {
  if (!this.doneFut.done) {
    this.doneFut.resolve();
  }

  // Ref: python livekit-agents/livekit/agents/voice/speech_handle.py - _mark_done
  // Must be outside the doneFut.done guard: if doneFut is already resolved
  // (e.g. interrupted before _markDone is called), the guarded block is
  // skipped and the generation future would be left unresolved, leaving
  // _waitForGeneration stuck and starving subsequent speech handles.
  if (this.generations.length > 0) {
    this._markGenerationDone();
  }
}

_markGenerationDone() is already idempotent (it only resolves the last generation if !lastGeneration.done), so calling it on a re-entry where the generation was already resolved is a no-op.
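The idempotency claim can be checked against a small model of the post-fix method. This is a sketch under assumptions: `StrictFuture` and `FixedHandleSketch` are hypothetical names, and `StrictFuture.resolve()` is deliberately made to throw on a second call so that any double resolution would be visible.

```typescript
// Hypothetical stand-in for the real Future: resolve() throws when called twice,
// so an unguarded double resolution would surface as an exception.
class StrictFuture {
  done = false;
  resolve(): void {
    if (this.done) throw new Error('future already resolved');
    this.done = true;
  }
}

// Simplified model of the post-fix SpeechHandle (assumed shape, not the real class).
class FixedHandleSketch {
  readonly doneFut = new StrictFuture();
  readonly generations: StrictFuture[] = [];

  authorizeGeneration(): void {
    this.generations.push(new StrictFuture());
  }

  markGenerationDone(): void {
    const last = this.generations[this.generations.length - 1];
    // Idempotency guard: only resolve the last generation if it is still pending.
    if (last !== undefined && !last.done) last.resolve();
  }

  markDone(): void {
    if (!this.doneFut.done) this.doneFut.resolve();
    // Runs on every call, regardless of the doneFut state.
    if (this.generations.length > 0) this.markGenerationDone();
  }
}

const fixed = new FixedHandleSketch();
fixed.markDone();            // settles doneFut; no generations yet
fixed.authorizeGeneration(); // generation authorized after the handle is "done"
fixed.markDone();            // resolves the pending generation
fixed.markDone();            // re-entry: both guards skip, nothing throws

const allGenerationsDone = fixed.generations.every((g) => g.done);
console.log(allGenerationsDone); // true
```

The third `markDone()` call completing without an exception is the point: the `!last.done` check, not the outer `doneFut` guard, is what makes repeated calls safe.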

Implementation nuances vs. Python

The Python diff (+6 / -3 in speech_handle.py) also reorders the _interrupt_timeout_handle.cancel() call relative to the suppressed block. The JS SpeechHandle doesn't carry an _interruptTimeoutHandle field at all (no equivalent timeout-driven cancellation logic exists yet in agents-js), so that part of the Python change has no JS counterpart and is intentionally not ported. The behavioral fix, always running _markGenerationDone() when there's a pending generation, is identical in both languages.

The Python PR did not include a regression test. Because the JS bug pattern (re-entering _markDone after doneFut is settled, then authorizing/finishing a generation) is easy to reproduce deterministically with the existing public-internal API, this PR adds a single regression test in agents/src/voice/speech_handle.test.ts:

it('resolves a pending generation when called after doneFut is already done', async () => {
  const handle = SpeechHandle.create();
  handle._markDone();                // resolves doneFut (no generations yet)
  handle._authorizeGeneration();     // generation authorized after handle is "done"
  handle._markDone();                // pre-fix: short-circuit, generation never resolves

  const outcome = await raceTimeout(handle._waitForGeneration(), 500);
  expect(outcome).toBe('resolved');
});

Verified: this test times out on the pre-fix code (Expected: "resolved", Received: "timeout") and passes on the fix.
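The test relies on a `raceTimeout` helper whose implementation is not shown in this description. One plausible shape, assuming it returns `'resolved'` or `'timeout'` as the assertion messages suggest, is sketched below; the signature and body are assumptions, not the helper from the PR.

```typescript
// Hypothetical helper: races a promise against a timer and reports which one won.
async function raceTimeout(
  promise: Promise<unknown>,
  ms: number,
): Promise<'resolved' | 'timeout'> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<'timeout'>((resolve) => {
    timer = setTimeout(() => resolve('timeout'), ms);
  });
  try {
    return await Promise.race([promise.then(() => 'resolved' as const), timeout]);
  } finally {
    // Clear the timer so a fast resolution does not keep the event loop alive.
    clearTimeout(timer);
  }
}

// Usage: an already-settled promise wins the race; a never-settling one times out.
raceTimeout(Promise.resolve(), 50).then((outcome) => console.log(outcome)); // "resolved"
raceTimeout(new Promise<void>(() => {}), 50).then((outcome) => console.log(outcome)); // "timeout"
```

Reporting a string outcome rather than rejecting on timeout keeps the failure readable in the test runner: a hang shows up as a plain value mismatch instead of an unhandled rejection.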

Files changed

  • agents/src/voice/speech_handle.ts — move _markGenerationDone() out of the if (!doneFut.done) guard.
  • agents/src/voice/speech_handle.test.ts — add SpeechHandle._markDone - generation resolution after early done regression suite.
  • .changeset/fix-speech-handle-mark-done-deadlock.md (patch for @livekit/agents).

Test plan

  • pnpm --filter @livekit/agents build — passes.
  • pnpm exec prettier --check on changed files — passes.
  • pnpm --filter @livekit/agents lint — no new errors / warnings on touched files.
  • pnpm exec vitest run agents/src/voice/speech_handle.test.ts — 10/10 pass (9 existing + 1 new).
  • pnpm exec vitest run agents/src/voice/ — 169/169 pass (full voice suite green).
  • Confirmed regression: reverting the speech_handle.ts change makes the new test fail with a 500 ms timeout, then re-applying restores it to green.
  • Manual smoke against a force-interrupt shutdown (left to reviewer with a real session).

Changeset

patch for @livekit/agents (per the routine's standing instructions).


Generated by Claude Code

Move the `_markGenerationDone()` call in `SpeechHandle._markDone()` outside
the `if (!doneFut.done)` guard so a pending generation future is always
resolved, even when `doneFut` was already settled. Previously, a second
`_markDone()` would short-circuit and leave the generation unresolved,
causing `mainTask` to hang on `_waitForGeneration()` and starve subsequent
speech handles.

Adds a regression test (`SpeechHandle._markDone - generation resolution
after early done`) that fails on the pre-fix code (timeout) and passes on
the fix.

Ports livekit/agents#5678.

https://claude.ai/code/session_01NQyfoU5f21EryyRbE98AE5
@changeset-bot

changeset-bot Bot commented May 8, 2026

🦋 Changeset detected

Latest commit: fe0d991

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 31 packages
Name Type
@livekit/agents Patch
@livekit/agents-plugin-anam Patch
@livekit/agents-plugin-assemblyai Patch
@livekit/agents-plugin-baseten Patch
@livekit/agents-plugin-bey Patch
@livekit/agents-plugin-cartesia Patch
@livekit/agents-plugin-cerebras Patch
@livekit/agents-plugin-deepgram Patch
@livekit/agents-plugin-elevenlabs Patch
@livekit/agents-plugin-fishaudio Patch
@livekit/agents-plugin-google Patch
@livekit/agents-plugin-hedra Patch
@livekit/agents-plugin-hume Patch
@livekit/agents-plugin-inworld Patch
@livekit/agents-plugin-lemonslice Patch
@livekit/agents-plugin-liveavatar Patch
@livekit/agents-plugin-livekit Patch
@livekit/agents-plugin-minimax Patch
@livekit/agents-plugin-mistral Patch
@livekit/agents-plugin-mistralai Patch
@livekit/agents-plugin-neuphonic Patch
@livekit/agents-plugin-openai Patch
@livekit/agents-plugin-phonic Patch
@livekit/agents-plugin-resemble Patch
@livekit/agents-plugin-rime Patch
@livekit/agents-plugin-runway Patch
@livekit/agents-plugin-sarvam Patch
@livekit/agents-plugin-silero Patch
@livekit/agents-plugins-test Patch
@livekit/agents-plugin-trugen Patch
@livekit/agents-plugin-xai Patch

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 2 additional findings.

@toubatbrian toubatbrian closed this May 8, 2026
3 participants