Stream Rime TTS response chunks#1405
Conversation
🦋 Changeset detectedLatest commit: 785a869 The changes in this PR will be included in the next version bump. This PR includes changesets to release 29 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
adrian-cowham
left a comment
There was a problem hiding this comment.
PR Review Summary
Reviewed by: code-reviewer, silent-failure-hunter, pr-test-analyzer, code-simplifier
Critical Issues
None.
Important Issues
None.
Suggestions
[pr-test-analyzer] Cross-chunk frame accumulation not exercised (severity 7/10) — plugins/rime/src/tts.test.ts
The test sends two chunks of exactly bytesPerFrame (3200 bytes each), so each independently yields one frame. In production, HTTP chunk boundaries are arbitrary and will rarely align. A test with misaligned chunks (e.g., pcmChunk(1600), pcmChunk(1600), pcmChunk(3200)) would exercise byte accumulation across reads — the real behavioral change this PR introduces.
[pr-test-analyzer] audioByteStream.flush() trailing-frame path not exercised (severity 6/10) — plugins/rime/src/tts.test.ts
With aligned chunks the internal buffer is empty at end-of-stream, so flush() returns a zero-sample frame that gets skipped. Sending pcmChunk(3200 + 1600) followed by bodyController.close() would force the flush path to emit a trailing partial frame — a common scenario when Rime's server sends a final unaligned chunk.
[pr-test-analyzer] Reader release on mid-stream error not tested (severity 5/10) — plugins/rime/src/tts.test.ts
A test calling bodyController.error(new Error('network failure')) after enqueuing one chunk would verify that the finally cleanup (reader lock release + queue close) fires correctly on network errors mid-stream.
[code-simplifier] pcmChunk helper fill loop is unnecessary (minor) — plugins/rime/src/tts.test.ts
The even-byte fill produces non-silent PCM but no assertion inspects sample content. return new Uint8Array(byteLength) (silence) exercises the same code paths.
Strengths
- Correct streaming conversion — the
lastFramedeferred-emit pattern preservesfinal: false/final: true/done: truesequencing identically to the pre-PR behavior - Solid resource cleanup —
reader.releaseLock()+queue.close()infinallyensures cleanup on both success and error paths; the intentional lack of acatchblock lets errors propagate to the base class retry/emit machinery - Proper null-body guard —
!response.bodyearly-throw is consistent with baseten, inworld, and minimax plugins - Well-designed regression test — uses a real
ReadableStreamwith manual controller (not over-mocked), validates audio arrives before body closes viawithTimeout, and runs unconditionally in CI without external credentials - Consistent with codebase patterns — the streaming loop mirrors the existing baseten TTS plugin almost exactly
flush()zero-sample guard is load-bearing —AudioByteStream.flush()does not short-circuit when the buffer is empty, so thesamplesPerChannel === 0filter is necessary
Looks good overall. The suggestions above are optional hardening for test coverage — none are blockers.
🤖 Reviewed with Claude Code
Summary
response.bodyinstead of waiting for the fullresponse.arrayBuffer()final: trueContext
The Rime JS TTS plugin was buffering the whole HTTP response before writing any audio into the LiveKit audio queue. Rime can flush PCM as synthesis progresses, but
await response.arrayBuffer()means the agent sees no audio until the provider response has completely finished. For longer utterances, app-observed TTS TTFB therefore grows with total synthesis duration instead of reflecting the provider's first available audio chunk.This brings the JS plugin behavior in line with the intended streaming path: consume the response body reader incrementally, pass each received PCM chunk through
AudioByteStream, and enqueue complete 100msAudioFrames as soon as they are available.Implementation notes
response.bodyis checked before reading so a missing body fails explicitly.ReadableStream.getReader()is released infinally, and the queue is closed in the same cleanup path.AudioByteStream.write()already acceptsArrayBufferViewand honorsbyteOffset/byteLength, so the streamedUint8Arraychunks can be passed directly without copying into a slicedArrayBuffer.AudioByteStream.flush()is still called after EOF to drain a valid partial-frame remainder; empty flush frames are skipped.Validation
pnpm test -- plugins/rime/src/tts.test.tspnpm --filter @livekit/agents-plugin-rime lintpnpm format:checkpnpm turbo run build --filter=@livekit/agents-plugin-rimegit diff --check