Describe the bug
Description
When invoking session.say() or session.generateReply() from Agent.onEnter(), the agent consistently produces no audio output, despite:
• The agent joining the room successfully
• onEnter firing correctly (confirmed via logs)
• A speech handle being created (confirmed via framework debug log)
• Server-side dashboard showing track published by agent for ~21 seconds
• No errors surfaced in logs
• No unhandled promise rejections
• Unchanged across 3 TTS providers and 2 routing paths (Inference + direct plugin)
The agent state transitions directly from initializing to listening, skipping speaking entirely.
Environment
• @livekit/agents: 1.2.6 (deployed as 1.2.6 in LiveKit Cloud)
• @livekit/agents-plugin-inworld: 1.2.6 (also tested with inference.TTS cartesia/openai)
• @livekit/agents-plugin-deepgram: 1.2.6
• @livekit/agents-plugin-livekit: 1.2.4 (turn detector)
• @livekit/agents-plugin-silero: (VAD)
• Node.js 22 (node:22-slim container)
• TypeScript 5.9, Vite 7.3.2 for build, pnpm 10
• Deployed to LiveKit Cloud "Build" plan, us-east region
• Agent ID: CA_gPrJTQK9WYGF
Reproduction steps
- AgentSession configured with:
• STT: deepgram.STT({ model: 'nova-3', language: 'en' })
• LLM: inference.LLM({ model: 'openai/gpt-4.1-mini' }) (placeholder)
• TTS: inworld.TTS({ voice: 'Serena', model: 'inworld-tts-1.5-max' }) (direct plugin)
• VAD: Silero
• turnDetection: livekit.turnDetector.MultilingualModel()
• preemptiveGeneration: true
• noiseCancellation: audioEnhancement({ model: 'quailVfL' })
- Agent class extends voice.Agent with onEnter() calling this.session.generateReply({ instructions: 'greet the user' })
- Entry function: ctx.connect() → ctx.waitForParticipant() → new voice.AgentSession(...) → session.start(...)
- Dispatch via agents-playground.livekit.io with correct agent name
Observed behavior
Agent logs (most recent test with generateReply):
[Mindi] Connecting to room...
[Mindi] Connected to room
[Mindi] Waiting for participant...
[Mindi] Participant joined: identity=identity-Le19
[Mindi] Starting session...
[Mindi] onEnter — triggering generateReply with greeting hint: "..."
{"level":30,"speech_id":"speech_bd82a69a-6b3","msg":"Creating speech handle"}
[Mindi] Session started — Agent.onEnter should have fired
(indefinite silence — no further events)
Playground client (Chrome, Safari, DuckDuckGo all tested):
• Agent STATE CHANGED event: initializing → listening (never enters speaking)
• "Waiting for agent audio track..." persists indefinitely
• Agent Identity field stays on loading spinner (or briefly populates then reverts)
Server-side LiveKit Cloud dashboard:
Event sequence for a representative session (RM_rjYGoRizLkjZ):
12:08:30.374173 Participant joining: agent-AJ_tQWZZuJbYpjs
12:08:30.473045 Participant active: agent-AJ_tQWZZuJbYpjs
12:08:30.689505 Track PUBLISHED: agent publishes audio track
...track remains published for 21 seconds with no apparent frames...
12:08:51.645851 Track UNPUBLISHED: agent unpublishes track
So the agent DOES publish a track to the room server-side — but apparently with no audio frames —
before eventually unpublishing.
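As a quick check of the "~21 seconds" figure, the sketch below computes the track lifetime from the two dashboard timestamps quoted above. The helper names (parseTimestampMicros, elapsedSeconds) are made up for this illustration, and it assumes both timestamps fall on the same day.

```typescript
// Parse a dashboard timestamp of the form HH:MM:SS.ffffff into
// microseconds since midnight. Hypothetical helper, not a LiveKit API.
function parseTimestampMicros(ts: string): number {
  const match = /^(\d{2}):(\d{2}):(\d{2})\.(\d{1,6})$/.exec(ts);
  if (!match) {
    throw new Error(`Unrecognized timestamp: ${ts}`);
  }
  const wholeSeconds =
    Number(match[1]) * 3600 + Number(match[2]) * 60 + Number(match[3]);
  // Right-pad the fraction so ".5" means 500000 microseconds, not 5.
  const micros = Number((match[4] ?? "").padEnd(6, "0"));
  return wholeSeconds * 1_000_000 + micros;
}

// Elapsed seconds between two timestamps on the same day.
function elapsedSeconds(start: string, end: string): number {
  return (parseTimestampMicros(end) - parseTimestampMicros(start)) / 1_000_000;
}

const published = "12:08:30.689505";
const unpublished = "12:08:51.645851";
const trackLifetime = elapsedSeconds(published, unpublished);
console.log(trackLifetime.toFixed(6)); // 20.956346
```

So the track stayed published for just under 21 seconds in this session.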
What we've ruled out via systematic testing
Configuration tested in 10 iterations (v1 → v10):
| Variable | Tested values | Result |
| --- | --- | --- |
| TTS provider | inference.TTS(openai/tts-1) | Errors loudly: APIError: LiveKit TTS returned error: undefined at inference/tts.js:391 |
| TTS provider | inference.TTS(cartesia/sonic-3) | Silent hang, no error |
| TTS provider | inference.TTS(inworld/inworld-tts-1.5-max) | Silent hang, no error |
| TTS provider | inworld.TTS (direct plugin, own API key) | Silent hang, no error |
| Speech trigger | session.say(text) | Silent hang, no Creating speech handle log |
| Speech trigger | session.generateReply({ instructions }) | Silent hang, but DOES log Creating speech handle with speech_id |
| Agent config | Custom llmNode override (Claude routing with prefills) | Silent hang |
| Agent config | Stripped llmNode (framework default) | Silent hang |
| Entry ordering | session.start() before ctx.connect() | Silent hang |
| Entry ordering | ctx.connect() → waitForParticipant() → session.start() (canonical) | Silent hang |
| Browser | Chrome (Blink) | "Waiting for agent audio track..." |
| Browser | Safari, DuckDuckGo (WebKit) | "Waiting for agent audio track..." |
| Dispatch | Correct agent name mymindi-spike | Silent hang |
| Dispatch | Auto-dispatch (blank agent name) | Silent hang |
Not ruled out yet
• Framework bug specific to this deployment setup (1.2.6 on Cloud us-east)
• Interaction between noiseCancellation: audioEnhancement({ model: 'quailVfL' }) and
TTS output (worth testing with this removed)
• Interaction with custom VAD/turnDetection config during greeting (before user speech)
• E2EE accidentally enabled at project level (not visible in dashboard but possible)
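One way to work through the untested interactions above is a one-at-a-time ablation: remove a single optional setting per deploy and see whether audio appears. The sketch below only generates the list of variants to try. It uses plain string stand-ins rather than real plugin objects, and ablationVariants is a hypothetical helper, not part of @livekit/agents.

```typescript
type Config = Record<string, unknown>;

// Build one variant per optional key, each a copy of `base`
// with exactly that key left out.
function ablationVariants(
  base: Config,
  optionalKeys: readonly string[],
): Array<{ removed: string; config: Config }> {
  return optionalKeys.map((key) => {
    const config: Config = {};
    for (const k of Object.keys(base)) {
      if (k !== key) {
        config[k] = base[k];
      }
    }
    return { removed: key, config };
  });
}

// String values are placeholders for the real plugin instances.
const baseConfig: Config = {
  stt: "deepgram nova-3",
  tts: "inworld inworld-tts-1.5-max",
  vad: "silero",
  turnDetection: "MultilingualModel",
  preemptiveGeneration: true,
  noiseCancellation: "audioEnhancement quailVfL",
};

const variants = ablationVariants(baseConfig, [
  "noiseCancellation",
  "turnDetection",
  "preemptiveGeneration",
]);

for (const v of variants) {
  console.log(`without ${v.removed}:`, Object.keys(v.config).join(", "));
}
```

If audio returns in exactly one variant, the removed setting is the likeliest culprit. If no variant helps, these three settings are probably not the cause on their own.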
Expected behavior
The agent should enter the speaking state, produce audio frames via TTS, and publish them to the
already-published track. Audio plays in the client, then the Agent.onEnter() promise resolves and the agent
moves to listening.
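The gap between expected and observed behavior can be stated as an ordering check over agent states. The sketch below is a self-contained illustration: containsInOrder is a hypothetical helper, and the "healthy" sequence (including a thinking state) is an assumed example, not taken from a real session.

```typescript
// True if every state in `expected` occurs in `observed`
// in the same relative order (other states may appear in between).
function containsInOrder(
  observed: readonly string[],
  expected: readonly string[],
): boolean {
  let next = 0;
  for (const state of observed) {
    if (next < expected.length && state === expected[next]) {
      next += 1;
    }
  }
  return next === expected.length;
}

const expectedStates = ["initializing", "speaking", "listening"];

// Transition reported by the playground in this issue.
const observedStates = ["initializing", "listening"];

// Assumed example of a working session, for contrast.
const healthyStates = ["initializing", "thinking", "speaking", "listening"];

console.log(containsInOrder(observedStates, expectedStates)); // false
console.log(containsInOrder(healthyStates, expectedStates)); // true
```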
Impact
This blocks deployment to production for a voice agent product targeting elderly users. We've spent
~14 hours debugging with detailed logs, dashboard traces, and code iteration before filing. Happy to
provide additional information, deploy test builds, or give a LiveKit engineer direct access to the agent
for debugging.
Request
We'd appreciate help identifying:
- Why session.say() produces no Creating speech handle log while generateReply() does
- Why the published track contains no audio frames
- Whether there's a known issue with the specific combination of plugins we're using
Thank you for the framework. Would love to get this working.
Relevant log output
georgechen@Georges-MacBook-Pro mymindi-spike % cd ~/mymindi-spike
lk agent logs --log-type=deploy > /tmp/v10-test.txt 2>&1 &
sleep 12
kill %1 2>/dev/null
tail -60 /tmp/v10-test.txt
[1] 50125
[1] + exit 1 lk agent logs --log-type=deploy > /tmp/v10-test.txt 2>&1
Using project [mymindi]
Using agent [CA_gPrJTQK9WYGF]
agent-starter-node@1.0.0 start /app
node dist/main.js start
◇ injected env (0) from .env.local // tip: ⌘ enable debugging { debug: true }
{"level":40,"time":1776852561500,"pid":47,"hostname":"deployment-p-5nn3sr48n4a-ca-gprjtqk9wygf-5ff48fcd8d-lnvwf","version":"1.2.6","msg":"custom loadThreshold is not supported when deploying to Cloud, using defaults"}
{"level":30,"time":1776852565325,"pid":47,"hostname":"deployment-p-5nn3sr48n4a-ca-gprjtqk9wygf-5ff48fcd8d-lnvwf","version":"1.2.6","msg":"starting worker"}
{"level":30,"time":1776852565370,"pid":47,"hostname":"deployment-p-5nn3sr48n4a-ca-gprjtqk9wygf-5ff48fcd8d-lnvwf","msg":"Server is listening on port 8081"}
{"level":30,"time":1776852565401,"pid":47,"hostname":"deployment-p-5nn3sr48n4a-ca-gprjtqk9wygf-5ff48fcd8d-lnvwf","version":"1.2.6","id":"CAW_S3R96aKnQU4j","server_info":{"edition":"Cloud","version":"1.10.1","protocol":17,"region":"US East B","nodeId":"NC_OASHBURN1B_CzEb25gLovbd","debugInfo":"","agentProtocol":0},"msg":"registered worker"}
◇ injected env (0) from .env.local // tip: ⌘ override existing { override: true }
◇ injected env (0) from .env.local // tip: ⌘ suppress logs { quiet: true }
◇ injected env (0) from .env.local // tip: ⌘ enable debugging { debug: true }
{"level":30,"time":1776852894215,"pid":47,"hostname":"deployment-p-5nn3sr48n4a-ca-gprjtqk9wygf-5ff48fcd8d-lnvwf","version":"1.2.6","jobId":"AJ_QGXBAeb6xChQ","resuming":false,"agentName":"mymindi-spike","msg":"received job request"}
[Mindi] Connecting to room...
[Mindi] Connected to room
[Mindi] Waiting for participant...
[Mindi] Participant joined: identity=identity-Le19
[SessionContext] No metadata found — using TEST_USER fallback
[Mindi] Session context — user=George, firstCall=false, tz=America/Los_Angeles, callId=test-call-1776852566652
[Mindi] Starting session...
{"level":30,"time":1776852894877,"pid":101,"hostname":"deployment-p-5nn3sr48n4a-ca-gprjtqk9wygf-5ff48fcd8d-lnvwf","participantValue":"identity-Le19","trackPublications":[],"lengthOfTrackPublications":0,"msg":"participantValue.trackPublications"}
[Mindi] onEnter — triggering generateReply with greeting hint: "Hey George! What are you up to this morning?"
{"level":30,"time":1776852894937,"pid":101,"hostname":"deployment-p-5nn3sr48n4a-ca-gprjtqk9wygf-5ff48fcd8d-lnvwf","speech_id":"speech_bd82a69a-6b3","msg":"Creating speech handle"}
[Mindi] Session started — Agent.onEnter should have fired
◇ injected env (0) from .env.local // tip: ⌘ custom filepath { path: '/custom/path/.env' }
scanner error: context canceled
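To make the timing in the log above easier to read, the sketch below pulls the JSON lines out of the mixed output and measures the gap between two messages. It assumes the time field is epoch milliseconds (which matches the deploy date). The sample lines are abbreviated from the log above (hostname, pid and other fields dropped), and parseJsonLogLines and millisBetween are hypothetical helpers.

```typescript
interface LogEntry {
  time: number;
  msg: string;
}

// Keep only lines that parse as JSON objects with a numeric `time`
// and a string `msg`. Plain-text lines such as "[Mindi] ..." are skipped.
function parseJsonLogLines(raw: string): LogEntry[] {
  const entries: LogEntry[] = [];
  for (const line of raw.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("{")) continue;
    try {
      const parsed: unknown = JSON.parse(trimmed);
      if (typeof parsed !== "object" || parsed === null) continue;
      const rec = parsed as Record<string, unknown>;
      const time = rec["time"];
      const msg = rec["msg"];
      if (typeof time === "number" && typeof msg === "string") {
        entries.push({ time, msg });
      }
    } catch {
      // Not valid JSON: ignore the line.
    }
  }
  return entries;
}

// Milliseconds from the first `fromMsg` entry to the first `toMsg` entry,
// or undefined if either message is absent.
function millisBetween(
  entries: readonly LogEntry[],
  fromMsg: string,
  toMsg: string,
): number | undefined {
  const from = entries.find((e) => e.msg === fromMsg);
  const to = entries.find((e) => e.msg === toMsg);
  return from && to ? to.time - from.time : undefined;
}

const sampleLog = [
  '{"level":30,"time":1776852894215,"msg":"received job request"}',
  "[Mindi] Starting session...",
  '{"level":30,"time":1776852894877,"msg":"participantValue.trackPublications"}',
  '{"level":30,"time":1776852894937,"speech_id":"speech_bd82a69a-6b3","msg":"Creating speech handle"}',
  "scanner error: context canceled",
].join("\n");

const logEntries = parseJsonLogLines(sampleLog);
console.log(
  millisBetween(logEntries, "received job request", "Creating speech handle"),
); // 722
```

On these entries the speech handle is created 722 ms after the job request, and it is the last framework log line in the capture.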
Describe your environment
• OS: macOS Tahoe 26.4.1
• Project: p_5nn3sr48n4a (mymindi)
• Region: us-east
• Sample session IDs with the silent-hang behavior:
  • RM_rjYGoRizLkjZ (room playground-s1J2-xNC0)
  • Various others; Agent Observability is enabled, so traces are available
• Agent deployment: CA_gPrJTQK9WYGF
• Deploy version tonight: v20260422070552 (cartesia variant), plus subsequent v9/v10 deploys
Minimal reproducible example
No response
Additional information
No response