Node.js 1.2.6: No audio from session.say() or generateReply() in onEnter — speech handle created but no frames produced, no errors, agent state transitions initializing → listening (skipping speaking) #1289

@Lugiox

Describe the bug

Description
When invoking session.say() or session.generateReply() from Agent.onEnter(), the agent consistently produces no audio output, despite:
• The agent joining the room successfully
• onEnter firing correctly (confirmed via logs)
• A speech handle being created (confirmed via framework debug log)
• Server-side dashboard showing track published by agent for ~21 seconds
• No errors surfaced in logs
• No unhandled promise rejections
• Unchanged across 3 TTS providers and 2 routing paths (Inference + direct plugin)
The agent state transitions directly from initializing to listening, skipping speaking entirely.
Environment
• @livekit/agents: 1.2.6 (deployed as 1.2.6 in LiveKit Cloud)
• @livekit/agents-plugin-inworld: 1.2.6 (also tested with inference.TTS cartesia/openai)
• @livekit/agents-plugin-deepgram: 1.2.6
• @livekit/agents-plugin-livekit: 1.2.4 (turn detector)
• @livekit/agents-plugin-silero: (VAD)
• Node.js 22 (node:22-slim container)
• TypeScript 5.9, Vite 7.3.2 for build, pnpm 10
• Deployed to LiveKit Cloud "Build" plan, us-east region
• Agent ID: CA_gPrJTQK9WYGF
Reproduction steps

  1. AgentSession configured with:
    • STT: deepgram.STT({ model: 'nova-3', language: 'en' })
    • LLM: inference.LLM({ model: 'openai/gpt-4.1-mini' }) (placeholder)
    • TTS: inworld.TTS({ voice: 'Serena', model: 'inworld-tts-1.5-max' }) (direct plugin)
    • VAD: Silero
    • turnDetection: livekit.turnDetector.MultilingualModel()
    • preemptiveGeneration: true
    • noiseCancellation: audioEnhancement({ model: 'quailVfL' })
  2. Agent class extends voice.Agent with onEnter() calling this.session.generateReply({ instructions: 'greet the user' })
  3. Entry function: ctx.connect() → ctx.waitForParticipant() → new voice.AgentSession(...) → session.start(...)
  4. Dispatch via agents-playground.livekit.io with correct agent name
Observed behavior
Agent logs (most recent test with generateReply):
[Mindi] Connecting to room...
[Mindi] Connected to room
[Mindi] Waiting for participant...
[Mindi] Participant joined: identity=identity-Le19
[Mindi] Starting session...
[Mindi] onEnter — triggering generateReply with greeting hint: "..."
{"level":30,"speech_id":"speech_bd82a69a-6b3","msg":"Creating speech handle"}
[Mindi] Session started — Agent.onEnter should have fired
(indefinite silence, no further events)
Playground client (Chrome, Safari, DuckDuckGo all tested):
• Agent STATE CHANGED event: initializing → listening (never enters speaking)
• "Waiting for agent audio track..." persists indefinitely
• Agent Identity field stays on loading spinner (or briefly populates then reverts)
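
The skipped transition can be checked mechanically once the state sequence has been collected. This is a minimal sketch, not framework code: `skippedSpeaking` is a hypothetical helper, and the state names are limited to the three that appear in this report.

```typescript
// The three agent states mentioned in this report (the framework may define more).
type AgentState = "initializing" | "listening" | "speaking";

// Hypothetical helper: returns true when the agent reached "listening"
// without passing through "speaking" first.
function skippedSpeaking(states: AgentState[]): boolean {
  const firstListening = states.indexOf("listening");
  if (firstListening === -1) return false; // never reached listening
  return states.slice(0, firstListening).indexOf("speaking") === -1;
}

console.log(skippedSpeaking(["initializing", "listening"])); // true (the sequence observed here)
console.log(skippedSpeaking(["initializing", "speaking", "listening"])); // false (the expected sequence)
```

How the state sequence is collected (for example from the playground's state-change events) is left out on purpose, since that depends on the client.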
Server-side LiveKit Cloud dashboard:
Event sequence for a representative session (RM_rjYGoRizLkjZ):
12:08:30.374173 Participant joining: agent-AJ_tQWZZuJbYpjs
12:08:30.473045 Participant active: agent-AJ_tQWZZuJbYpjs
12:08:30.689505 Track PUBLISHED: agent publishes audio track
...track remains published for 21 seconds with no apparent frames...
12:08:51.645851 Track UNPUBLISHED: agent unpublishes track
So the agent DOES publish a track to the room server-side, but apparently with no audio frames, before eventually unpublishing.
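
The ~21 second figure follows from the two track timestamps. A minimal sketch of the arithmetic, assuming both timestamps are same-day wall-clock times in HH:MM:SS.ffffff form; `parseClock` and `publishedSeconds` are hypothetical helper names.

```typescript
// Hypothetical helper: convert "HH:MM:SS.ffffff" to seconds since midnight.
function parseClock(ts: string): number {
  const [h, m, s] = ts.split(":");
  return Number(h) * 3600 + Number(m) * 60 + Number(s);
}

// Seconds between the Track PUBLISHED and Track UNPUBLISHED events.
function publishedSeconds(published: string, unpublished: string): number {
  return parseClock(unpublished) - parseClock(published);
}

const duration = publishedSeconds("12:08:30.689505", "12:08:51.645851");
console.log(duration.toFixed(3)); // prints 20.956
```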
What we've ruled out via systematic testing
Configuration tested in 10 iterations (v1 → v10):
| Variable | Tested values | Result |
| --- | --- | --- |
| TTS provider | inference.TTS(openai/tts-1) | Errors loudly: APIError: LiveKit TTS returned error: undefined at inference/tts.js:391 |
| TTS provider | inference.TTS(cartesia/sonic-3) | Silent hang, no error |
| TTS provider | inference.TTS(inworld/inworld-tts-1.5-max) | Silent hang, no error |
| TTS provider | inworld.TTS (direct plugin, own API key) | Silent hang, no error |
| Speech trigger | session.say(text) | Silent hang, no Creating speech handle log |
| Speech trigger | session.generateReply({ instructions }) | Silent hang, but DOES log Creating speech handle with speech_id |
| Agent config | Custom llmNode override (Claude routing with prefills) | Silent hang |
| Agent config | Stripped llmNode (framework default) | Silent hang |
| Entry ordering | session.start() before ctx.connect() | Silent hang |
| Entry ordering | ctx.connect() → waitForParticipant() → session.start() (canonical) | Silent hang |
| Browser | Chrome (Blink) | "Waiting for agent audio track..." |
| Browser | Safari, DuckDuckGo (WebKit) | "Waiting for agent audio track..." |
| Dispatch | Correct agent name mymindi-spike | Silent hang |
| Dispatch | Auto-dispatch (blank agent name) | Silent hang |
Not ruled out yet
• Framework bug specific to this deployment setup (1.2.6 on Cloud us-east)
• Interaction between noiseCancellation: audioEnhancement({ model: 'quailVfL' }) and TTS output (worth testing with this removed)
• Interaction with custom VAD/turnDetection config during greeting (before user speech)
• E2EE accidentally enabled at project level (not visible in dashboard but possible)
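
While working through these remaining candidates, a silent hang is easier to spot in logs if the greeting is raced against a timer. The sketch below is a generic pattern, not a LiveKit API: `withTimeout` is a hypothetical helper, and obtaining a promise that settles when the greeting finishes playing is assumed rather than shown.

```typescript
// Hypothetical helper: reject with a labelled error if `work` has not settled within `ms`.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms,
    );
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// Stand-in for a greeting that never completes (a promise that never settles).
const neverSettles = new Promise<void>(() => {});
withTimeout(neverSettles, 100, "greeting").catch((err: Error) => {
  console.error(err.message); // logs the timeout instead of hanging silently
});
```

This only turns the hang into a visible error; it does not explain why no frames are produced.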
Expected behavior
Agent should enter speaking state, produce audio frames via TTS, and publish them to the already-published track; audio plays in the client, then the agent.onEnter() promise resolves and the agent moves to listening.
Impact
This blocks deployment to production for a voice agent product targeting elderly users. We've spent ~14 hours debugging with detailed logs, dashboard traces, and code iteration before filing. Happy to provide additional information, deploy test builds, or give a LiveKit engineer direct access to the agent for debugging.
Request
We'd appreciate help identifying:
  1. Why session.say() produces no Creating speech handle log while generateReply() does
  2. Why the published track contains no audio frames
  3. Whether there's a known issue with the specific combination of plugins we're using
Thank you for the framework. Would love to get this working.

Relevant log output

georgechen@Georges-MacBook-Pro mymindi-spike % cd ~/mymindi-spike
lk agent logs --log-type=deploy > /tmp/v10-test.txt 2>&1 &
sleep 12
kill %1 2>/dev/null
tail -60 /tmp/v10-test.txt
[1] 50125
[1] + exit 1 lk agent logs --log-type=deploy > /tmp/v10-test.txt 2>&1
Using project [mymindi]
Using agent [CA_gPrJTQK9WYGF]

agent-starter-node@1.0.0 start /app
node dist/main.js start

◇ injected env (0) from .env.local // tip: ⌘ enable debugging { debug: true }
{"level":40,"time":1776852561500,"pid":47,"hostname":"deployment-p-5nn3sr48n4a-ca-gprjtqk9wygf-5ff48fcd8d-lnvwf","version":"1.2.6","msg":"custom loadThreshold is not supported when deploying to Cloud, using defaults"}
{"level":30,"time":1776852565325,"pid":47,"hostname":"deployment-p-5nn3sr48n4a-ca-gprjtqk9wygf-5ff48fcd8d-lnvwf","version":"1.2.6","msg":"starting worker"}
{"level":30,"time":1776852565370,"pid":47,"hostname":"deployment-p-5nn3sr48n4a-ca-gprjtqk9wygf-5ff48fcd8d-lnvwf","msg":"Server is listening on port 8081"}
{"level":30,"time":1776852565401,"pid":47,"hostname":"deployment-p-5nn3sr48n4a-ca-gprjtqk9wygf-5ff48fcd8d-lnvwf","version":"1.2.6","id":"CAW_S3R96aKnQU4j","server_info":{"edition":"Cloud","version":"1.10.1","protocol":17,"region":"US East B","nodeId":"NC_OASHBURN1B_CzEb25gLovbd","debugInfo":"","agentProtocol":0},"msg":"registered worker"}
◇ injected env (0) from .env.local // tip: ⌘ override existing { override: true }
◇ injected env (0) from .env.local // tip: ⌘ suppress logs { quiet: true }
◇ injected env (0) from .env.local // tip: ⌘ enable debugging { debug: true }
{"level":30,"time":1776852894215,"pid":47,"hostname":"deployment-p-5nn3sr48n4a-ca-gprjtqk9wygf-5ff48fcd8d-lnvwf","version":"1.2.6","jobId":"AJ_QGXBAeb6xChQ","resuming":false,"agentName":"mymindi-spike","msg":"received job request"}
[Mindi] Connecting to room...
[Mindi] Connected to room
[Mindi] Waiting for participant...
[Mindi] Participant joined: identity=identity-Le19
[SessionContext] No metadata found — using TEST_USER fallback
[Mindi] Session context — user=George, firstCall=false, tz=America/Los_Angeles, callId=test-call-1776852566652
[Mindi] Starting session...
{"level":30,"time":1776852894877,"pid":101,"hostname":"deployment-p-5nn3sr48n4a-ca-gprjtqk9wygf-5ff48fcd8d-lnvwf","participantValue":"identity-Le19","trackPublications":[],"lengthOfTrackPublications":0,"msg":"participantValue.trackPublications"}
[Mindi] onEnter — triggering generateReply with greeting hint: "Hey George! What are you up to this morning?"
{"level":30,"time":1776852894937,"pid":101,"hostname":"deployment-p-5nn3sr48n4a-ca-gprjtqk9wygf-5ff48fcd8d-lnvwf","speech_id":"speech_bd82a69a-6b3","msg":"Creating speech handle"}
[Mindi] Session started — Agent.onEnter should have fired
◇ injected env (0) from .env.local // tip: ⌘ custom filepath { path: '/custom/path/.env' }
scanner error: context canceled
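
The output above mixes plain console lines with JSON log entries, so it helps to separate them when checking which speech handles were created. A minimal sketch, assuming one entry per line and the field names visible in the log (`msg`, `speech_id`); `parseJsonLines` and `speechHandles` are hypothetical helpers.

```typescript
interface LogEntry {
  level?: number;
  time?: number;
  msg?: string;
  speech_id?: string;
  [key: string]: unknown;
}

// Keep only the lines that parse as JSON objects; skip console lines such as "[Mindi] ...".
function parseJsonLines(raw: string): LogEntry[] {
  const entries: LogEntry[] = [];
  for (const line of raw.split("\n")) {
    const trimmed = line.trim();
    if (trimmed.charAt(0) !== "{") continue;
    try {
      entries.push(JSON.parse(trimmed) as LogEntry);
    } catch {
      // Not valid JSON after all; ignore the line.
    }
  }
  return entries;
}

// Collect the speech_id of every "Creating speech handle" entry.
function speechHandles(entries: LogEntry[]): string[] {
  return entries
    .filter((e) => e.msg === "Creating speech handle" && typeof e.speech_id === "string")
    .map((e) => e.speech_id as string);
}

const sample = [
  '[Mindi] Starting session...',
  '{"level":30,"speech_id":"speech_bd82a69a-6b3","msg":"Creating speech handle"}',
  'scanner error: context canceled',
].join("\n");

console.log(speechHandles(parseJsonLines(sample))); // [ 'speech_bd82a69a-6b3' ]
```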

Describe your environment

• OS: macOS Tahoe 26.4.1
• Project: p_5nn3sr48n4a (mymindi)
• Region: us-east
• Agent deployment: CA_gPrJTQK9WYGF
• Deploy version tonight: v20260422070552 (cartesia variant), plus subsequent v9/v10 deploys
• Sample session IDs with the silent-hang behavior:
  • RM_rjYGoRizLkjZ (room playground-s1J2-xNC0)
  • Various others; Agent Observability is enabled, so traces are available

Minimal reproducible example

No response

Additional information

No response
