feat(amd): feature parity with Python AMD implementation #1394
chenghao-mou merged 14 commits into claude/quirky-galileo-B4wih
Ports python livekit/agents#5584 (AMD improvement) into agents-js.

- Expose `humanSpeechThresholdMs`, `humanSilenceThresholdMs`, `machineSilenceThresholdMs`, and `prompt` as `AMDOptions` fields.
- Defer to the LLM (instead of forcing HUMAN) when a transcript is already available after a short greeting.
- Add `postpone_termination` LLM tool (capped at 3 extensions × 10s) alongside `save_prediction`; fall back to JSON-content parsing when the LLM does not emit tool calls.
- Add `participantIdentity` and `suppressCompatibilityWarning` options.
- Warn once when the resolved LLM is not in `EVALUATED_LLM_MODELS`.

Skipped (architectural divergence, see PR description): dedicated AMD STT pipeline, track-subscription wait, and the `start()` / `start_timers()` lifecycle split.
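The JSON-content fallback mentioned in the list above could look roughly like the sketch below. It assumes the model's plain-text reply carries a JSON object with a `label` field mirroring the `save_prediction` argument; `parsePredictionFromContent` and `FallbackPrediction` are hypothetical names for illustration, not the identifiers used in this PR.

```typescript
// Hypothetical shape: only the `label` field is assumed, mirroring save_prediction.
interface FallbackPrediction {
  label: string;
}

// Try to recover a prediction from free-form LLM content when no tool call was emitted.
function parsePredictionFromContent(content: string): FallbackPrediction | undefined {
  // Models sometimes wrap JSON in prose or code fences, so take the outermost {...} span.
  const start = content.indexOf('{');
  const end = content.lastIndexOf('}');
  if (start === -1 || end <= start) return undefined;
  try {
    const parsed: unknown = JSON.parse(content.slice(start, end + 1));
    if (typeof parsed === 'object' && parsed !== null && 'label' in parsed) {
      const label = (parsed as { label: unknown }).label;
      if (typeof label === 'string') return { label };
    }
  } catch {
    // Not valid JSON: fall through and report that nothing was recovered.
  }
  return undefined;
}
```

A caller would presumably still pass the recovered `label` through the same normalization as the tool path, since this route skips schema validation.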
- Gate `save_prediction` and `postpone_termination` tool side effects on the current `detectGeneration`. Stale in-flight classifications now no-op instead of mutating timers, budget, or capturing a verdict that belongs to a superseded transcript window.
- Normalize `save_prediction`'s `label` argument through `parseCategory` before storing, so an off-enum value from a misbehaving LLM (or our manual JSON path that bypasses Zod) is treated as UNCERTAIN rather than producing an `AMDResult` with an invalid category string.
- Fix `warnIfNotEvaluated` substring check to also handle date-suffixed model names (e.g. `openai/gpt-4.1-mini-2025-04-14`).
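As an illustration of the label normalization described above, here is a minimal sketch. `Category` is a stand-in for `voice.AMDCategory` with assumed string values, and `parseCategorySketch` is a hypothetical name; the trimming and case folding are also assumptions about how lenient the real `parseCategory` is.

```typescript
// Stand-in for voice.AMDCategory; the string values are assumptions for this sketch.
const Category = {
  HUMAN: 'human',
  MACHINE_IVR: 'machine-ivr',
  MACHINE_VM: 'machine-vm',
  MACHINE_UNAVAILABLE: 'machine-unavailable',
  UNCERTAIN: 'uncertain',
} as const;
type Category = (typeof Category)[keyof typeof Category];

const KNOWN_CATEGORIES = new Set<string>(Object.values(Category));

// Map any value an LLM might pass as `label` onto a valid category.
function parseCategorySketch(label: unknown): Category {
  if (typeof label !== 'string') return Category.UNCERTAIN;
  const normalized = label.trim().toLowerCase();
  // Anything outside the enum is downgraded to UNCERTAIN instead of stored verbatim.
  return KNOWN_CATEGORIES.has(normalized) ? (normalized as Category) : Category.UNCERTAIN;
}
```

The point of the design is that every code path, including one that bypasses schema validation, ends with a value from the enum.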
Without this, a postpone_termination tool call resolved after aclose() would still see isStale() === false (settled was never flipped) and install a fresh silenceTimer that survives cleanup, eventually firing scheduleLLMClassification + tryEmitResult and potentially triggering session.interrupt on a closed AMD.
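The staleness check can be pictured with the following sketch. `DetectorSketch` and its members are hypothetical and only borrow the ideas named in the commit messages (a generation counter, a `settled` flag, a silence timer); the real class does considerably more.

```typescript
class DetectorSketch {
  private generation = 0;
  private settled = false;
  private silenceTimer: ReturnType<typeof setTimeout> | undefined;
  timersInstalled = 0; // exposed only so the sketch is easy to observe

  // Each new transcript window starts a new generation and returns its id.
  beginClassification(): number {
    this.generation += 1;
    return this.generation;
  }

  // A tool call is stale if the detector is closed or a newer window superseded it.
  isStale(generation: number): boolean {
    return this.settled || generation !== this.generation;
  }

  // Side effect of the postpone tool: (re)install the silence timer unless stale.
  postponeTermination(generation: number, delayMs: number): boolean {
    if (this.isStale(generation)) return false;
    if (this.silenceTimer !== undefined) clearTimeout(this.silenceTimer);
    this.silenceTimer = setTimeout(() => {}, delayMs);
    this.timersInstalled += 1;
    return true;
  }

  // Closing flips `settled` first, so late tool calls cannot install new timers.
  close(): void {
    this.settled = true;
    if (this.silenceTimer !== undefined) clearTimeout(this.silenceTimer);
    this.silenceTimer = undefined;
  }
}
```

With `settled` flipped inside `close()`, a `postponeTermination` call that resolves afterwards returns `false` and leaves no timer behind.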
Without a lower bound and NaN guard, a misbehaving LLM passing a negative or non-numeric `seconds` argument would compute a clampedMs of NaN or a negative number, which setTimeout treats as 0 and fires immediately. The manual tool-execution path here bypasses the Zod schema, so this defense lives in execute().
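A guard along these lines might look like the sketch below. The 10 s upper bound comes from the PR description; `clampPostponeMs`, the 1 s lower bound, and the choice to fall back to the cap on non-numeric input are assumptions made for illustration.

```typescript
const MAX_EXTENSION_MS = 10_000; // 10 s cap per extension, per the PR description
const MIN_EXTENSION_MS = 1_000; // assumed lower bound, not taken from the PR

// Convert an untrusted `seconds` argument into a safe setTimeout delay.
function clampPostponeMs(seconds: unknown): number {
  const value = typeof seconds === 'number' ? seconds : Number.NaN;
  // NaN and +/-Infinity would otherwise reach setTimeout; use the cap instead (assumed policy).
  if (!Number.isFinite(value)) return MAX_EXTENSION_MS;
  return Math.min(Math.max(value * 1000, MIN_EXTENSION_MS), MAX_EXTENSION_MS);
}
```

For example, `clampPostponeMs(-5)` yields 1000 and `clampPostponeMs('abc')` yields 10000, so neither value can make the timer fire immediately.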
Port of livekit/agents#5637. When a final STT transcript arrives inside the short-speech HUMAN_SILENCE_THRESHOLD window, cancel the pre-baked HUMAN/short_greeting silence timer and replace it with a long_speech timer anchored at speechEndedAt + MACHINE_SILENCE_THRESHOLD_MS so the LLM verdict gets the final word. https://claude.ai/code/session_017SqU9Zxmo439ZtcdwzKZp9
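The re-anchoring amounts to measuring the replacement deadline from the end of speech rather than from the moment the final transcript arrives. A minimal sketch, assuming millisecond timestamps; `longSpeechDelayMs` is a hypothetical helper and the numbers in the usage note are illustrative, not the library's defaults.

```typescript
// Delay to pass to setTimeout for a timer anchored at speechEndedAt + threshold.
function longSpeechDelayMs(
  speechEndedAt: number, // when speech ended, in ms
  machineSilenceThresholdMs: number, // silence window granted to long speech
  now: number, // current time, in ms, on the same clock as speechEndedAt
): number {
  // If the deadline has already passed, fire as soon as possible rather than go negative.
  return Math.max(0, speechEndedAt + machineSilenceThresholdMs - now);
}
```

If speech ended at t = 1000 ms, the threshold is 1500 ms, and the final transcript lands at t = 1200 ms, the replacement timer would be scheduled 1300 ms out.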
- added SIP code in the example;
- added support for separate STT;
- added support for participant wait;
- added default models
🦋 Changeset detected

Latest commit: 9a24e2c

The changes in this PR will be included in the next version bump. This PR includes changesets to release 29 packages.
    + // Start running AMD before creating the SIP participant to avoid losing
    + // any of the early audio. Same ordering as the python example.
    + if (phoneNumber && outboundTrunkId && participantIdentity) {
    +   if (
    +     !process.env.LIVEKIT_URL ||
    +     !process.env.LIVEKIT_API_KEY ||
    +     !process.env.LIVEKIT_API_SECRET
    +   ) {
    +     throw new Error('outbound dial requires LIVEKIT_URL/API_KEY/API_SECRET');
    +   }
    +   const roomName = ctx.room.name;
    +   if (!roomName) {
    +     throw new Error('ctx.room has no name; cannot place outbound call');
    +   }

    - if (result.category === voice.AMDCategory.HUMAN) {
    -   logger.info({ amd: result }, 'human answered the call, proceeding with normal conversation');
    -   return;
    - }
    +   const sip = new SipClient(
    +     process.env.LIVEKIT_URL,
    +     process.env.LIVEKIT_API_KEY,
    +     process.env.LIVEKIT_API_SECRET,
    +   );

    - if (result.category === voice.AMDCategory.MACHINE_IVR) {
    -   logger.info({ amd: result }, 'ivr menu detected, starting navigation');
    -   return;
    - }
    +   logger.info({ participantIdentity }, 'creating SIP participant');
    +   await sip.createSipParticipant(outboundTrunkId, phoneNumber, roomName, {
    +     participantIdentity,
    +     waitUntilAnswered: true,
    +   });

    - if (result.category === voice.AMDCategory.MACHINE_VM) {
    -   logger.info({ amd: result }, 'voicemail detected, leaving a message');
    -   const speechHandle = session.generateReply({
    -     instructions:
    -       "You've reached voicemail. Leave a brief message asking the customer to call back.",
    -   });
    -   await speechHandle.waitForPlayout();
    -   session.shutdown({ reason: 'amd:machine-vm' });
    -   return;
    - }
    +   const participant = await ctx.waitForParticipant(participantIdentity);
    +   const subscribedAudioTrackSids: string[] = [];
    +   for (const pub of participant.trackPublications.values()) {
    +     if (pub.subscribed && pub.kind === TrackKind.KIND_AUDIO && pub.sid) {
    +       subscribedAudioTrackSids.push(pub.sid);
    +     }
    +   }
    +   logger.info(
    +     {
    +       actualIdentity: participant.identity,
    +       expectedIdentity: participantIdentity,
    +       kind: participant.kind,
    +       audioTracksSubscribed: subscribedAudioTrackSids,
    +     },
    +     'participant joined',
    +   );
    + }

    - if (result.category === voice.AMDCategory.MACHINE_UNAVAILABLE) {
    -   logger.info({ amd: result }, 'mailbox unavailable, ending call');
    -   session.shutdown({ reason: 'amd:machine-unavailable' });
    -   return;
    + const result = await detector.execute();
🟡 Example starts AMD detection after SIP participant creation, contradicting the comment and missing early call audio
detector.execute() is called at line 138, AFTER the if block (lines 95-136) that creates the SIP participant and waits for it to join. However, the comment on lines 93-94 explicitly says "Start running AMD before creating the SIP participant to avoid losing any of the early audio. Same ordering as the python example." The AMD constructor on line 88 does not start detection — only execute() registers event handlers and starts the STT pump. For outbound calls, this means the initial call greeting (the exact audio AMD needs to classify) can already be spoken and processed by AudioRecognition before AMD's dedicated STT pump subscribes via subscribeAudioStream(). The correct pattern (matching the Python example) would be to call detector.execute() without await before createSipParticipant, then await the result after.
Prompt for agents
In examples/src/telephony_amd.ts, the AMD detection ordering is wrong for outbound calls. The comment on lines 93-94 says 'Start running AMD before creating the SIP participant to avoid losing any of the early audio' but detector.execute() is called AFTER the SIP participant creation block.
The fix: start detector.execute() as a background promise before the SIP participant creation, then await the result afterwards. Something like:
    const amdPromise = detector.execute();
    if (phoneNumber && outboundTrunkId && participantIdentity) {
      // ... create SIP participant, wait for participant ...
    }
    const result = await amdPromise;
This matches the Python example pattern where the AMD coroutine is started as a task before the SIP call is placed, ensuring AMD's STT pump and event listeners are active before any call audio arrives.
15c346a to 4027e25
Tested with a SIP call.