feat(amd): feature parity with Python AMD implementation #1394

Merged
chenghao-mou merged 14 commits into claude/quirky-galileo-B4wih from chenghao/feat/amd-sip-and-stt-support
May 6, 2026

@chenghao-mou
Member

  • added SIP code in the example;
  • added support for separate STT;
  • added support for participant wait;
  • added default models;
  • pending: adding AMD remote session event: Version Packages protocol#1523 (review).

Tested with a SIP call.

claude and others added 9 commits May 1, 2026 12:06
Ports python livekit/agents#5584 (AMD improvement) into agents-js.

- Expose `humanSpeechThresholdMs`, `humanSilenceThresholdMs`,
  `machineSilenceThresholdMs`, and `prompt` as `AMDOptions` fields.
- Defer to the LLM (instead of forcing HUMAN) when a transcript is
  already available after a short greeting.
- Add `postpone_termination` LLM tool (capped at 3 extensions × 10s)
  alongside `save_prediction`; fall back to JSON-content parsing when
  the LLM does not emit tool calls.
- Add `participantIdentity` and `suppressCompatibilityWarning` options.
- Warn once when the resolved LLM is not in `EVALUATED_LLM_MODELS`.
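
The fallback order described in the list above (tool call first, JSON content second) might look roughly like the sketch below. It is illustrative only: the `LLMReply`, `ToolCall`, and `Prediction` shapes and the `extractPrediction` helper are hypothetical, and the assumption that the JSON payload carries a `label` field is borrowed from the `save_prediction` argument name rather than from the actual agents-js code.

```typescript
// Hypothetical shapes for illustration; not the real agents-js types.
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

interface LLMReply {
  toolCalls: ToolCall[];
  content: string;
}

interface Prediction {
  label: string;
  source: 'tool' | 'json';
}

function extractPrediction(reply: LLMReply): Prediction | null {
  // Preferred path: the model called the save_prediction tool.
  const call = reply.toolCalls.find((c) => c.name === 'save_prediction');
  if (call && typeof call.args.label === 'string') {
    return { label: call.args.label, source: 'tool' };
  }

  // Fallback path: the model answered with a JSON object as plain content.
  try {
    const parsed: unknown = JSON.parse(reply.content);
    if (typeof parsed === 'object' && parsed !== null) {
      const label = (parsed as Record<string, unknown>).label;
      if (typeof label === 'string') {
        return { label, source: 'json' };
      }
    }
  } catch {
    // Content was not JSON; fall through and report "no prediction".
  }
  return null;
}
```

Returning `null` instead of throwing leaves it to the caller to decide what an unparseable reply means.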

Skipped (architectural divergence — see PR description): dedicated AMD
STT pipeline, track-subscription wait, and the `start()` /
`start_timers()` lifecycle split.

- Gate `save_prediction` and `postpone_termination` tool side effects on
  the current `detectGeneration`. Stale in-flight classifications now
  no-op instead of mutating timers, budget, or capturing a verdict that
  belongs to a superseded transcript window.
- Normalize `save_prediction`'s `label` argument through `parseCategory`
  before storing, so an off-enum value from a misbehaving LLM (or our
  manual JSON path that bypasses Zod) is treated as UNCERTAIN rather
  than producing an `AMDResult` with an invalid category string.
- Fix `warnIfNotEvaluated` substring check to also handle date-suffixed
  model names (e.g. `openai/gpt-4.1-mini-2025-04-14`).
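
As a rough sketch of the first two items above, the snippet below gates a tool side effect on a generation counter and normalizes the label before storing it. The `Classifier` class and its method names are hypothetical, and the string values of the categories are assumed for illustration; only the names `parseCategory`, `detectGeneration`, and the category identifiers come from this PR.

```typescript
// Category names appear in this PR; the string values are assumed.
type Category = 'HUMAN' | 'MACHINE_IVR' | 'MACHINE_VM' | 'MACHINE_UNAVAILABLE' | 'UNCERTAIN';

const CATEGORIES: readonly Category[] = [
  'HUMAN',
  'MACHINE_IVR',
  'MACHINE_VM',
  'MACHINE_UNAVAILABLE',
  'UNCERTAIN',
];

// Anything outside the known set collapses to UNCERTAIN.
function parseCategory(raw: unknown): Category {
  return CATEGORIES.find((c) => c === raw) ?? 'UNCERTAIN';
}

// Hypothetical holder for the detection state.
class Classifier {
  private detectGeneration = 0;
  verdict: Category | null = null;

  // Called when a new transcript window supersedes the previous one.
  beginClassification(): number {
    this.detectGeneration += 1;
    return this.detectGeneration;
  }

  // Returns false (and changes nothing) when the caller's generation is stale.
  savePrediction(generation: number, label: unknown): boolean {
    if (generation !== this.detectGeneration) return false;
    this.verdict = parseCategory(label);
    return true;
  }
}
```

Each in-flight classification captures the generation it started under, so a late result from an older window cannot overwrite state that belongs to a newer one.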

Without this, a postpone_termination tool call resolved after aclose()
would still see isStale() === false (settled was never flipped) and
install a fresh silenceTimer that survives cleanup, eventually firing
scheduleLLMClassification + tryEmitResult and potentially triggering
session.interrupt on a closed AMD.

Without a lower bound and NaN guard, a misbehaving LLM passing a
negative or non-numeric `seconds` argument would compute a clampedMs
of NaN or a negative number, which setTimeout treats as 0 and fires
immediately. The manual tool-execution path here bypasses the Zod
schema, so this defense lives in execute().
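
A minimal sketch of such a guard is shown below. The helper name `clampExtensionMs`, the constant name, and the choice to return `null` for invalid input are assumptions made for illustration; the 10-second upper bound is taken from the "3 extensions × 10s" cap mentioned earlier in this PR.

```typescript
// Upper bound per extension, in milliseconds (constant name is assumed).
const MAX_EXTENSION_MS = 10000;

// Returns a safe delay in milliseconds, or null when the argument is
// unusable. A null result means the caller should not install a timer.
function clampExtensionMs(seconds: unknown): number | null {
  if (typeof seconds !== 'number') return null; // non-numeric argument
  if (!Number.isFinite(seconds)) return null; // NaN or ±Infinity
  if (seconds <= 0) return null; // lower bound
  return Math.min(seconds * 1000, MAX_EXTENSION_MS);
}
```

Rejecting bad input outright, instead of coercing it to some default, keeps a malformed tool call from scheduling a timer that fires immediately.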

Port of livekit/agents#5637. When a final STT transcript arrives inside
the short-speech HUMAN_SILENCE_THRESHOLD window, cancel the pre-baked
HUMAN/short_greeting silence timer and replace it with a long_speech
timer anchored at speechEndedAt + MACHINE_SILENCE_THRESHOLD_MS so the
LLM verdict gets the final word.
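
The timer swap can be pictured with a small pure function like the one below. It is a sketch under stated assumptions: the threshold values are placeholders (the real defaults are not given here), and the `SilenceTimer` shape and `onFinalTranscript` helper are hypothetical.

```typescript
// Placeholder values for illustration only.
const HUMAN_SILENCE_THRESHOLD_MS = 500;
const MACHINE_SILENCE_THRESHOLD_MS = 1500;

// Hypothetical description of a pending silence timer.
interface SilenceTimer {
  reason: 'short_greeting' | 'long_speech';
  fireAtMs: number;
}

// Decides which timer should be pending after a final transcript arrives.
function onFinalTranscript(
  current: SilenceTimer | null,
  speechEndedAtMs: number,
  nowMs: number,
): SilenceTimer | null {
  const insideShortWindow = nowMs - speechEndedAtMs < HUMAN_SILENCE_THRESHOLD_MS;
  if (current?.reason === 'short_greeting' && insideShortWindow) {
    // Replace the pre-baked HUMAN timer with one anchored at the end of
    // speech, so the deadline does not drift with transcript latency.
    return {
      reason: 'long_speech',
      fireAtMs: speechEndedAtMs + MACHINE_SILENCE_THRESHOLD_MS,
    };
  }
  return current;
}
```

Anchoring the new deadline at `speechEndedAtMs` rather than at the arrival time of the transcript keeps the total wait bounded regardless of STT delay.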

https://claude.ai/code/session_017SqU9Zxmo439ZtcdwzKZp9
@changeset-bot

changeset-bot Bot commented May 5, 2026

🦋 Changeset detected

Latest commit: 9a24e2c

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 29 packages

| Name | Type |
| --- | --- |
| @livekit/agents | Major |
| @livekit/agents-plugin-anam | Major |
| @livekit/agents-plugin-assemblyai | Major |
| @livekit/agents-plugin-baseten | Major |
| @livekit/agents-plugin-bey | Major |
| @livekit/agents-plugin-cartesia | Major |
| @livekit/agents-plugin-cerebras | Major |
| @livekit/agents-plugin-deepgram | Major |
| @livekit/agents-plugin-elevenlabs | Major |
| @livekit/agents-plugin-google | Major |
| @livekit/agents-plugin-hedra | Major |
| @livekit/agents-plugin-inworld | Major |
| @livekit/agents-plugin-lemonslice | Major |
| @livekit/agents-plugin-liveavatar | Major |
| @livekit/agents-plugin-livekit | Major |
| @livekit/agents-plugin-minimax | Major |
| @livekit/agents-plugin-mistral | Major |
| @livekit/agents-plugin-mistralai | Major |
| @livekit/agents-plugin-neuphonic | Major |
| @livekit/agents-plugin-openai | Major |
| @livekit/agents-plugin-phonic | Major |
| @livekit/agents-plugin-resemble | Major |
| @livekit/agents-plugin-rime | Major |
| @livekit/agents-plugin-runway | Major |
| @livekit/agents-plugin-sarvam | Major |
| @livekit/agents-plugin-silero | Major |
| @livekit/agents-plugins-test | Major |
| @livekit/agents-plugin-trugen | Major |
| @livekit/agents-plugin-xai | Major |

@chenghao-mou chenghao-mou requested a review from a team May 5, 2026 13:44

Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Devin Review found 1 new potential issue.

View 14 additional findings in Devin Review.

Comment on lines +93 to +138
    + // Start running AMD before creating the SIP participant to avoid losing
    + // any of the early audio. Same ordering as the python example.
    + if (phoneNumber && outboundTrunkId && participantIdentity) {
    +   if (
    +     !process.env.LIVEKIT_URL ||
    +     !process.env.LIVEKIT_API_KEY ||
    +     !process.env.LIVEKIT_API_SECRET
    +   ) {
    +     throw new Error('outbound dial requires LIVEKIT_URL/API_KEY/API_SECRET');
    +   }
    +   const roomName = ctx.room.name;
    +   if (!roomName) {
    +     throw new Error('ctx.room has no name; cannot place outbound call');
    +   }
    +
    - if (result.category === voice.AMDCategory.HUMAN) {
    -   logger.info({ amd: result }, 'human answered the call, proceeding with normal conversation');
    -   return;
    - }
    +   const sip = new SipClient(
    +     process.env.LIVEKIT_URL,
    +     process.env.LIVEKIT_API_KEY,
    +     process.env.LIVEKIT_API_SECRET,
    +   );
    +
    - if (result.category === voice.AMDCategory.MACHINE_IVR) {
    -   logger.info({ amd: result }, 'ivr menu detected, starting navigation');
    -   return;
    - }
    +   logger.info({ participantIdentity }, 'creating SIP participant');
    +   await sip.createSipParticipant(outboundTrunkId, phoneNumber, roomName, {
    +     participantIdentity,
    +     waitUntilAnswered: true,
    +   });
    +
    - if (result.category === voice.AMDCategory.MACHINE_VM) {
    -   logger.info({ amd: result }, 'voicemail detected, leaving a message');
    -   const speechHandle = session.generateReply({
    -     instructions:
    -       "You've reached voicemail. Leave a brief message asking the customer to call back.",
    -   });
    -   await speechHandle.waitForPlayout();
    -   session.shutdown({ reason: 'amd:machine-vm' });
    -   return;
    - }
    +   const participant = await ctx.waitForParticipant(participantIdentity);
    +   const subscribedAudioTrackSids: string[] = [];
    +   for (const pub of participant.trackPublications.values()) {
    +     if (pub.subscribed && pub.kind === TrackKind.KIND_AUDIO && pub.sid) {
    +       subscribedAudioTrackSids.push(pub.sid);
    +     }
    +   }
    +   logger.info(
    +     {
    +       actualIdentity: participant.identity,
    +       expectedIdentity: participantIdentity,
    +       kind: participant.kind,
    +       audioTracksSubscribed: subscribedAudioTrackSids,
    +     },
    +     'participant joined',
    +   );
    + }
    +
    - if (result.category === voice.AMDCategory.MACHINE_UNAVAILABLE) {
    -   logger.info({ amd: result }, 'mailbox unavailable, ending call');
    -   session.shutdown({ reason: 'amd:machine-unavailable' });
    -   return;
    + const result = await detector.execute();

🟡 Example starts AMD detection after SIP participant creation, contradicting the comment and missing early call audio

detector.execute() is called at line 138, AFTER the if block (lines 95-136) that creates the SIP participant and waits for it to join. However, the comment on lines 93-94 explicitly says "Start running AMD before creating the SIP participant to avoid losing any of the early audio. Same ordering as the python example." The AMD constructor on line 88 does not start detection — only execute() registers event handlers and starts the STT pump. For outbound calls, this means the initial call greeting (the exact audio AMD needs to classify) can already be spoken and processed by AudioRecognition before AMD's dedicated STT pump subscribes via subscribeAudioStream(). The correct pattern (matching the Python example) would be to call detector.execute() without await before createSipParticipant, then await the result after.

Prompt for agents
In examples/src/telephony_amd.ts, the AMD detection ordering is wrong for outbound calls. The comment on lines 93-94 says 'Start running AMD before creating the SIP participant to avoid losing any of the early audio' but detector.execute() is called AFTER the SIP participant creation block.

The fix: start detector.execute() as a background promise before the SIP participant creation, then await the result afterwards. Something like:

  const amdPromise = detector.execute();
  if (phoneNumber && outboundTrunkId && participantIdentity) {
    // ... create SIP participant, wait for participant ...
  }
  const result = await amdPromise;

This matches the Python example pattern where the AMD coroutine is started as a task before the SIP call is placed, ensuring AMD's STT pump and event listeners are active before any call audio arrives.
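
To make the ordering issue concrete, here is a small self-contained simulation. Everything in it is a stand-in: `FakeDetector`, `placeCall`, and the two driver functions are hypothetical and do not use the real AMD or SIP APIs. The only behavior it models is the one described above, namely that audio delivered before `execute()` is called is never seen by the detector.

```typescript
// Hypothetical stand-in for the AMD detector. It only "hears" audio fed
// after execute() has been called.
class FakeDetector {
  readonly heard: string[] = [];
  private listening = false;
  private finish: (heard: string[]) => void = () => {};

  // Starts listening synchronously and returns a promise for the result.
  execute(): Promise<string[]> {
    this.listening = true;
    return new Promise<string[]>((resolve) => {
      this.finish = resolve;
    });
  }

  // Simulates call audio reaching the detector.
  feed(chunk: string): void {
    if (this.listening) this.heard.push(chunk);
  }

  // Simulates the detector reaching a verdict.
  end(): void {
    this.finish([...this.heard]);
  }
}

// Stands in for createSipParticipant + waitForParticipant: the callee's
// greeting arrives while the dial is still being awaited.
async function placeCall(detector: FakeDetector): Promise<void> {
  await Promise.resolve();
  detector.feed('hello?');
}

// Ordering suggested by the review: start first, await last.
async function startBeforeDial(): Promise<string[]> {
  const detector = new FakeDetector();
  const amdPromise = detector.execute(); // started, not awaited yet
  await placeCall(detector);
  detector.end();
  return amdPromise;
}

// Ordering flagged by the review: the greeting is gone before execute().
async function startAfterDial(): Promise<string[]> {
  const detector = new FakeDetector();
  await placeCall(detector);
  const amdPromise = detector.execute();
  detector.end();
  return amdPromise;
}
```

In this simulation `startBeforeDial()` resolves with the greeting captured, while `startAfterDial()` resolves with an empty list because the greeting was fed before the detector started listening.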

@chenghao-mou chenghao-mou force-pushed the claude/quirky-galileo-B4wih branch from 15c346a to 4027e25 Compare May 6, 2026 15:02
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ toubatbrian
❌ claude

@chenghao-mou chenghao-mou merged commit a2c8caa into claude/quirky-galileo-B4wih May 6, 2026
1 check passed
@chenghao-mou chenghao-mou deleted the chenghao/feat/amd-sip-and-stt-support branch May 6, 2026 15:27