Single PC improvements #986

Open
pblazej wants to merge 8 commits into main from blaze/single-pc-improvements

Conversation

Contributor

pblazej commented May 5, 2026

The publisher transport's negotiate() always runs through a 20ms debouncer to coalesce rapid renegotiations during track publishing. At connect time there is nothing to coalesce, and the delay sits directly in the critical path — most visible in single-PC mode where the client must initiate the SDP offer.

Two changes:

  1. Transport.negotiate(force:) — when force=true, cancel any pending debounced action and call createAndSendOffer() directly (sketched after this list).
  2. Move the eager-negotiate call out of configureTransports and into fullConnectSequence after signalClient.resumeQueues(), calling with force=true. This is required because offers are not queueable (Livekit_SignalRequest.canBeQueued() returns false for .offer), so a forced send while the request queue is still suspended would silently drop the offer.
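A rough sketch of the force path (paraphrased; the actor and debounce-task plumbing below are assumptions, only `negotiate(force:)` and `createAndSendOffer()` are named in this PR):

```swift
actor Transport {
    private var debounceTask: Task<Void, Never>?

    func negotiate(force: Bool = false) async throws {
        if force {
            // Connect path: nothing to coalesce, so skip the 20ms debounce
            // and send the offer immediately.
            debounceTask?.cancel()
            debounceTask = nil
            try await createAndSendOffer()
            return
        }
        // Publish path: coalesce rapid renegotiations behind the debounce.
        debounceTask?.cancel()
        debounceTask = Task {
            try? await Task.sleep(nanoseconds: 20_000_000)
            guard !Task.isCancelled else { return }
            _ = try? await self.createAndSendOffer()
        }
    }

    private func createAndSendOffer() async throws {
        // Build the SDP offer and hand it to the signal client (elided).
    }
}
```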

pblazej and others added 8 commits May 5, 2026 13:22
The publisher transport's `negotiate()` always runs through a 20ms debouncer
to coalesce rapid renegotiations during track publishing. At connect time
there is nothing to coalesce, and the delay sits directly in the critical
path — most visible in single-PC mode where the client must initiate the
SDP offer.

Two changes:

1. `Transport.negotiate(force:)` — when `force=true`, cancel any pending
   debounced action and call `createAndSendOffer()` directly.

2. Move the eager-negotiate call out of `configureTransports` and into
   `fullConnectSequence` *after* `signalClient.resumeQueues()`, calling
   with `force=true`. This is required because offers are not queueable
   (`Livekit_SignalRequest.canBeQueued()` returns false for `.offer`),
   so a forced send while the request queue is still suspended would
   silently drop the offer.
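Roughly, the resulting order in `fullConnectSequence` becomes (paraphrased fragment; names beyond those mentioned above are assumptions):

```swift
// Flush any signal requests queued during connect.
await signalClient.resumeQueues()

// Offers are not queueable, so the forced, undebounced offer may only be
// sent once the queue has resumed; sending it earlier would be dropped.
try await publisherTransport.negotiate(force: true)

try await primaryTransportConnectedCompleter.wait()
```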

Cloud benchmark (BM-CONN-003 SinglePC) p50 wall time: ~406ms → ~338ms,
closing the gap with BM-CONN-001 DualPC.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two related observability changes for the BM-CONN connect-time spans:

1. `dc_open` is now recorded in a single place — `Room.dataChannelDidOpen`
   — for both single-PC (publisher pair) and dual-PC (subscriber pair).
   `DataChannelPair` notifies via the new `DataChannelDelegate.dataChannelDidOpen`
   when the pair transitions to open. `TransportMode.connectDataChannelPair(...)`
   selects which pair is canonical for the active mode. The previous inline
   record in `Room+TransportDelegate` is removed; in single-PC mode it never
   fired (subscriber transport is nil), so `D_DC_MS` was missing from
   BM-CONN-003 entirely.

2. `offer_sent` is recorded in the publisher's `onOffer` block, mirroring
   the existing `answer_sent` split for server-initiated offers. The benchmark
   reads either label as `sdpDispatched` so `D_ICE_DTLS_MS` is recorded for
   both client-initiated (single PC, publisher-primary) and server-initiated
   (subscriber-primary) flows.
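For reference, the canonical-pair selection can be as simple as the sketch below (the enum cases and parameter labels are assumptions; only the type and method names come from this message):

```swift
// Illustrative stub; the real DataChannelPair lives in the SDK.
final class DataChannelPair {}

enum TransportMode {
    case singlePC   // publisher pair carries the data channels
    case dualPC     // subscriber pair is the canonical one

    // Hypothetical parameter labels.
    func connectDataChannelPair(publisher: DataChannelPair,
                                subscriber: DataChannelPair) -> DataChannelPair {
        switch self {
        case .singlePC: return publisher
        case .dualPC: return subscriber
        }
    }
}
```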

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Awaits the canonical `DataChannelPair.openCompleter` after `pc_connected`
so the `dc_open` tracing split is reliably captured before `connect()`
returns. Best-effort (`try?`) so a data-channel-open timeout doesn't
fail connect — the primary transport is already up at this point.

In healthy connects the data channels share the SCTP transport and open
within ~1-2ms of DTLS, so this adds negligible blocking. Recovers
`D_DC_MS` as a reliable benchmark measurement (was 1/25 samples, now 25/25).
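In the connect path this amounts to roughly the following (paraphrased fragment; the completer's wait API shape is an assumption):

```swift
connectSpan?.record("pc_connected")

// Best-effort: `try?` means a data-channel-open timeout cannot fail
// connect(); the primary transport is already connected at this point.
_ = try? await dataChannelPair.openCompleter.wait()
connectSpan?.record("dc_open")
```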

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Now that the connect sequence awaits the canonical pair's openCompleter
and records dc_open inline, the delegate hop is dead weight. Drops the
protocol method, the Room implementation, and the handleDidOpen helper —
the open-state notification falls back to direct openCompleter.resume().

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Recording dc_open without blocking connect on it produces a measurement
inconsistent with the spec — `T_ROOM_CONNECTED` would arrive before
`T_DC_OPEN`. Blocking on it (60a5098) added ~70-80ms to cloud connect
latency for an internal benchmark concern.

Drops:
- The async dc_open recording in `fullConnectSequence`
- `TransportMode.connectDataChannelPair(...)` (no remaining callers)
- `D_DC_MS` from the benchmark output (kept commented for future use)

Net effect: connect returns at `pc_connected` as before, no `D_DC_MS`
column. Other splits (`D_WS_MS`, `D_SIGNAL_MS`, `D_TRANSPORT_MS`,
`D_ICE_DTLS_MS`) are unaffected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Provides an opt-in await for the data channels created during the
connect handshake. `connect()` returns at primary peer connection DTLS
completion; data channels share the same SCTP transport and open
shortly after but the open event fires asynchronously. Awaiting this
method observes full handshake completion and records `dc_open` on the
connect span.

Mirrors the `RemoteParticipant.waitUntilActive(timeout:)` shape (default
timeout, throwing, returns Self for chaining).

The `BM-CONN` benchmarks call this between `connect()` and
`stopMeasurement()` so wall-clock matches the spec's `T_ROOM_CONNECTED`
and `D_DC_MS` is captured reliably.
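The shape is roughly as follows (the method name, default timeout, and completer access below are assumptions; the commit does not spell out the API in this excerpt):

```swift
public extension Room {
    // Mirrors RemoteParticipant.waitUntilActive(timeout:): default timeout,
    // throwing, returns Self for chaining. Name here is hypothetical.
    @discardableResult
    func waitUntilDataChannelsOpen(timeout: TimeInterval = 10) async throws -> Self {
        // Assumption: the canonical pair exposes an awaitable open completer.
        try await dataChannelPair.openCompleter.wait(timeout: timeout)
        connectSpan?.record("dc_open")
        return self
    }
}
```

A benchmark run then becomes `try await room.connect(...)`, followed by `try await room.waitUntilDataChannelsOpen()`, followed by `stopMeasurement()`.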

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Matches the existing pattern of checking cancellation after each await
in fullConnectSequence. Without it, a cancellation between resumeQueues
and publisherShouldNegotiate could let an offer go out before the wait
on primaryTransportConnectedCompleter fails.
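i.e. (paraphrased fragment; only the names mentioned above come from the change):

```swift
await signalClient.resumeQueues()

// New: a cancellation landing here must not let the forced offer go out.
try Task.checkCancellation()

try await publisherShouldNegotiate()   // sends the forced offer in single-PC mode

try await primaryTransportConnectedCompleter.wait()
```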

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
}

if subscriberDataChannel.isOpen {
    connectSpan?.record("dc_open")
Contributor Author


This is moved to 769386a to make it consistent with the spec. I'm not sure whether we should drop this data point, given that it's not really awaitable from Swift.

I may re-evaluate the spec (and other SDKs).

Contributor Author


Right now I'm more for dropping it...

Contributor Author

pblazej commented May 5, 2026

The gap seems to be closed (within the noise level). Note that the interpretation changes due to the DC_OPEN tweaks above ⬆️, so the total wall time also increased slightly.

https://github.com/livekit/client-sdk-swift/actions/runs/25377916039

pblazej requested review from hiroshihorie and lukasIO May 5, 2026 13:57
pblazej marked this pull request as ready for review May 5, 2026 13:59
Contributor

lukasIO left a comment


makes sense to me!
