Add LiveKit-based audio/video calling with Element Call interop #68

Open
rexbron wants to merge 24 commits into subpop:main from rexbron:Relay-livekit-sdk

rexbron (Contributor) commented Apr 25, 2026

Summary

Adds native macOS audio/video calling to Relay, backed by the LiveKit Swift
SDK and interoperable with Element Call and Element-X. Both encrypted
(per-participant E2EE via SFrame/HKDF) and unencrypted Matrix rooms are
supported.

The feature lands as a new RelayKit/Call/ subsystem plus a redesigned
CallView in the app target. Overview of the components:

| Component | Role |
|---|---|
| LiveKitCredentialService | Implements MSC4143 credential exchange: discovers the SFU URL via /_matrix/client/unstable/org.matrix.msc4143/rtc/transports (with .well-known fallback), requests an OpenID token, and exchanges it for a LiveKit JWT (/get_token v2 with legacy /sfu/get fallback). |
| CallEncryptionService | MatrixRTC m.call.member (MSC3401 / MSC4143) state-event signaling, key generation, and HKDF-backed LKRTCFrameCryptorKeyProvider plumbing for SFrame interop with Element Call's web key provider. |
| CallWidgetBridge | Standalone matrix-rust-sdk WidgetDriver bridge: speaks the Widget API JSON protocol directly from Swift so the SDK handles Olm-encrypted to-device delivery of io.element.call.encryption_keys. Replaces an earlier raw-REST path that Element-X rejected. |
| CallViewModel | LiveKit Room lifecycle, frame-cryptor key plumbing, key-redistribution on remote join, expires-ts heartbeat, bounded leave cleanup. |
| LiveKitLogBridge | Forwards LiveKit SDK logs through os.Logger with [RTC] prefix, all at .private privacy. |
| CallView | macOS-native call UI: 1:1 FaceTime layout, group tiling with aspect-fitted tiles, soft speaking-glow, name pills, self PiP, auto-close on clean disconnect. |

Compatibility

Call interop verified against:

  • Element-X iOS (encrypted + unencrypted rooms)
  • Element Call web (encrypted, per-participant keys)

The encryption-key exchange is per-participant SFrame using HKDF-SHA256 — the
LiveKit Swift SDK's default LKRTCFrameCryptorKeyProvider ships with PBKDF2,
which produces different AES-GCM keys from identical IKM and silently breaks
peer decryption. We swap in HKDF via the 7-arg ObjC initializer exposed in
webrtc-xcframework 144.7559+; if the runtime lookup fails we log and fall
back to PBKDF2 with a clear "interop will fail" warning.

MatrixRTC details

  • m.call.member events use the MSC4143 per-device state-key format
    _<userId>_<deviceId>_m.call, populated with application: "m.call",
    m.call.intent: "video", and created_ts so each heartbeat is a distinct
    event (Synapse can dedupe identical state-event content).
  • 30-minute heartbeat against a 4-hour expires window — matches what
    matrix-js-sdk's MatrixRTCSession does for Element Call.
  • Leave is bounded: heartbeat cancelled first, then we await
    removeCallMemberEvent() with a 2-second cap so peers see us go
    immediately rather than waiting for expires_ts to fire.
  • Power-level mutation at join time has been removed; rooms are provisioned
    with the right call event PLs at creation via MatrixService.callPowerLevels.
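The state-key format and heartbeat timing described above can be sketched as follows. This is a minimal illustration rather than the code in this PR: `CallTiming`, `callMemberStateKey`, `callMemberContent`, and `isMembershipExpired` are hypothetical names, and only the content fields named in this description are shown.

```swift
import Foundation

/// Timing values taken from the description above (hypothetical namespace).
enum CallTiming {
    static let heartbeatInterval: TimeInterval = 30 * 60    // re-send every 30 minutes
    static let expiryWindow: TimeInterval = 4 * 60 * 60     // membership lapses after 4 hours
}

/// Builds the per-device state key: _<userId>_<deviceId>_m.call
func callMemberStateKey(userId: String, deviceId: String) -> String {
    "_\(userId)_\(deviceId)_m.call"
}

/// Builds the subset of m.call.member content named in this PR.
/// A fresh created_ts makes the content of each heartbeat distinct.
func callMemberContent(now: Date) -> [String: Any] {
    [
        "application": "m.call",
        "m.call.intent": "video",
        "created_ts": Int64(now.timeIntervalSince1970 * 1000),
    ]
}

/// True once a membership last refreshed at `lastRefresh` has outlived the expiry window.
func isMembershipExpired(lastRefresh: Date, now: Date) -> Bool {
    now.timeIntervalSince(lastRefresh) >= CallTiming.expiryWindow
}
```

With a 30-minute refresh against a 4-hour window, several consecutive heartbeats would have to fail before peers treat the membership as stale.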

UI

  • 1 remote → FaceTime-style primary video + self PiP.
  • 2+ remotes → polished tiled grid; tiles size to the source video aspect
    ratio (live, not just on resize — TrackDelegate.didUpdateDimensions
    bumps videoTrackRevision), surrounded by the call's gradient backdrop.
    Self always stays in the bottom-right PiP.
  • Soft drop shadow on tiles, accent-color glow on the speaker, no hard
    borders. Solid black-tinted name pill (ultraThinMaterial vanishes over
    bright video).
  • Camera-off / mute state flips instantly via the didUpdateIsMuted /
    didUnsubscribeTrack / didUnpublishTrack callbacks.
  • Clean disconnect closes the window; failures show a dismissable error.
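As a rough illustration of the layout rules above, the sketch below picks a layout from the remote-participant count and computes an aspect-fitted tile size for a grid cell. All names (`CallLayout`, `layout(forRemoteCount:)`, `TileSize`, `aspectFit`) are hypothetical, and the `.waiting` case for zero remotes is an assumption that this description does not cover.

```swift
/// Hypothetical layout modes; `.waiting` (no remotes yet) is an assumption.
enum CallLayout: Equatable {
    case waiting
    case oneOnOne   // primary remote video plus self PiP
    case grid       // tiled grid plus self PiP
}

func layout(forRemoteCount count: Int) -> CallLayout {
    switch count {
    case ..<1: return .waiting
    case 1: return .oneOnOne
    default: return .grid
    }
}

struct TileSize: Equatable {
    var width: Double
    var height: Double
}

/// Largest size with the given aspect ratio (width / height) that fits inside the cell.
func aspectFit(aspectRatio: Double, in cell: TileSize) -> TileSize {
    guard aspectRatio > 0, cell.width > 0, cell.height > 0 else { return cell }
    if cell.width / cell.height > aspectRatio {
        // The cell is wider than the video, so height is the limiting dimension.
        return TileSize(width: cell.height * aspectRatio, height: cell.height)
    } else {
        // The cell is narrower than (or matches) the video, so width limits.
        return TileSize(width: cell.width, height: cell.width / aspectRatio)
    }
}
```

Re-running `aspectFit` whenever the track reports new dimensions is what lets a tile follow rotation or source changes instead of waiting for a window resize.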

Logging / privacy

[RTC] prefix on every call-related log makes filtering trivial in Console.
A pre-merge audit (695a027) tightened the privacy qualifiers — full widget
JSON payloads (which include AES keys for io.element.call.encryption_keys)
are never logged any more; SDK error strings, room IDs, and call.member
content are .private. Matrix user/device IDs intentionally stay .public
since they're observable on the homeserver and useful for diagnostics.

Test plan

  • 1:1 video call against Element-X iOS in an encrypted room.
  • 1:1 video call against Element-X iOS in an unencrypted room.
  • Group call (3+) against Element Call web in an encrypted room — confirm
    tiles render, names show, speaker glow tracks, mute/unmute flips
    placeholder immediately.
  • Group call against Element-X iOS in an encrypted room.
  • Camera and mic toggles during the call.
  • Leave from caller side — confirm peers see departure within seconds
    (rather than only after expires_ts lapses).
  • Leave from remote side — confirm our UI removes the participant and the
    window closes when the room empties.
  • Long call (>30 min) — confirm membership heartbeat fires (look for
    [RTC]Heartbeat refreshed call.member state event in Console with
    private logs enabled).
  • Spot check: filter Console for [RTC] and confirm no AES key material,
    no JWT, no full event content visible at default privacy.

🤖 Generated with Claude Code

subpop (Owner) commented Apr 26, 2026

Thanks @rexbron!

subpop (Owner) commented Apr 28, 2026

@rexbron I have local commits that are rebased on top of current main. It's all your existing commits, minus the merge of main. Do you mind if I push them to this branch? It'll address the conflicts here, and should get us positioned to merge this easier.

rexbron (Contributor, Author) commented Apr 28, 2026

Sounds great!

subpop force-pushed the Relay-livekit-sdk branch from eca5947 to 3311cef on April 29, 2026 01:56

subpop (Owner) commented Apr 29, 2026

I also added a commit that restores the Secrets.xcconfig build reference. I believe you should be able to build local development builds by creating that file with empty values.

rexbron and others added 23 commits April 30, 2026 08:20
Adds end-to-end support for LiveKit-backed calls in Matrix rooms:

- `CallViewModelProtocol` + `CallState` / `CallParticipant` models in RelayInterface
- `CallViewModel` in RelayKit wraps `LiveKit.Room` and bridges `RoomDelegate`
  callbacks onto the main actor via `Task { @MainActor in … }`
- `makeVideoView(for:)` creates a `LiveKit.VideoView` (NSView subclass) so that
  no LiveKit types escape into the app or protocol layers
- `CallView` in Relay/Views shows participant tiles with speaking indicators,
  a bottom control bar (mic, camera, end call), and an NSViewRepresentable
  video bridge — imports only RelayInterface and SwiftUI
- `PreviewCallViewModel` simulates a connected call for SwiftUI previews
- `makeCallViewModel(roomId:)` added to `MatrixServiceProtocol`,
  `MatrixService`, `PreviewMatrixService`, and the placeholder
- Phone button added to the room toolbar in `MainView`; pressing it opens
  `CallView` in a sheet
- Camera + microphone sandbox entitlements added to `Relay.entitlements`
- `NSCameraUsageDescription` + `NSMicrophoneUsageDescription` added to
  `Info.plist`
- LiveKit SPM package (`client-sdk-swift`, ≥ 2.0.0) added to
  `project.pbxproj` and linked into the RelayKit framework target

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add missing `import AppKit` so NSView resolves in the RelayKit framework
- Qualify all RoomDelegate method Room parameters as `LiveKit.Room` to resolve
  ambiguity with the MatrixRustSDK Room type
- Fix videoTracks access: LiveKit v2 exposes an array not a dictionary, so
  `.first` yields the publication directly without a `.value` key-path
- Qualify `LiveKit.Room()` constructor for the same ambiguity reason

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements the full credential exchange flow so calls connect automatically:

LiveKitCredentialService (new, RelayKit/Call/):
- Step 1: Discover SFU URL via GET /_matrix/client/unstable/
  org.matrix.msc4143/rtc/transports; falls back to reading
  org.matrix.msc4143.rtc_foci from .well-known/matrix/client
- Step 2: Obtain an OpenID token via POST /_matrix/client/v3/user/
  {userId}/openid/request_token using the session's Matrix access token
- Step 3: Exchange with the SFU's POST /get_token (MSC4143 v2) or the
  legacy POST /sfu/get; both return { url, jwt } for LiveKit
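The three steps above can be sketched as URL construction only. This is an illustration under stated assumptions, not the service's actual code: the function names are hypothetical, base URLs are assumed to carry no trailing slash, and request bodies are omitted because this message does not specify them.

```swift
import Foundation

/// Step 1: transports discovery endpoint on the homeserver.
func transportsURL(homeserver: String) -> URL? {
    URL(string: "\(homeserver)/_matrix/client/unstable/org.matrix.msc4143/rtc/transports")
}

/// Step 2: OpenID token request endpoint. The user ID is percent-encoded
/// because it contains "@" and ":" and is placed in a path segment.
func openIDRequestURL(homeserver: String, userId: String) -> URL? {
    let unreserved = CharacterSet(charactersIn:
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~")
    guard let encoded = userId.addingPercentEncoding(withAllowedCharacters: unreserved) else {
        return nil
    }
    return URL(string: "\(homeserver)/_matrix/client/v3/user/\(encoded)/openid/request_token")
}

/// Step 3: token endpoints in the order they are tried,
/// MSC4143 v2 first and the legacy path second.
func tokenEndpoints(sfuServiceURL: String) -> [URL] {
    ["/get_token", "/sfu/get"].compactMap { URL(string: sfuServiceURL + $0) }
}
```

A caller would POST to each entry of `tokenEndpoints` in turn and stop at the first response that yields a `{ url, jwt }` pair.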

MatrixServiceProtocol / MatrixService:
- New callCredentials(for roomId:) method builds LiveKitCredentialService
  from the active session (homeserver, accessToken, userID, deviceID)
  and returns the (livekitURL, token) tuple

MainView:
- startCall() now auto-fetches credentials in a background Task and calls
  viewModel.connect(url:token:) immediately; falls back to the manual
  join form if the homeserver doesn't support MatrixRTC
- isPreparingCall flag passed to CallView to drive the correct UI state

CallView:
- New isPreparingCredentials parameter: when true, .idle state shows
  "Contacting call server…" spinner with Cancel instead of the join form
- The join form remains as a fallback for unsupported homeservers or
  direct LiveKit connections

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
LiveKitCredentialService returns (url:token:) but the protocol requires
(livekitURL:token:); re-label on the way out of MatrixService.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The local participant's camera was publishing successfully but the UI
showed a grey placeholder because:

1. VideoViewRepresentable.makeNSView called makeVideoView(for:) once —
   if the track wasn't ready at that instant it returned the placeholder
   and updateNSView never replaced it (it was a no-op).
2. makeVideoView created a brand-new VideoView on every call so it was
   never stable across SwiftUI re-renders.

CallViewModel fixes:
- Cache VideoView instances per participant in a dictionary; return the
  same instance on subsequent calls and update its .track in place
- Add videoTrackRevision counter, bumped after connect, toggleCamera,
  and syncParticipants — drives SwiftUI re-renders

VideoViewRepresentable fixes:
- makeNSView now creates a stable container NSView (dark grey background)
- updateNSView asks the view model for the current video view and
  attaches it as a constrained subview of the container
- If the video view is already attached, it's left in place

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The VideoViewRepresentable was never receiving updateNSView calls because
videoTrackRevision was not exposed through the CallViewModelProtocol and
was never read during SwiftUI body evaluation. Added the property to the
protocol, passed it into the representable, and added delegate callbacks
for local/remote track publish events. Includes diagnostic logging to
trace track availability through the connect pipeline.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the custom VideoViewRepresentable (which caused garbled Metal
rendering) with LiveKit's built-in SwiftUIVideoView.  Cache video views
per participant to prevent SwiftUI from tearing down the Metal surface
on re-renders.

Key changes:
- Switch from NSView-based VideoViewRepresentable to SwiftUIVideoView
  wrapped in AnyView, returned via makeVideoView(for:)
- Add video view cache keyed by participant ID + VideoTrack identity
- Add isSubscribed / isMuted guards matching LiveKit components-swift
- Configure RoomOptions with preferredCodec (.vp8), adaptiveStream,
  and dynacast; use ConnectOptions with enableMicrophone
- Remove .clipShape() from video tiles (interferes with Metal)
- Move aspectRatio to outer tile container
- Clean up diagnostic logging and delayed retry tasks
- Add network.server entitlement for WebRTC

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Enable AES-128-GCM frame encryption on LiveKit calls using per-participant
keys, and implement MatrixRTC call membership signaling so Element-X and
other MatrixRTC clients can discover and join calls.

- Add CallEncryptionService with key generation (16-byte random),
  dual-transport key distribution (to-device + room state events),
  timeline-based inbound key listener, and MatrixRTC call.member
  state event signaling (MSC3401)
- Configure BaseKeyProvider with per-participant keys and GCM encryption
  on RoomOptions, using ObjC runtime to set raw key bytes
- Send org.matrix.msc3401.call.member state event on connect so
  Element-X sees the call, remove on disconnect
- Redistribute encryption keys to newly joined participants
- Pass Matrix SDK Room and credentials into CallViewModel via
  EncryptionContext for key exchange and timeline listening

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Major changes:

Call UI overhaul:
- Move call to its own window (Window scene + CallManager + CallWindowView)
- FaceTime-style design: remote video fills window, self-view PiP overlay,
  floating translucent control bar with hover-to-reveal
- Fix beachball on disconnect by deferring network cleanup to Task.detached
- Fix recursive constraint crash by deferring dismissWindow via DispatchQueue
- Fix call window not reopening after ending a call

Element-X/Element-web interop:
- Fix call member state event to match MSC4143 format exactly:
  state key _userId_deviceId_m.call, focus_active with focus_selection,
  foci_preferred with livekit_alias, membershipID, m.call.intent
- Pass SFU service URL (from discovery) through credential flow for
  correct livekit_service_url in call member events
- Disable audio RED to match Element-X (audio/opus, not audio/red)
- Auto-configure call power levels (org.matrix.msc3401.call.member → PL 0)

Conditional E2EE:
- Enable LiveKit GCM frame encryption only for encrypted Matrix rooms
- Check room encryption state via roomInfo().encryptionState at call start
- Unencrypted rooms publish with no LiveKit-level encryption, matching
  Element-X behavior

Timeline improvements:
- Display call member state events as "User started a call" with phone icon
- Hide encryption key exchange events from timeline
- Add .callEvent kind to TimelineMessage

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix recursive constraint crash: defer all CallManager observable state
  mutations and window open/dismiss calls to the next run-loop iteration
  so they never fire during an active AppKit layout pass
- Make call window draggable, resizable, and responsive to Window menu
  commands (Fill, Center) using .hiddenTitleBar with transparent styling
- Suppress call window on launch (.defaultLaunchBehavior(.suppressed))
- Report call failures to user via errorReporter instead of swallowing
- Deduplicate consecutive timeline call events from the same sender
- Consolidate dismiss to single onChange path, eliminate double-dismiss
- End Call / Cancel buttons only disconnect; endedOverlay auto-dismisses
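The timeline deduplication mentioned above could look roughly like the sketch below. `TimelineEntry` is a hypothetical stand-in for the app's timeline model, reduced to the two fields the rule needs; only the `.callEvent` kind name comes from this branch.

```swift
/// Hypothetical, simplified timeline model for illustration.
struct TimelineEntry: Equatable {
    enum Kind: Equatable { case message, callEvent }
    var sender: String
    var kind: Kind
}

/// Drops a call event when the entry kept immediately before it is a call
/// event from the same sender. Other entries pass through unchanged.
func collapsingConsecutiveCallEvents(_ entries: [TimelineEntry]) -> [TimelineEntry] {
    var result: [TimelineEntry] = []
    for entry in entries {
        if entry.kind == .callEvent,
           let last = result.last,
           last.kind == .callEvent,
           last.sender == entry.sender {
            continue
        }
        result.append(entry)
    }
    return result
}
```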

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Element Call rejected every encrypted frame from Relay even when key
exchange and IKM fingerprints matched on both sides. Root causes:

1. PBKDF2 vs HKDF key derivation. The LiveKit Swift SDK's BaseKeyProvider
   forwards to an LKRTCFrameCryptorKeyProvider initializer that hard-codes
   PBKDF2, but livekit-client JS / Element Call derives the AES-GCM key
   with HKDF-SHA256 from the same raw IKM. Same fingerprint, different
   AES key, every auth tag fails on the peer.

   Fix: bump webrtc-xcframework to 144.7559.03 (which exposes the 7-arg
   ObjC init taking keyDerivationAlgorithm:) and client-sdk-swift to
   2.13.0. Added CallEncryptionService.makeHKDFKeyProvider which uses
   the Objective-C runtime to construct an HKDF-backed
   LKRTCFrameCryptorKeyProvider and swap it into BaseKeyProvider's
   internal rtcKeyProvider ivar — no direct LiveKitWebRTC import needed.

2. LiveKit participant identity didn't match the Matrix identity peers
   used to look up our key. Construct it explicitly as
   "userId:deviceId" and warn if LiveKit hands back a different value.

3. Microphone was auto-publishing at connect time, so the first audio
   frames hit the SFU before peers received our key — their cryptor then
   ratcheted past the window and poisoned the slot. Defer mic/camera
   publish until after sendEncryptionKey completes.
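For the identity fix in item 2, a minimal sketch of building the expected identity and detecting a mismatch might look like this. Both function names are hypothetical, and the warning text is illustrative rather than the message this branch logs.

```swift
/// Identity peers use to look up our key: "userId:deviceId".
func expectedLiveKitIdentity(userId: String, deviceId: String) -> String {
    "\(userId):\(deviceId)"
}

/// Returns a warning string when LiveKit reports a different identity,
/// or nil when the identities match.
func identityMismatchWarning(expected: String, actual: String?) -> String? {
    guard actual != expected else { return nil }
    return "[RTC] LiveKit identity \(actual ?? "<none>") differs from expected \(expected)"
}
```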

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Makes it straightforward to filter the encrypted-call flow out of a
noisy Console by grepping for "[RTC]". Adds LiveKitLogBridge so
LiveKit SDK logs flow through OSLog with the same prefix.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ded leave

Element Call doesn't try to mutate m.room.power_levels at join time —
it relies on the room being provisioned correctly. Drop the
enableCallPowerLevels() runtime path; MatrixService.callPowerLevels
still applies the same defaults at room creation.

Add an expires_ts-style heartbeat that re-sends the
org.matrix.msc3401.call.member state event every 30 minutes (against
a 4-hour expires window), matching matrix-js-sdk's MatrixRTCSession.
Each refresh carries a created_ts so Synapse can't dedupe identical
state-event content.

Tighten disconnect(): cancel the heartbeat first so it can't race the
leave, then await removeCallMemberEvent() with a 2-second cap and
await room.disconnect() — peers see us leave immediately instead of
waiting up to 4 hours for expires_ts to fire.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
When two or more remotes are present, swap the FaceTime-style "primary
+ PiP" layout for a tiled grid that preserves each remote's source
aspect ratio. Self always stays in the bottom-right PiP.

Tile design:
- Aspect-fitted card sized to the source video, centered in its grid
  cell against the call's dark gradient background — no harsh
  letterbox.
- Soft drop shadow + near-invisible hairline edge for depth; speaking
  swaps the hairline for a soft accent-color glow (no hard border).
- Solid black-tinted name capsule with mic.fill / mic.slash.fill
  badge — ultraThinMaterial blends into bright video frames and the
  text vanishes.
- displayName(for:) strips Matrix identities (@user:server:device →
  user) so the pill shows a friendly localpart when the JWT didn't
  supply a name.
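The identity stripping described for displayName(for:) can be approximated as below. This is a sketch of the rule (@user:server:device → user), not the method in this commit; `friendlyName(fromIdentity:)` is a hypothetical name, and falling back to the raw identity when nothing remains is an assumption.

```swift
/// Reduces "@user:server:device" to "user". Identities without a leading "@"
/// or without a colon are handled by the same two steps.
func friendlyName(fromIdentity identity: String) -> String {
    var name = Substring(identity)
    if name.hasPrefix("@") {
        name = name.dropFirst()
    }
    if let colon = name.firstIndex(of: ":") {
        name = name[..<colon]
    }
    // Assumption: show the raw identity rather than an empty pill.
    return name.isEmpty ? identity : String(name)
}
```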

Aspect updates live, not just on resize:
- New videoAspectRatio(for:) on CallViewModelProtocol reads the
  underlying VideoTrack.dimensions.
- The Delegate now also conforms to TrackDelegate and registers
  itself on every video track it sees, so dimension changes (rotation,
  simulcast layer switches, source swaps) bump videoTrackRevision and
  re-evaluate the tile.
- RoomDelegate.didUpdateStreamState bumps too, so the aspect snaps to
  the real value as soon as the first frame arrives.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Wire up the missing LiveKit RoomDelegate callbacks so toggling a
remote's camera or mic flips the tile state instantly instead of
waiting for an unrelated sync to fire:

- didUpdateIsMuted: refresh participants so isCameraEnabled /
  isMicrophoneEnabled flip and the tile re-evaluates makeVideoView
  (returns nil for muted tracks → placeholder appears).
- didUnpublishTrack / didUnsubscribeTrack (remote): same path.
- didUnpublishTrack (local): bump videoTrackRevision so the self PiP
  swaps to the off state.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Drop the "Call Ended" + Dismiss-button overlay for normal endings.
The .disconnected case now renders Color.clear with a .task that
fires onDismiss() the moment the branch mounts, so the window closes
immediately. Background cleanup (removeCallMemberEvent, LiveKit
teardown) continues in the existing disconnect() task.

The endedOverlay is kept for .failed only — errors still need a UI
so users can read what went wrong before dismissing.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Audit of the [RTC]-prefixed logs added during LiveKit / Element Call
interop work surfaced several places that wrote sensitive data to
the system log at .public privacy.

Critical:
- CallWidgetBridge.recvLoop logged the raw JSON of every widget
  driver message at .public — including outbound and inbound
  send_to_device payloads of type io.element.call.encryption_keys
  whose `keys.key` field carries the raw 16-byte AES key. Replaced
  with a byte-count-only debug log; action + type are still logged
  separately one line later for traceability.
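A byte-count-only summary of the kind described above might be built like this. It is a sketch under assumptions: `redactedSummary(of:)` is a hypothetical helper, the log wording is illustrative, and it assumes the widget message carries a top-level "action" string.

```swift
import Foundation

/// Describes a widget message without echoing any of its payload.
/// Only the size and the (assumed) top-level "action" field are reported.
func redactedSummary(of payload: Data) -> String {
    var action = "unknown"
    if let object = (try? JSONSerialization.jsonObject(with: payload)) as? [String: Any],
       let value = object["action"] as? String {
        action = value
    }
    return "[RTC] widget message: \(payload.count) bytes, action=\(action)"
}
```

Because the summary is assembled from the byte count and a single named field, key material nested elsewhere in the JSON cannot reach the log through this path.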

Defensive:
- m.call.member event body, state key, and existing-call-member
  content dropped to .debug and marked .private. Routing data and
  per-call membership UUIDs aren't secrets but don't belong in
  Console output either.
- LiveKitLogBridge now forwards all SDK log content as .private —
  the SDK can surface JWTs, signaling URLs, or handshake details
  at its own discretion.
- All error.localizedDescription interpolations in [RTC] logs now
  carry privacy: .private. SDK error strings can embed request
  URLs, tokens, or response bodies.

Already safe and left alone:
- AES keys are only ever logged as sha256[0..8] fingerprints.
- OpenID and LiveKit JWTs are never logged.
- Matrix user/device IDs intentionally remain .public — they're
  observable on the homeserver, not secret.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Inadvertently dropped in bbf9580 ("Fix LiveKit E2EE interop with
Element Call") during incidental cleanup. The stub is unreferenced
by any build target on either branch, but restoring it keeps this
branch's tree consistent with upstream/main and avoids a cosmetic
deletion in the PR diff.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The per-item TimelineMessageMapper.mapItem path was bypassing
describeStateEvent and falling through to the generic
stateEventDescription, which surfaces "Room settings were updated"
for any custom state event — including org.matrix.msc3401.call.member.

Route .state through describeStateEvent like the bulk and rebuild
paths already do, so MatrixRTC call membership renders as
"X started a call" with .callEvent kind, and the noisy
io.element.call.encryption_keys events are filtered out.

Marks describeStateEvent nonisolated (it's pure) so mapItem can call
it from its nonisolated context.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The pbxproj conflict resolution during rebase dropped the PBXFileReference,
group entry, and all six baseConfigurationReference entries for
Secrets.xcconfig. Without these, DEVELOPMENT_TEAM and GIPHY_API_KEY are
not loaded as build settings and the Generate Secrets build phase cannot
read them from the environment.

Assisted-By: Claude (OpenCode)
The toolbar conflict resolution during rebase kept main's structure but
dropped the call button ToolbarItem and startCallButton helper that the
LiveKit branch added.

Assisted-By: Claude (OpenCode)
fetchWellKnownSFUURL was querying .well-known on the delegated homeserver
URL (e.g. fedora.ems.host) which does not serve .well-known. Matrix
requires .well-known to be fetched from the server name domain (e.g.
fedora.im), which is the part after ":" in the user ID.

Extract the server name from the user ID and use it for the .well-known
lookup so servers with delegation (like fedora.im → fedora.ems.host)
correctly discover the rtc_foci LiveKit SFU URL.
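A minimal sketch of the lookup described above follows. The function names are hypothetical, and the rtc_foci JSON shape (an array of objects with "type" and "livekit_service_url") is an assumption based on MSC4143 rather than something this message spells out. The split uses the first ":" only, since a Matrix server name may itself include a port.

```swift
import Foundation

/// Extracts the server name, i.e. everything after the first ":" in "@user:server".
func serverName(fromUserID userID: String) -> String? {
    guard userID.hasPrefix("@"), let colon = userID.firstIndex(of: ":") else { return nil }
    let name = userID[userID.index(after: colon)...]
    return name.isEmpty ? nil : String(name)
}

/// The .well-known document is served from the server name, not the delegated host.
func wellKnownURL(forUserID userID: String) -> URL? {
    guard let server = serverName(fromUserID: userID) else { return nil }
    return URL(string: "https://\(server)/.well-known/matrix/client")
}

/// Assumed shape:
/// "org.matrix.msc4143.rtc_foci": [{"type": "livekit", "livekit_service_url": "https://..."}]
func liveKitServiceURL(fromWellKnown data: Data) -> URL? {
    guard let root = (try? JSONSerialization.jsonObject(with: data)) as? [String: Any],
          let foci = root["org.matrix.msc4143.rtc_foci"] as? [[String: Any]] else {
        return nil
    }
    for focus in foci where (focus["type"] as? String) == "livekit" {
        if let raw = focus["livekit_service_url"] as? String {
            return URL(string: raw)
        }
    }
    return nil
}
```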

Assisted-By: Claude (OpenCode)
subpop force-pushed the Relay-livekit-sdk branch from 3311cef to 1408e84 on April 30, 2026 12:55
rexbron marked this pull request as ready for review on May 6, 2026 23:19