Add LiveKit-based audio/video calling with Element Call interop #68
Open
rexbron wants to merge 24 commits into
Conversation
Owner
Thanks @rexbron!
Owner
@rexbron I have local commits that are rebased on top of current
Contributor
Author
Sounds great!
eca5947 to
3311cef
Compare
Owner
I also added a commit that restores the Secrets.xcconfig build reference. I believe you should be able to build local development builds by creating that file with empty values.
Adds end-to-end support for LiveKit-backed calls in Matrix rooms:
- `CallViewModelProtocol` + `CallState` / `CallParticipant` models in RelayInterface
- `CallViewModel` in RelayKit wraps `LiveKit.Room` and bridges `RoomDelegate`
callbacks onto the main actor via `Task { @MainActor in … }`
- `makeVideoView(for:)` creates a `LiveKit.VideoView` (NSView subclass) so that
no LiveKit types escape into the app or protocol layers
- `CallView` in Relay/Views shows participant tiles with speaking indicators,
a bottom control bar (mic, camera, end call), and an NSViewRepresentable
video bridge — imports only RelayInterface and SwiftUI
- `PreviewCallViewModel` simulates a connected call for SwiftUI previews
- `makeCallViewModel(roomId:)` added to `MatrixServiceProtocol`,
`MatrixService`, `PreviewMatrixService`, and the placeholder
- Phone button added to the room toolbar in `MainView`; pressing it opens
`CallView` in a sheet
- Camera + microphone sandbox entitlements added to `Relay.entitlements`
- `NSCameraUsageDescription` + `NSMicrophoneUsageDescription` added to
`Info.plist`
- LiveKit SPM package (`client-sdk-swift`, ≥ 2.0.0) added to
`project.pbxproj` and linked into the RelayKit framework target
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add missing `import AppKit` so NSView resolves in the RelayKit framework
- Qualify all RoomDelegate method Room parameters as `LiveKit.Room` to resolve ambiguity with the MatrixRustSDK Room type
- Fix videoTracks access: LiveKit v2 exposes an array not a dictionary, so `.first` yields the publication directly without a `.value` key-path
- Qualify `LiveKit.Room()` constructor for the same ambiguity reason
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements the full credential exchange flow so calls connect automatically:
LiveKitCredentialService (new, RelayKit/Call/):
- Step 1: Discover SFU URL via GET /_matrix/client/unstable/
org.matrix.msc4143/rtc/transports; falls back to reading
org.matrix.msc4143.rtc_foci from .well-known/matrix/client
- Step 2: Obtain an OpenID token via POST /_matrix/client/v3/user/
{userId}/openid/request_token using the session's Matrix access token
- Step 3: Exchange with the SFU's POST /get_token (MSC4143 v2) or the
legacy POST /sfu/get; both return { url, jwt } for LiveKit
MatrixServiceProtocol / MatrixService:
- New callCredentials(for roomId:) method builds LiveKitCredentialService
from the active session (homeserver, accessToken, userID, deviceID)
and returns the (livekitURL, token) tuple
MainView:
- startCall() now auto-fetches credentials in a background Task and calls
viewModel.connect(url:token:) immediately; falls back to the manual
join form if the homeserver doesn't support MatrixRTC
- isPreparingCall flag passed to CallView to drive the correct UI state
CallView:
- New isPreparingCredentials parameter: when true, .idle state shows
"Contacting call server…" spinner with Cancel instead of the join form
- The join form remains as a fallback for unsupported homeservers or
direct LiveKit connections
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
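The three request steps above can be sketched as plain URL builders. This is a minimal illustration, not the code in `LiveKitCredentialService`: the function names are hypothetical, and only the endpoint paths are taken from the commit message.

```swift
import Foundation

/// Percent-encodes a value for use as one URL path segment, leaving only
/// RFC 3986 "unreserved" characters untouched so "@" and ":" are escaped.
func encodePathSegment(_ value: String) -> String {
    let unreserved = CharacterSet(charactersIn:
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~")
    return value.addingPercentEncoding(withAllowedCharacters: unreserved) ?? value
}

/// Joins a base URL string and an absolute path, tolerating a trailing slash.
func endpoint(base: String, path: String) -> URL? {
    let trimmed = base.hasSuffix("/") ? String(base.dropLast()) : base
    return URL(string: trimmed + path)
}

/// Step 1: MatrixRTC transport discovery on the homeserver (GET).
func transportsURL(homeserver: String) -> URL? {
    endpoint(base: homeserver,
             path: "/_matrix/client/unstable/org.matrix.msc4143/rtc/transports")
}

/// Step 2: OpenID token request for the given user (POST, authenticated).
func openIDTokenURL(homeserver: String, userID: String) -> URL? {
    endpoint(base: homeserver,
             path: "/_matrix/client/v3/user/\(encodePathSegment(userID))/openid/request_token")
}

/// Step 3: token exchange candidates on the SFU service, newest first
/// (`/get_token`, then the legacy `/sfu/get`).
func tokenExchangeURLs(sfuService: String) -> [URL] {
    ["/get_token", "/sfu/get"].compactMap { endpoint(base: sfuService, path: $0) }
}
```

A caller would try the candidates from `tokenExchangeURLs` in order and fall back to the manual join form if discovery in step 1 yields nothing.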
LiveKitCredentialService returns (url:token:) but the protocol requires (livekitURL:token:); re-label on the way out of MatrixService. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
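The re-labelling is needed because Swift treats tuples with different element labels as different types. A small sketch of the pattern, with hypothetical stand-in functions and placeholder values rather than the real service:

```swift
// Hypothetical stand-in for the service layer, which labels the first element `url`.
func fetchFromCredentialService() -> (url: String, token: String) {
    (url: "wss://sfu.example.org", token: "example-jwt")
}

// Hypothetical stand-in for the protocol requirement, which expects `livekitURL`.
func callCredentials() -> (livekitURL: String, token: String) {
    let credentials = fetchFromCredentialService()
    // Returning `credentials` directly would not compile because the labels
    // differ, so the tuple is rebuilt with the labels the protocol requires.
    return (livekitURL: credentials.url, token: credentials.token)
}
```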
The local participant's camera was publishing successfully but the UI showed a grey placeholder because:
1. VideoViewRepresentable.makeNSView called makeVideoView(for:) once; if the track wasn't ready at that instant it returned the placeholder and updateNSView never replaced it (it was a no-op).
2. makeVideoView created a brand-new VideoView on every call so it was never stable across SwiftUI re-renders.
CallViewModel fixes:
- Cache VideoView instances per participant in a dictionary; return the same instance on subsequent calls and update its .track in place
- Add videoTrackRevision counter, bumped after connect, toggleCamera, and syncParticipants; it drives SwiftUI re-renders
VideoViewRepresentable fixes:
- makeNSView now creates a stable container NSView (dark grey background)
- updateNSView asks the view model for the current video view and attaches it as a constrained subview of the container
- If the video view is already attached, it's left in place
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
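The caching idea in this commit can be shown without any LiveKit types. The sketch below is an assumption-laden illustration: `VideoViewCache` and `PlaceholderView` are hypothetical names, and the generic `View` parameter stands in for LiveKit's `VideoView`.

```swift
/// Per-participant cache that hands back the same view instance on every call.
/// `View` stands in for LiveKit's `VideoView`; the class itself is hypothetical.
final class VideoViewCache<View: AnyObject> {
    private var views: [String: View] = [:]
    /// Analogous to `videoTrackRevision`: bumped whenever tracks change so that
    /// observers know to re-evaluate.
    private(set) var revision = 0
    private let makeView: () -> View

    init(makeView: @escaping () -> View) {
        self.makeView = makeView
    }

    /// Returns the cached view for a participant, creating it on first use.
    func view(for participantID: String) -> View {
        if let existing = views[participantID] {
            return existing
        }
        let created = makeView()
        views[participantID] = created
        return created
    }

    /// Call after connect, camera toggles, or participant syncs.
    func noteTrackChange() {
        revision += 1
    }

    /// Drops the cached view when a participant leaves.
    func remove(participantID: String) {
        views[participantID] = nil
    }
}

/// Placeholder view type used only to exercise the cache.
final class PlaceholderView {}
```

Because the instance is stable, a SwiftUI representable can attach it once and leave it in place across re-renders.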
The VideoViewRepresentable was never receiving updateNSView calls because videoTrackRevision was not exposed through the CallViewModelProtocol and was never read during SwiftUI body evaluation. Added the property to the protocol, passed it into the representable, and added delegate callbacks for local/remote track publish events. Includes diagnostic logging to trace track availability through the connect pipeline. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the custom VideoViewRepresentable (which caused garbled Metal rendering) with LiveKit's built-in SwiftUIVideoView. Cache video views per participant to prevent SwiftUI from tearing down the Metal surface on re-renders.
Key changes:
- Switch from NSView-based VideoViewRepresentable to SwiftUIVideoView wrapped in AnyView, returned via makeVideoView(for:)
- Add video view cache keyed by participant ID + VideoTrack identity
- Add isSubscribed / isMuted guards matching LiveKit components-swift
- Configure RoomOptions with preferredCodec (.vp8), adaptiveStream, and dynacast; use ConnectOptions with enableMicrophone
- Remove .clipShape() from video tiles (interferes with Metal)
- Move aspectRatio to outer tile container
- Clean up diagnostic logging and delayed retry tasks
- Add network.server entitlement for WebRTC
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Enable AES-128-GCM frame encryption on LiveKit calls using per-participant keys, and implement MatrixRTC call membership signaling so Element-X and other MatrixRTC clients can discover and join calls.
- Add CallEncryptionService with key generation (16-byte random), dual-transport key distribution (to-device + room state events), timeline-based inbound key listener, and MatrixRTC call.member state event signaling (MSC3401)
- Configure BaseKeyProvider with per-participant keys and GCM encryption on RoomOptions, using ObjC runtime to set raw key bytes
- Send org.matrix.msc3401.call.member state event on connect so Element-X sees the call, remove on disconnect
- Redistribute encryption keys to newly joined participants
- Pass Matrix SDK Room and credentials into CallViewModel via EncryptionContext for key exchange and timeline listening
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
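As a rough illustration of the "16-byte random" key generation mentioned above, the following sketch draws the bytes from Swift's default random number generator, which the standard library documents as cryptographically secure whenever possible. The function name is hypothetical, and how `CallEncryptionService` actually sources its randomness is not stated in the commit message.

```swift
/// Generates raw key material for one participant.
/// 16 bytes matches the AES-128 key size named in the commit message.
func generateFrameKey(byteCount: Int = 16) -> [UInt8] {
    // UInt8.random(in:) uses SystemRandomNumberGenerator by default.
    (0..<byteCount).map { _ in UInt8.random(in: UInt8.min...UInt8.max) }
}
```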
Major changes:
Call UI overhaul:
- Move call to its own window (Window scene + CallManager + CallWindowView)
- FaceTime-style design: remote video fills window, self-view PiP overlay, floating translucent control bar with hover-to-reveal
- Fix beachball on disconnect by deferring network cleanup to Task.detached
- Fix recursive constraint crash by deferring dismissWindow via DispatchQueue
- Fix call window not reopening after ending a call
Element-X/Element-web interop:
- Fix call member state event to match MSC4143 format exactly: state key _userId_deviceId_m.call, focus_active with focus_selection, foci_preferred with livekit_alias, membershipID, m.call.intent
- Pass SFU service URL (from discovery) through credential flow for correct livekit_service_url in call member events
- Disable audio RED to match Element-X (audio/opus, not audio/red)
- Auto-configure call power levels (org.matrix.msc3401.call.member → PL 0)
Conditional E2EE:
- Enable LiveKit GCM frame encryption only for encrypted Matrix rooms
- Check room encryption state via roomInfo().encryptionState at call start
- Unencrypted rooms publish with no LiveKit-level encryption, matching Element-X behavior
Timeline improvements:
- Display call member state events as "User started a call" with phone icon
- Hide encryption key exchange events from timeline
- Add .callEvent kind to TimelineMessage
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
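The per-device state key described above is a simple string composition. A minimal sketch, assuming the user and device IDs are inserted verbatim (the helper name is hypothetical):

```swift
/// Builds the per-device state key for a call member event in the
/// `_userId_deviceId_m.call` shape quoted in the commit message.
func callMemberStateKey(userID: String, deviceID: String) -> String {
    "_\(userID)_\(deviceID)_m.call"
}
```

Keying by device as well as user lets the same account join from two devices without one membership overwriting the other.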
- Fix recursive constraint crash: defer all CallManager observable state mutations and window open/dismiss calls to the next run-loop iteration so they never fire during an active AppKit layout pass
- Make call window draggable, resizable, and responsive to Window menu commands (Fill, Center) using .hiddenTitleBar with transparent styling
- Suppress call window on launch (.defaultLaunchBehavior(.suppressed))
- Report call failures to user via errorReporter instead of swallowing
- Deduplicate consecutive timeline call events from the same sender
- Consolidate dismiss to single onChange path, eliminate double-dismiss
- End Call / Cancel buttons only disconnect; endedOverlay auto-dismisses
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Element Call rejected every encrypted frame from Relay even when key exchange and IKM fingerprints matched on both sides. Root causes:
1. PBKDF2 vs HKDF key derivation. The LiveKit Swift SDK's BaseKeyProvider forwards to an LKRTCFrameCryptorKeyProvider initializer that hard-codes PBKDF2, but livekit-client JS / Element Call derives the AES-GCM key with HKDF-SHA256 from the same raw IKM. Same fingerprint, different AES key, every auth tag fails on the peer.
Fix: bump webrtc-xcframework to 144.7559.03 (which exposes the 7-arg ObjC init taking keyDerivationAlgorithm:) and client-sdk-swift to 2.13.0. Added CallEncryptionService.makeHKDFKeyProvider which uses the Objective-C runtime to construct an HKDF-backed LKRTCFrameCryptorKeyProvider and swap it into BaseKeyProvider's internal rtcKeyProvider ivar (no direct LiveKitWebRTC import needed).
2. LiveKit participant identity didn't match the Matrix identity peers used to look up our key. Construct it explicitly as "userId:deviceId" and warn if LiveKit hands back a different value.
3. Microphone was auto-publishing at connect time, so the first audio frames hit the SFU before peers received our key; their cryptor then ratcheted past the window and poisoned the slot. Defer mic/camera publish until after sendEncryptionKey completes.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
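The identity check from root cause 2 can be sketched as two small helpers. Both names are hypothetical, and the warning text is illustrative rather than the message the PR actually logs.

```swift
/// Builds the identity peers use to look up our key: "userId:deviceId".
func expectedLiveKitIdentity(userID: String, deviceID: String) -> String {
    "\(userID):\(deviceID)"
}

/// Returns a warning message when LiveKit reports a different identity than
/// the one we constructed, or nil when the two match.
func identityMismatchWarning(expected: String, reported: String) -> String? {
    guard reported != expected else { return nil }
    return "[RTC] LiveKit identity \(reported) does not match expected \(expected)"
}
```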
Makes it straightforward to filter the encrypted-call flow out of a noisy Console by grepping for "[RTC]". Adds LiveKitLogBridge so LiveKit SDK logs flow through OSLog with the same prefix. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ded leave
Element Call doesn't try to mutate m.room.power_levels at join time; it relies on the room being provisioned correctly. Drop the enableCallPowerLevels() runtime path; MatrixService.callPowerLevels still applies the same defaults at room creation.
Add an expires_ts-style heartbeat that re-sends the org.matrix.msc3401.call.member state event every 30 minutes (against a 4-hour expires window), matching matrix-js-sdk's MatrixRTCSession. Each refresh carries a created_ts so Synapse can't dedupe identical state-event content.
Tighten disconnect(): cancel the heartbeat first so it can't race the leave, then await removeCallMemberEvent() with a 2-second cap and await room.disconnect(), so peers see us leave immediately instead of waiting up to 4 hours for expires_ts to fire.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
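To illustrate why each refresh carries its own `created_ts`, here is a small sketch of the timing values and the per-refresh payload. The type and function names are hypothetical, and expressing both fields in milliseconds is an assumption; only the 30-minute interval and 4-hour window come from the commit message.

```swift
import Foundation

/// Heartbeat cadence and membership lifetime quoted in the commit message.
let heartbeatInterval: TimeInterval = 30 * 60      // 30 minutes
let membershipExpiry: TimeInterval = 4 * 60 * 60   // 4 hours

/// The two time-related fields of one refresh (hypothetical type; units are
/// assumed to be milliseconds).
struct MemberRefresh: Equatable {
    let expiresMs: Int64
    let createdTsMs: Int64
}

/// Builds the time fields for a refresh sent at `now`. Because `createdTsMs`
/// changes on every call, two consecutive refreshes never have identical content.
func makeRefresh(now: Date) -> MemberRefresh {
    MemberRefresh(
        expiresMs: Int64(membershipExpiry * 1000),
        createdTsMs: Int64(now.timeIntervalSince1970 * 1000)
    )
}
```

With these values a membership can miss several consecutive heartbeats before peers consider it expired.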
When two or more remotes are present, swap the FaceTime-style "primary + PiP" layout for a tiled grid that preserves each remote's source aspect ratio. Self always stays in the bottom-right PiP.
Tile design:
- Aspect-fitted card sized to the source video, centered in its grid cell against the call's dark gradient background, with no harsh letterbox.
- Soft drop shadow + near-invisible hairline edge for depth; speaking swaps the hairline for a soft accent-color glow (no hard border).
- Solid black-tinted name capsule with mic.fill / mic.slash.fill badge; ultraThinMaterial blends into bright video frames and the text vanishes.
- displayName(for:) strips Matrix identities (@user:server:device → user) so the pill shows a friendly localpart when the JWT didn't supply a name.
Aspect updates live, not just on resize:
- New videoAspectRatio(for:) on CallViewModelProtocol reads the underlying VideoTrack.dimensions.
- The Delegate now also conforms to TrackDelegate and registers itself on every video track it sees, so dimension changes (rotation, simulcast layer switches, source swaps) bump videoTrackRevision and re-evaluate the tile.
- RoomDelegate.didUpdateStreamState bumps too, so the aspect snaps to the real value as soon as the first frame arrives.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
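Two pieces of the tile logic above lend themselves to a short sketch: stripping a Matrix identity down to its localpart, and fitting a card of a given aspect ratio inside a grid cell. Both functions are hypothetical illustrations rather than the PR's `displayName(for:)` and layout code, and the fallback for non-Matrix identities is an assumption.

```swift
/// Reduces "@user:server:device" (or "@user:server") to "user".
/// Identities that do not look like Matrix IDs are returned unchanged.
func displayName(forIdentity identity: String) -> String {
    guard identity.hasPrefix("@") else { return identity }
    let withoutSigil = identity.dropFirst()
    let localpart = withoutSigil.prefix { $0 != ":" }
    return localpart.isEmpty ? identity : String(localpart)
}

/// Computes the largest size with the given width/height ratio that fits
/// inside a cell, so the card is never letterboxed or cropped.
func aspectFitSize(aspectRatio: Double,
                   cellWidth: Double,
                   cellHeight: Double) -> (width: Double, height: Double) {
    guard aspectRatio > 0, cellWidth > 0, cellHeight > 0 else {
        return (width: 0, height: 0)
    }
    if aspectRatio > cellWidth / cellHeight {
        // Source is wider than the cell: width is the limiting dimension.
        return (width: cellWidth, height: cellWidth / aspectRatio)
    } else {
        // Source is taller than (or equal to) the cell: height limits.
        return (width: cellHeight * aspectRatio, height: cellHeight)
    }
}
```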
Wire up the missing LiveKit RoomDelegate callbacks so toggling a remote's camera or mic flips the tile state instantly instead of waiting for an unrelated sync to fire:
- didUpdateIsMuted: refresh participants so isCameraEnabled / isMicrophoneEnabled flip and the tile re-evaluates makeVideoView (returns nil for muted tracks → placeholder appears).
- didUnpublishTrack / didUnsubscribeTrack (remote): same path.
- didUnpublishTrack (local): bump videoTrackRevision so the self PiP swaps to the off state.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Drop the "Call Ended" + Dismiss-button overlay for normal endings. The .disconnected case now renders Color.clear with a .task that fires onDismiss() the moment the branch mounts, so the window closes immediately. Background cleanup (removeCallMemberEvent, LiveKit teardown) continues in the existing disconnect() task. The endedOverlay is kept for .failed only — errors still need a UI so users can read what went wrong before dismissing. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Audit of the [RTC]-prefixed logs added during LiveKit / Element Call interop work surfaced several places that wrote sensitive data to the system log at .public privacy.
Critical:
- CallWidgetBridge.recvLoop logged the raw JSON of every widget driver message at .public, including outbound and inbound send_to_device payloads of type io.element.call.encryption_keys whose `keys.key` field carries the raw 16-byte AES key. Replaced with a byte-count-only debug log; action + type are still logged separately one line later for traceability.
Defensive:
- m.call.member event body, state key, and existing-call-member content dropped to .debug and marked .private. Routing data and per-call membership UUIDs aren't secrets but don't belong in Console output either.
- LiveKitLogBridge now forwards all SDK log content as .private; the SDK can surface JWTs, signaling URLs, or handshake details at its own discretion.
- All error.localizedDescription interpolations in [RTC] logs now carry privacy: .private. SDK error strings can embed request URLs, tokens, or response bodies.
Already safe and left alone:
- AES keys are only ever logged as sha256[0..8] fingerprints.
- OpenID and LiveKit JWTs are never logged.
- Matrix user/device IDs intentionally remain .public since they're observable on the homeserver, not secret.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
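A rough sketch of the two logging patterns described above: a byte-count-only summary for widget payloads, and `.private` interpolation for error strings. The subsystem string and function names are placeholders, not the PR's code, and the `os.Logger` calls assume macOS 11 or later; on platforms without the `os` module the sketch falls back to `print`.

```swift
import Foundation
#if canImport(os)
import os
#endif

/// Describes a payload by size only, so key material inside it never reaches the log.
func payloadSummary(_ payload: Data) -> String {
    "\(payload.count) bytes"
}

/// Logs that a widget message arrived without logging its contents.
func logWidgetMessage(_ payload: Data) {
    let summary = payloadSummary(payload)
    #if canImport(os)
    let logger = Logger(subsystem: "example.relay", category: "RTC") // placeholder subsystem
    logger.debug("[RTC] widget message: \(summary, privacy: .public)")
    #else
    print("[RTC] widget message: \(summary)")
    #endif
}

/// Logs a failure while keeping the SDK-provided description redacted by default.
func logCallFailure(_ error: Error) {
    #if canImport(os)
    let logger = Logger(subsystem: "example.relay", category: "RTC") // placeholder subsystem
    logger.error("[RTC] call failed: \(error.localizedDescription, privacy: .private)")
    #else
    print("[RTC] call failed: <redacted>")
    #endif
}
```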
Inadvertently dropped in bbf9580 ("Fix LiveKit E2EE interop with Element Call") during incidental cleanup. The stub is unreferenced by any build target on either branch, but restoring it keeps this branch's tree consistent with upstream/main and avoids a cosmetic deletion in the PR diff. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The per-item TimelineMessageMapper.mapItem path was bypassing describeStateEvent and falling through to the generic stateEventDescription, which surfaces "Room settings were updated" for any custom state event — including org.matrix.msc3401.call.member. Route .state through describeStateEvent like the bulk and rebuild paths already do, so MatrixRTC call membership renders as "X started a call" with .callEvent kind, and the noisy io.element.call.encryption_keys events are filtered out. Marks describeStateEvent nonisolated (it's pure) so mapItem can call it from its nonisolated context. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
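The routing described above amounts to a switch on the state event type. The sketch below uses hypothetical types and a simplified signature; the real `describeStateEvent` in the PR may take different inputs.

```swift
/// Hypothetical, simplified timeline kinds for this sketch.
enum TimelineKind: Equatable {
    case callEvent
    case stateChange
}

/// Hypothetical timeline entry: a kind plus the text shown to the user.
struct TimelineEntry: Equatable {
    let kind: TimelineKind
    let text: String
}

/// Maps a state event type to a timeline entry, or nil when the event
/// should be hidden from the timeline entirely.
func describeStateEvent(type: String, senderName: String) -> TimelineEntry? {
    switch type {
    case "org.matrix.msc3401.call.member":
        return TimelineEntry(kind: .callEvent, text: "\(senderName) started a call")
    case "io.element.call.encryption_keys":
        return nil // key exchange noise is filtered out
    default:
        return TimelineEntry(kind: .stateChange, text: "Room settings were updated")
    }
}
```

The function reads no shared state, which is what allows the real one to be marked `nonisolated`.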
The pbxproj conflict resolution during rebase dropped the PBXFileReference, group entry, and all six baseConfigurationReference entries for Secrets.xcconfig. Without these, DEVELOPMENT_TEAM and GIPHY_API_KEY are not loaded as build settings and the Generate Secrets build phase cannot read them from the environment. Assisted-By: Claude (OpenCode)
The toolbar conflict resolution during rebase kept main's structure but dropped the call button ToolbarItem and startCallButton helper that the LiveKit branch added. Assisted-By: Claude (OpenCode)
fetchWellKnownSFUURL was querying .well-known on the delegated homeserver URL (e.g. fedora.ems.host) which does not serve .well-known. Matrix requires .well-known to be fetched from the server name domain (e.g. fedora.im), which is the part after ":" in the user ID. Extract the server name from the user ID and use it for the .well-known lookup so servers with delegation (like fedora.im → fedora.ems.host) correctly discover the rtc_foci LiveKit SFU URL. Assisted-By: Claude (OpenCode)
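The server-name extraction can be sketched as follows. The helper names are hypothetical; the rule that the server name is everything after the first ":" in the user ID comes from the commit message.

```swift
import Foundation

/// Returns the server name from a Matrix user ID: everything after the
/// first ":" (which may itself contain a port, e.g. "example.org:8448").
func serverName(fromUserID userID: String) -> String? {
    guard userID.hasPrefix("@"), let colon = userID.firstIndex(of: ":") else {
        return nil
    }
    let name = userID[userID.index(after: colon)...]
    return name.isEmpty ? nil : String(name)
}

/// Builds the .well-known client discovery URL on the server-name domain,
/// not on the delegated homeserver URL.
func wellKnownURL(forUserID userID: String) -> URL? {
    guard let name = serverName(fromUserID: userID) else { return nil }
    return URL(string: "https://\(name)/.well-known/matrix/client")
}
```

For a user on `fedora.im` this queries `fedora.im` rather than `fedora.ems.host`, which is the delegation case the commit fixes.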
3311cef to
1408e84
Compare
Summary
Adds native macOS audio/video calling to Relay, backed by the LiveKit Swift
SDK and interoperable with Element Call
and Element-X. Both encrypted (per-participant E2EE via SFrame/HKDF) and
unencrypted Matrix rooms are supported.
The feature lands as a new `RelayKit/Call/` subsystem plus a redesigned `CallView` in the app target. Overview of the components:
- `LiveKitCredentialService`: discovers the SFU URL via `/_matrix/client/unstable/org.matrix.msc4143/rtc/transports` (with `.well-known` fallback), requests an OpenID token, and exchanges it for a LiveKit JWT (`/get_token` v2 with legacy `/sfu/get` fallback).
- `CallEncryptionService`: `m.call.member` (MSC3401 / MSC4143) state-event signaling, key generation, and HKDF-backed `LKRTCFrameCryptorKeyProvider` plumbing for SFrame interop with Element Call's web key provider.
- `CallWidgetBridge`: `WidgetDriver` bridge that speaks the Widget API JSON protocol directly from Swift so the SDK handles Olm-encrypted to-device delivery of `io.element.call.encryption_keys`. Replaces an earlier raw-REST path that Element-X rejected.
- `CallViewModel`: `Room` lifecycle, frame-cryptor key plumbing, key-redistribution on remote join, expires-ts heartbeat, bounded leave cleanup.
- `LiveKitLogBridge`: forwards LiveKit SDK logs to `os.Logger` with `[RTC]` prefix, all at `.private` privacy.
- `CallView`
Compatibility
Call interop verified against:
The encryption-key exchange is per-participant SFrame using HKDF-SHA256. The
LiveKit Swift SDK's default `LKRTCFrameCryptorKeyProvider` ships with PBKDF2,
which produces different AES-GCM keys from identical IKM and silently breaks
peer decryption. We swap in HKDF via the 7-arg ObjC initializer exposed in
`webrtc-xcframework` 144.7559+; if the runtime lookup fails we log and fall
back to PBKDF2 with a clear "interop will fail" warning.
MatrixRTC details
- `m.call.member` events use the MSC4143 per-device state-key format `_<userId>_<deviceId>_m.call`, populated with `application: "m.call"`, `m.call.intent: "video"`, and `created_ts` so each heartbeat is a distinct event (Synapse can dedupe identical state-event content).
- Heartbeat re-sends the event every 30 minutes against a 4-hour `expires` window, which matches what `matrix-js-sdk`'s `MatrixRTCSession` does for Element Call.
- Disconnect awaits `removeCallMemberEvent()` with a 2-second cap so peers see us go immediately rather than waiting for `expires_ts` to fire.
- Rooms are provisioned with the right call event PLs at creation via `MatrixService.callPowerLevels`.
UI
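- Two or more remotes switch to a tiled grid that preserves each remote's source aspect ratio (live, not just on resize: `TrackDelegate.didUpdateDimensions` bumps `videoTrackRevision`), surrounded by the call's gradient backdrop. Self always stays in the bottom-right PiP.
- Soft drop shadow and speaking glow instead of hard borders. Solid black-tinted name pill (ultraThinMaterial vanishes over bright video).
- Remote mute and unpublish changes flip the tile immediately via the `didUpdateIsMuted` / `didUnsubscribeTrack` / `didUnpublishTrack` callbacks.
Logging / privacy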
`[RTC]` prefix on every call-related log makes filtering trivial in Console.
A pre-merge audit (`695a027`) tightened the privacy qualifiers: full widget
JSON payloads (which include AES keys for `io.element.call.encryption_keys`)
are never logged any more; SDK error strings, room IDs, and call.member
content are `.private`. Matrix user/device IDs intentionally stay `.public`
since they're observable on the homeserver and useful for diagnostics.
Test plan
- […] tiles render, names show, speaker glow tracks, mute/unmute flips placeholder immediately.
- […] (not minutes-of-expires-ts).
- […] window closes when the room empties.
- […] `[RTC]` `Heartbeat refreshed call.member state event` in Console with private logs enabled).
- […] `[RTC]` and confirm no AES key material, no JWT, no full event content visible at default privacy.
🤖 Generated with Claude Code