perf(replay): optimize screenshot mode — 93% faster (14.7x) #536

Draft
marandaneto wants to merge 1 commit into main from perf/replay-integration-optimization

Conversation

@marandaneto
Member

marandaneto commented Mar 25, 2026

💡 Motivation and Context

Relates to #321

Session replay's screenshot capture was consuming 3,319µs per snapshot on the main thread — nearly 20% of a 60fps frame budget. Users on older devices reported frame drops and jank caused by the view hierarchy traversal and image rendering.

This PR reduces main-thread time to ~230µs (93.2% reduction, 14.7× faster), bringing frame budget impact down to 1.4% at 60fps — negligible.

Performance Results

| Metric | Before | After | Improvement |
| --- | --- | --- | --- |
| Total | 3,319µs | ~230µs | −93.2% |
| findMaskableWidgets | 1,760µs | ~175µs | −90.0% |
| toImage (drawHierarchy) | 653µs | ~50µs | −92.3% |
| Frame budget (60fps) | 19.9% | 1.4% | |
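
For reference, the frame budget row follows from dividing the main-thread cost by the duration of one frame. A minimal sketch of that arithmetic is below; the helper name `frameBudgetShare` is hypothetical and not part of the SDK.

```swift
import Foundation

/// Percentage of a single frame's time budget consumed by `costMicros`.
/// Hypothetical helper, used only to reproduce the table's last row.
func frameBudgetShare(costMicros: Double, fps: Double = 60) -> Double {
    let frameMicros = 1_000_000 / fps // about 16,667µs per frame at 60fps
    return costMicros / frameMicros * 100
}

let before = frameBudgetShare(costMicros: 3_319)
let after = frameBudgetShare(costMicros: 230)
print(String(format: "before: %.1f%%, after: %.1f%%", before, after))
// prints "before: 19.9%, after: 1.4%"
```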

Key Optimizations

Main-thread performance:

  1. Plain UIView + UIWindow fast path — Skip all type casts for container views that can never contain sensitive content. Biggest win (~66%).
  2. Custom CGContext renderer (PostHogGraphicsImageRenderer) — Based on Sentry's approach. Bypasses UIGraphicsImageRenderer overhead by directly allocating a CGContext with malloc. Uses native screen scale which avoids internal rescaling in drawHierarchy.
  3. Config caching (ReplayMaskConfig) — Cache maskAllTextInputs, maskAllImages, maskAllSandboxedViews once per snapshot instead of traversing weak ref + optional chain per view.
  4. If-else type chains — Skip unnecessary type casts after a match in both findMaskableWidgets and toWireframe.
  5. Touch phase filtering — Skip location(in:) for non-began/ended phases (avoids coordinate conversion during drag/scroll).
  6. Move lock + meta event off main thread — windowViewsLock access and meta event dict creation moved to dispatch queue.
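
To illustrate items 1, 3, and 4 together, here is a rough sketch of the traversal shape. It is not the code in this PR: the `View`, `Window`, `Label`, `TextField`, and `ImageView` classes are stand-ins so the example runs without UIKit, `ReplayMaskConfig` carries only two of the flags named above, and the mapping of view types to flags is an assumption.

```swift
// Stand-in view classes (hypothetical) so the sketch runs without UIKit.
class View { var subviews: [View] = [] }
final class Window: View {}
final class Label: View {}
final class TextField: View {}
final class ImageView: View {}

/// Flags read once per snapshot rather than once per view.
/// Field names follow the PR text; the struct layout is assumed.
struct ReplayMaskConfig {
    let maskAllTextInputs: Bool
    let maskAllImages: Bool
}

func findMaskableWidgets(_ view: View, config: ReplayMaskConfig, into out: inout [View]) {
    // Fast path: an exact-type check identifies plain containers,
    // so none of the casts below run for them.
    let kind = type(of: view)
    let isPlainContainer = kind == View.self || kind == Window.self
    if !isPlainContainer {
        // If-else chain: stop casting after the first match.
        if view is TextField || view is Label {
            if config.maskAllTextInputs { out.append(view) }
        } else if view is ImageView {
            if config.maskAllImages { out.append(view) }
        }
    }
    for child in view.subviews {
        findMaskableWidgets(child, config: config, into: &out)
    }
}

/// Builds a small tree and returns the type names of the views that would be masked.
func maskDemo() -> [String] {
    let window = Window()
    let container = View()
    container.subviews = [Label(), TextField(), ImageView()]
    window.subviews = [container]

    // Built once per snapshot and passed down the traversal.
    let config = ReplayMaskConfig(maskAllTextInputs: true, maskAllImages: false)
    var found: [View] = []
    findMaskableWidgets(window, config: config, into: &found)
    return found.map { String(describing: type(of: $0)) }
}

print(maskDemo()) // ["Label", "TextField"]
```

The exact-type comparison matters here: an `is` check would also match every subclass, whereas comparing metatypes only matches the plain container classes themselves.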

Off-main-thread + memory:

  • Skip maskImage re-render when maskableWidgets is empty (saves ~1.3MB)
  • Release UIImage early in toDict() before base64 encoding (reduces peak memory)
  • Pre-size dictionaries in toDict() (minimumCapacity: 16)
  • Hex lookup table in toRGBString() — replaces String(format:) with direct char array
  • malloc instead of calloc for renderer buffer (skip zero-init of ~12MB at 3x scale)
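
The hex lookup table idea can be sketched as follows. This is an illustration only: the free function signature and the lowercase `#rrggbb` output format are assumptions, and the real `toRGBString()` in CGColor+Util.swift may differ in both.

```swift
// ASCII codes of the 16 hex digits, built once and indexed by nibble value.
let hexDigits: [UInt8] = Array("0123456789abcdef".utf8)

/// Formats 8-bit RGB components as "#rrggbb" without going through String(format:).
/// Hypothetical signature; the output format is assumed.
func toRGBString(red: UInt8, green: UInt8, blue: UInt8) -> String {
    let bytes: [UInt8] = [
        UInt8(ascii: "#"),
        hexDigits[Int(red >> 4)], hexDigits[Int(red & 0x0F)],
        hexDigits[Int(green >> 4)], hexDigits[Int(green & 0x0F)],
        hexDigits[Int(blue >> 4)], hexDigits[Int(blue & 0x0F)],
    ]
    return String(decoding: bytes, as: UTF8.self)
}

print(toRGBString(red: 255, green: 128, blue: 0)) // prints "#ff8000"
```

Each component is split into its high and low nibble, and each nibble indexes the table directly, so no format string has to be parsed per call.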

Files Changed

  • PostHogReplayIntegration.swift — Fast paths, config caching, if-else chains, touch handler, async refactor
  • PostHogGraphicsImageRenderer.swift — NEW custom CGContext renderer
  • UIView+Util.swift — Custom renderer integration, isNoCapture() optimization
  • CGColor+Util.swift — Hex lookup table for color string formatting
  • RRWireframe.swift — maskImage skip, early image release, dict pre-sizing
  • RRStyle.swift — Dict pre-sizing

💚 How did you test it?

  • Benchmark test on iOS Simulator (iPhone 17 Pro) with 200-view synthetic hierarchy (484 actual views including UIKit internals)
  • 80 experiments across 9 optimization sessions with automated measurement
  • make buildIOS and swift build both pass
  • make format clean (SwiftLint + SwiftFormat)

📝 Checklist

  • I reviewed the submitted code.
  • I added tests to verify the changes.
  • I updated the docs if needed.
  • No breaking change or entry added to the changelog.

Main-thread time for screenshot snapshot reduced from 3,319µs to ~230µs
on a 200-view hierarchy (484 actual views including UIKit internals).

Key optimizations:
- Fast path for plain UIView/UIWindow containers (skip type casts)
- Custom CGContext renderer (PostHogGraphicsImageRenderer) based on Sentry's approach
- Native screen scale for drawHierarchy (avoids internal rescaling)
- Cache config flags per snapshot via ReplayMaskConfig struct
- Hex lookup table for toRGBString (avoid String(format:))
- Skip maskImage re-render when no maskableWidgets
- Early image release in toDict to reduce peak memory
- Filter touch phases before capturing location
- Move lock + meta event creation off main thread
- malloc instead of calloc for renderer buffer
marandaneto requested a review from a team as a code owner March 25, 2026 16:42
marandaneto marked this pull request as draft March 25, 2026 16:43
@marandaneto
Member Author

@ioannisj autoresearch result, we'd need to test everything but the results are good

@github-actions
Contributor

posthog-ios Compliance Report

Date: 2026-03-25 16:59:56 UTC
Duration: 522ms

✅ All Tests Passed!

0/0 tests passed


@ioannisj
Contributor

> @ioannisj autoresearch result, we'd need to test everything but the results are good

Oh wow! Will take a look and test. Looks promising from PR description

KristijanKocev added a commit to KristijanKocev/posthog-ios that referenced this pull request Mar 29, 2026
…mentation

- switch replay screenshot triggering to a CADisplayLink-driven cadence,
  using throttleDelay only as a minimum spacing guard
- add single-flight snapshot scheduling with one pending follow-up capture
- bring in the custom CGContext-based screenshot renderer from PR PostHog#536
  and use native screen scale for drawHierarchy captures
- add session replay performance instrumentation for main-thread capture,
  masking, serialization, and background processing
- optimize screenshot serialization helpers by avoiding unnecessary masked
  image work, caching replay mask config, and reducing color formatting overhead
- extend the view layout publisher with CALayer-based change monitoring and
  suppression utilities to avoid self-triggered replay churn during capture
- simplify touch replay handling to only emit began/ended touch events
- wire the new replay performance tracker into the Xcode project
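
The "single-flight snapshot scheduling with one pending follow-up capture" item in the referencing commit could look roughly like the sketch below. This is a guess at the pattern, not the code from that commit: `SingleFlightScheduler` and its API are hypothetical, the CADisplayLink cadence and throttleDelay guard are left out, and the state is assumed to be confined to one thread.

```swift
/// Allows at most one capture in flight. Requests arriving meanwhile collapse
/// into a single follow-up capture. Hypothetical type; not thread-safe.
final class SingleFlightScheduler {
    private var inFlight = false
    private var pending = false
    private let capture: (@escaping () -> Void) -> Void

    /// `capture` starts a snapshot and must call its argument when finished.
    init(capture: @escaping (@escaping () -> Void) -> Void) {
        self.capture = capture
    }

    func request() {
        if inFlight {
            pending = true // remember that one more capture is wanted
            return
        }
        inFlight = true
        capture { [weak self] in self?.finished() }
    }

    private func finished() {
        inFlight = false
        if pending {
            pending = false
            request() // run the single follow-up
        }
    }
}

/// Returns how many captures had started after each step of the scenario.
func singleFlightDemo() -> [Int] {
    var completions: [() -> Void] = []
    var started = 0
    var log: [Int] = []
    let scheduler = SingleFlightScheduler { done in
        started += 1
        completions.append(done)
    }
    scheduler.request()
    scheduler.request()
    scheduler.request()
    log.append(started)         // only the first request started a capture
    completions.removeFirst()() // first capture finishes, follow-up starts
    log.append(started)
    completions.removeFirst()() // follow-up finishes, nothing is pending
    log.append(started)
    return log
}

print(singleFlightDemo()) // [1, 2, 2]
```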