Share sparkey readers with memory-aware heap fallback #5904

Draft

spkrka wants to merge 1 commit into spotify:main from spkrka:krka/heap-fallback

Conversation


@spkrka spkrka commented Mar 28, 2026

Summary

Alternative approach to #5903 for shared sparkey readers with memory-aware heap fallback. Both PRs share the same core features (reader cache, HostMemoryTracker, heap fallback). This PR adds a cleaner architecture on top:

  • 2-phase shard opening: plan all shards first (claiming budgets atomically), then open heap readers before mmap readers. This ensures heap readers' temporary page cache I/O (during file read into byte[]) doesn't evict pages that mmap readers need.

  • ShardPlan case class: each shard's plan (index, file, size, useHeap, budgeted) is a first-class value. Summary stats derived via .filter().map().sum.

  • Single summary log per load: instead of per-shard log lines, one INFO/WARN line per sparkey load showing GiB breakdown by mode (e.g. Loaded 256 sparkey shard(s) from gs://bucket/path/*: 150.32 GiB off-heap, 49.68 GiB on heap).

  • planShard/openReader helpers: deduplicates logic between sharded and unsharded paths.
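The `ShardPlan` value and the 2-phase ordering above can be sketched roughly as follows. The field names (`index`, `file`, `size`, `useHeap`, `budgeted`) come from the PR description, but the signatures, the `TwoPhaseOpen` object, and the `openAll`/`heapBytes` helpers are assumptions for illustration, not the PR's actual code:

```scala
// Hypothetical sketch of the 2-phase opening described above.
final case class ShardPlan(
  index: Int,        // shard index within the sharded sparkey
  file: String,      // path to this shard's sparkey files
  size: Long,        // bytes this shard occupies when loaded
  useHeap: Boolean,  // true if the off-heap budget was exhausted
  budgeted: Boolean  // false if neither budget fit (unbounded mmap fallback)
)

object TwoPhaseOpen {
  // Phase 1 has already produced `plans` (claiming budgets atomically).
  // Phase 2: open heap readers before mmap readers, so the temporary page
  // cache I/O of reading files into byte[] cannot evict pages that the
  // mmap readers need.
  def openAll[R](plans: Seq[ShardPlan], open: ShardPlan => R): Seq[R] = {
    val (heap, mmap) = plans.partition(_.useHeap)
    (heap ++ mmap).map(open)
  }

  // Summary stats derived via filter/map/sum, as in the PR description.
  def heapBytes(plans: Seq[ShardPlan]): Long =
    plans.filter(_.useHeap).map(_.size).sum
}
```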

Comparison with #5903

| Feature | #5903 | #5904 (this) |
| --- | --- | --- |
| Shared reader cache | ✅ | ✅ |
| HostMemoryTracker | ✅ | ✅ |
| Heap fallback | ✅ | ✅ |
| Sparkey bump (3.5.1 → 3.7.0) | ✅ | ✅ |
| 2-phase shard opening (heap first) | — | ✅ |
| ShardPlan case class | — | ✅ |
| Per-shard logging | per-shard INFO/WARN | single summary line |
| Code deduplication | separate paths | planShard/openReader helpers |

Both are viable — #5903 is simpler, #5904 is more polished with better page cache behavior for mixed heap+mmap workloads.

Test plan

  • Unit tests for HostMemoryTracker (budget claiming, independence, edge cases)
  • Unit tests for OpenWithMemoryTracking (mmap, heap, unbounded fallback modes)
  • Existing sparkey tests pass
  • MiMa binary compatibility check passes
  • Integration test on Dataflow with large sharded sparkey
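The "budget claiming" behavior that the HostMemoryTracker unit tests exercise can be illustrated with a minimal CAS-based sketch. This is not the PR's actual API; the `Budget` class and `tryClaim` method are hypothetical names standing in for whatever HostMemoryTracker exposes:

```scala
import java.util.concurrent.atomic.AtomicLong

// Hypothetical sketch of atomic budget claiming: each shard tries to
// reserve its size from a shared budget, and a claim either succeeds
// fully or leaves the budget untouched.
final class Budget(capacityBytes: Long) {
  private val remaining = new AtomicLong(capacityBytes)

  // Atomically claim `bytes` via compare-and-set; returns false rather
  // than overdrawing the budget.
  def tryClaim(bytes: Long): Boolean = {
    var cur = remaining.get()
    while (cur >= bytes) {
      if (remaining.compareAndSet(cur, cur - bytes)) return true
      cur = remaining.get()
    }
    false
  }
}
```

Under this model, a shard that fails `tryClaim` on the off-heap budget would retry against the heap budget, and fall back to unbounded mmap if both fail.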

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@codecov

codecov bot commented Mar 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 61.74%. Comparing base (d59fbba) to head (455814f).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5904      +/-   ##
==========================================
+ Coverage   61.54%   61.74%   +0.19%     
==========================================
  Files         317      318       +1     
  Lines       11653    11711      +58     
  Branches      822      833      +11     
==========================================
+ Hits         7172     7231      +59     
+ Misses       4481     4480       -1     

☔ View full report in Codecov by Sentry.

…llback

- Shared reader cache: static ConcurrentHashMap deduplicates readers across
  all DoFn instances on a worker (was: one reader per vCPU thread).
  Loaded concurrently via a dedicated 4-thread daemon pool.

- HostMemoryTracker: estimates off-heap budget (totalPhysical - maxHeap - 2GB)
  and heap budget (maxHeap - max(4GB, 10%)). Each shard atomically claims
  from off-heap first; if exhausted, claims heap and opens with
  Sparkey.reader().useHeap(true); if neither fits, falls back to mmap
  with a warning.

- 2-phase shard opening: plan all shards first (claiming budgets), then
  open heap readers before mmap readers. This ensures heap readers'
  temporary page cache I/O doesn't evict pages that mmap readers need.

- Bump sparkey 3.5.1 → 3.7.0 for the heap-backed reader API.
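The two budget formulas in the commit message can be written out directly. The constants (2 GB off-heap reserve, max(4 GB, 10%) heap reserve) come from the text above; the `Budgets` object and method names are assumptions for illustration:

```scala
// Hypothetical sketch of the budget estimation described above.
object Budgets {
  private val GiB = 1L << 30

  // Off-heap budget: physical RAM minus the JVM max heap minus a
  // 2 GiB reserve for the OS and other processes.
  def offHeapBudget(totalPhysical: Long, maxHeap: Long): Long =
    math.max(0L, totalPhysical - maxHeap - 2 * GiB)

  // Heap budget: the JVM max heap minus max(4 GiB, 10% of the heap),
  // reserved for the rest of the application.
  def heapBudget(maxHeap: Long): Long =
    math.max(0L, maxHeap - math.max(4 * GiB, maxHeap / 10))
}
```

For example, on a 64 GiB worker with a 16 GiB heap this yields a 46 GiB off-heap budget and a 12 GiB heap budget.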

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
