feat(pathfinding): non-eager TryGetMask + second-touch promotion#2451
Open
kamronbatman wants to merge 6 commits into
Open
feat(pathfinding): non-eager TryGetMask + second-touch promotion#2451kamronbatman wants to merge 6 commits into
kamronbatman wants to merge 6 commits into
Conversation
A chunk's first miss within the promotion window now returns Fallthrough_NotBuilt instead of eagerly running BuildChunk. The caller (BitmapAStarAlgorithm) takes the slow path for that cell. The second miss inside the window promotes to BuildChunk + serve. This filters single-touch pass-throughs (mounted player + pet covering many chunks at ~4 tiles/sec) so the cache reserves its ~700us BuildChunk cost for chunks NPCs actually hit repeatedly. - Add CacheHitKind.Fallthrough_NotBuilt + FallthroughNotBuilt counter. - StepCache._chunkMissTracker (capped Dictionary<long, ChunkMissState>); window- expiry restarts the count, capacity overflow prunes window-old entries first. - StepCache.MissPromotionThreshold (default 2) and MissPromotionWindowMs (default 30s) are tunable for tests/ops. - Lazy-reader hits bypass the tracker — file-loaded chunks represent an explicit prior decision to keep the chunk warm. - BitmapAStarAlgorithm.GetSuccessorsSlowPath now layers IsBlockedByDynamic on top of CheckMovement: the slow path was previously rare (CanFly + fallthroughs) and this gap was invisible. Now first-touch pathfinds run through it, so mobile blocking has to be checked there too. - Tests that prime chunks via single TryGetMask call set MissPromotionThreshold=1 for legacy eager behavior. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…lazy bypass Three new pathfinding tests around the PR-6 promotion gate: - TryGetMask_SecondTouchAfterWindow_RestartsCounterAndDefers — second miss outside the promotion window restarts the count, returns Fallthrough_NotBuilt again (a chunk briefly glanced through and revisited minutes later doesn't promote on the late second touch). - TryGetMask_DistinctChunks_TrackedIndependently — counters are per-chunk; one touch each on two adjacent chunks both stay in fallthrough. - LazyReaderHit_BypassesMissTrackerOnFirstTouch — when an .swb is open and the chunk is in the file, first touch hits without consulting the tracker and without incrementing FallthroughNotBuilt. Production with .swb loaded skips the gate entirely. Switch ShouldPromoteAfterMiss to Environment.TickCount so window expiry works under the test fixture (Core.TickCount only advances when the game loop runs). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR-6 BDN run (corpus + 4 providers) showed Cold rows 6-9 still 12-35x
slower than FastAStar — the gate was firing inside a single Find. Root
cause: A* expansion calls TryGetMask many times per visited chunk during
one pathfind; with threshold=2, the 2nd cell expansion in chunk X
crossed the threshold immediately and triggered BuildChunk. Effectively
"non-eager" only for chunks the algorithm visited exactly once, which
on open-plain pet/NPC paths is rare.
Switch the tracker to count distinct Find generations per chunk:
- StepCache.BeginFindGeneration() — bumps a uint counter; 0 reserved
as "no Find started" sentinel (legacy callers that don't bump it,
e.g. single-call tests, fall through to per-call counting).
- ChunkMissState gains LastFindGeneration. ShouldPromoteAfterMiss
short-circuits when the entry's gen matches the current gen — the
same Find probing the chunk again doesn't increment.
- BitmapAStarAlgorithm.Find() calls BeginFindGeneration() at the top.
One pathfind = one tick per chunk, regardless of expansion count.
- BakeMap stashes/restores threshold to 1 inside its scan; otherwise
every chunk would defer (each touched once) and produce an empty .swb.
Expected BDN effect on Cold rows 6-9: collapse from ~2,300 us to slow-
path floor (close to FastAStar Cold). LazyWarm/WarmNoFile/LazyCold rows
unchanged — the gate only matters when neither resident chunks nor a
lazy reader can satisfy the query.
Tests:
- StepCache lifecycle: 2 new tests for the per-gen behavior
(multiple-calls-same-gen-stay-fallthrough, two-gens-across-window-
restart-counter). Existing tests pass unchanged thanks to the gen=0
legacy fallback.
- BitmapAStarAlgorithm: 2 integration tests verifying single Find
triggers 0 builds and second overlapping Find triggers >0 builds.
Marked TODO-remove once the gate is proven stable in production.
Closes the per-call-gate misdesign noted in PR-6 review of bdn-pr-6.md.
Adds StepCache.PreloadOnLazyOpen (default false). When set to true before TryOpenLazyReader, every indexed chunk is materialized into the resident set immediately instead of on first query. Trades ~25-50ms boot time per fully-baked map for zero first-touch latency on every chunk thereafter. Production target: shards with RAM headroom that want flat pathfind latency from first NPC pathfind onward, no warm-up penalty when traffic crosses into a region for the first time. Hobby admins keep the lazy default; the flag is opt-in. Adds LazyReader.EnumerateChunkCoords() so the cache can iterate the file index without exposing the internal offsets dictionary, and a test that verifies post-open ResidentChunks == file.IndexedChunkCount with no BuildsTotal increment (chunks come from file, not baker). Closes the LazyCold #2/#4 first-touch concern from the BDN follow-up: those scenarios show ~6us/chunk file-read amortization that, in production, only fires once per chunk per server lifetime. Operators who want that gone entirely can flip the flag. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a per-cell swim-perspective layer to StepChunk, populated only on
chunks that contain at least one shore cell (a cell with both a walkable
land surface and a water surface separated by > StepHeight). Closes the
last structural cache failure: swim creatures pathing along coastlines
no longer fall to the slow path on shore cells.
The shore-cell case today:
- Bake at standing-Z lands on the land surface (e.g. z=0 sand).
- WetMask + SwimZ_dir are computed at that walk-Z perspective.
- A sea serpent at water Z (z=-5) queries the cell:
|sourceZ - chunk.SourceZ[cell]| = |−5 − 0| = 5 > StepHeight(2)
=> Fallthrough_SourceZMismatch => slow path for every coastline cell.
The swim layer:
- Per-chunk: 10 lazily-allocated arrays (SwimSourceZ + SwimMask +
8 SwimZ_dir layers). Null on inland chunks and pure deep-ocean
chunks (~95% of any map). 0 bytes overhead in the common case.
- Per shore cell: SwimSourceZ = water surface Z, SwimMask + SwimZ_dir
computed from a second ComputeMaskAt(swimZ) pass. Cells in the same
chunk that aren't shore cells get the NoSwimLayerCell sentinel
(sbyte.MinValue) in SwimSourceZ.
Runtime:
- Source-Z guard miss now consults the swim layer before returning
Fallthrough_SourceZMismatch. If chunk.HasSwimLayer && |sourceZ -
chunk.SwimSourceZ[cell]| <= StepHeight (and SwimSourceZ != sentinel),
serve from the swim layer with walkMask=0.
- Walker queries on shore cells unchanged: they hit the primary
source-Z guard at the walk Z and serve from the walk layer.
File format bump 2 → 3:
- HasSwimLayer byte joins HasStrata in the chunk header.
- When set, swim layer trailer (SwimSourceZ + SwimMask + 8 swim Z
arrays = 2,560 bytes) precedes the strata trailer.
- MinSupportedVersion bumped to 3 — operators upgrading from PR-5
bakes see a one-time re-bake on first boot under v3.
Storage cost: ~2.5 KB per coastline chunk × ~50-100 coast chunks per
map = ~150-250 KB extra per fully-baked map. Inland chunks add 1 byte
(the HasSwimLayer flag). Trivial.
Tests (59/59 pass, +4):
- SwimLayer_NotInjected_StaysFallthroughOnSourceZMismatch — chunks
without the layer behave exactly as before.
- SwimLayer_InjectedMatchingZ_ReturnsHitFromSwimLayer — synthetic
swim-layer injection produces walkMask=0 + swim mask + swim Zs at
the swim source Z.
- SwimLayer_InjectedButCellHasNoSentinel_FallsThrough — cells with
the NoSwimLayerCell sentinel must NOT match queries at -128.
- SwimLayer_RoundTrips_ThroughLazyReader — Save/Clear/LazyOpen
preserves the swim layer end-to-end through file v3.
Bench validation (next): need a coastline scenario in the bench corpus
(sea serpent rounding a peninsula) to measure the win — that lands as a
follow-up bench commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ement)
Tightens shore-cell bake: if ComputeMaskAt at swim source Z produces
wetMask=0 (i.e. StaticsBlockAt rejected the swim creature due to a
low-clearance bridge/dock/pier above the water), skip populating the
swim layer for that cell. The NoSwimLayerCell sentinel stays in
SwimSourceZ for the cell, so runtime queries fall through to slow path
(which independently produces the same "no movement" answer).
Two reasons:
- Storage: don't waste 18 bytes per cell on a stratum that always
returns 0 mask.
- Semantic clarity: HasSwimLayer means "swim is possible somewhere
in this chunk," not "we baked a stratum here regardless of whether
swim creatures can actually fit."
Matches the principle that strata represent USABLE surfaces. The slow
path for these cells produces wetMask=0 too — parity preserved.
Tests still 59/59 (no test exercised the wetMask=0 skip; that path is
specific to map cells with bridges/docks blocking vertical clearance,
which the synthetic injection tests don't reproduce).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes the Cold-cache regression flagged in PR #2450.
StepCache.TryGetMaskno longer eagerly runsBuildChunkon the first miss for a chunk that isn't in a.swblazy reader. Instead it returnsFallthrough_NotBuiltand the caller (BitmapAStarAlgorithm) takes the per-cell slow path. The chunk is only promoted to the bitmap fast path after the second miss within a 30-second window, filtering single-touch pass-throughs.This makes BitmapAStar's worst-case (cold cache + short hops) collapse from 12–47× slower than FastAStar to roughly the same, which is the floor the slow path can deliver. Steady-state warm performance (the actual deliverable) is unchanged from PR-5 — it was always the cache fast path.
The pet-follow scenario this fixes
A mounted player at ~4 tiles/sec with a pet/hireable following will trigger an NPC pathfind every 100–300 ms. Each pathfind is 1–6 tiles. As the player crosses chunk boundaries (~4 sec/chunk), the pet's first pathfind in the new chunk under the previous behavior triggered a full ~700 µs
BuildChunkfor a chunk the player would leave shortly after. At 50–100 mobiles per shard, this exceeded the 8 ms tick budget. PR-5 BDN data showed scenarios 6–9 (2–8 tile NPC perception) at 2,300–3,700 µs Cold vs FastAStar's 80–200 µs.Under the new gate:
Fallthrough_NotBuilt→ caller uses slow path (~30–50 µs short path). NoBuildChunk. No allocation.What changed
CacheHitKind.Fallthrough_NotBuilt = 6+CacheStats.FallthroughNotBuiltcounter.IsHit=false, so the caller routes to slow path.StepCache._chunkMissTracker—Dictionary<long, ChunkMissState>capped at 4096 entries. State is(byte missCount, uint lastMissTickStamp)keyed by chunk key. Window-expired entries reset count to 1; capacity overflow prunes window-old entries first.StepCache.MissPromotionThreshold(default2) andStepCache.MissPromotionWindowMs(default30_000) — tunable, can be wired throughServerConfigurationif shards want different policy. Setting threshold to1restores legacy eager-build behavior (used by tests that prime chunks via singleTryGetMaskcall).StepCache.TryGetMaskmiss branch — try lazy reader first (file-loaded chunks bypass the tracker entirely; an.swbrepresents an explicit prior decision to keep the chunk warm). Otherwise consult the tracker.BitmapAStarAlgorithm.GetSuccessorsSlowPathnow layersIsBlockedByDynamicon top ofCalcMoves.CheckMovement. Previously the slow path only ran forCanFlycreatures and rare cache fallthroughs —CheckMovementdoesn't iterate same-cell mobiles, so the bitmap fast path'sIsBlockedByDynamicwas the only mobile-blocking check. Now first-touch pathfinds run through the slow path, so the gap had to close.Tests
50 pathfinding tests pass (was 47). New / updated:
TryGetMask_FirstTouchOnUnbuiltChunk_DefersBuildAndReturnsFallthrough— single TryGetMask call returnsFallthrough_NotBuilt, no chunk built, no allocation.TryGetMask_SecondTouchWithinWindow_PromotesAndBuilds— second call inside the 30s window builds + serves.TryGetMask_SecondTouchAfterWindow_RestartsCounterAndDefers— second call outside the window restarts the count, returns Fallthrough again.TryGetMask_DistinctChunks_TrackedIndependently— counters are per-chunk; one touch on each of two adjacent chunks both stay in fallthrough.LazyReaderHit_BypassesMissTrackerOnFirstTouch— open.swb+ first touch hits without consulting the tracker. Production with.swbloaded skips the gate entirely.MultisVersion_Bump_TriggersDirtyRebuild— updated to reflect the new 3-step flow (Fallthrough → Miss_NotBuilt → Miss_DirtyRebuild).TryGetMaskcall (multi-Z, Tier4, lifecycle, parity, BitmapAStar uses-cache) setMissPromotionThreshold = 1to opt into eager behavior.Expected BDN impact
The Cold column from PR-5's BDN should change as follows once the bench's submodule pointer is updated to this branch:
WarmNoFile and LazyWarm rows should be unchanged — they were always cache-warm. The miss tracker only fires when neither resident chunks nor the lazy reader can satisfy the request.
Future work (not in this PR)