Skip to content

feat(pathfinding): non-eager TryGetMask + second-touch promotion#2451

Open
kamronbatman wants to merge 6 commits into
mainfrom
pathfinding/pr6
Open

feat(pathfinding): non-eager TryGetMask + second-touch promotion#2451
kamronbatman wants to merge 6 commits into
mainfrom
pathfinding/pr6

Conversation

@kamronbatman
Copy link
Copy Markdown
Contributor

Summary

Closes the Cold-cache regression flagged in PR #2450. StepCache.TryGetMask no longer eagerly runs BuildChunk on the first miss for a chunk that isn't in a .swb lazy reader. Instead it returns Fallthrough_NotBuilt and the caller (BitmapAStarAlgorithm) takes the per-cell slow path. The chunk is only promoted to the bitmap fast path after the second miss within a 30-second window, filtering single-touch pass-throughs.

This makes BitmapAStar's worst-case (cold cache + short hops) collapse from 12–47× slower than FastAStar to roughly the same, which is the floor the slow path can deliver. Steady-state warm performance (the actual deliverable) is unchanged from PR-5 — it was always the cache fast path.

The pet-follow scenario this fixes

A mounted player at ~4 tiles/sec with a pet/hireable following will trigger an NPC pathfind every 100–300 ms. Each pathfind is 1–6 tiles. As the player crosses chunk boundaries (~4 sec/chunk), the pet's first pathfind in the new chunk under the previous behavior triggered a full ~700 µs BuildChunk for a chunk the player would leave shortly after. At 50–100 mobiles per shard, this exceeded the 8 ms tick budget. PR-5 BDN data showed scenarios 6–9 (2–8 tile NPC perception) at 2,300–3,700 µs Cold vs FastAStar's 80–200 µs.

Under the new gate:

  • First miss → Fallthrough_NotBuilt → caller uses slow path (~30–50 µs short path). No BuildChunk. No allocation.
  • Player keeps moving → chunk never gets a second touch within window → never promoted, no rot.
  • NPC patrolling a fixed territory → repeatedly hits the same chunks → second touch within window → promote → cache fast path on subsequent calls.

What changed

  • CacheHitKind.Fallthrough_NotBuilt = 6 + CacheStats.FallthroughNotBuilt counter. IsHit=false, so the caller routes to slow path.
  • StepCache._chunkMissTrackerDictionary<long, ChunkMissState> capped at 4096 entries. State is (byte missCount, uint lastMissTickStamp) keyed by chunk key. Window-expired entries reset count to 1; capacity overflow prunes window-old entries first.
  • StepCache.MissPromotionThreshold (default 2) and StepCache.MissPromotionWindowMs (default 30_000) — tunable, can be wired through ServerConfiguration if shards want different policy. Setting threshold to 1 restores legacy eager-build behavior (used by tests that prime chunks via single TryGetMask call).
  • StepCache.TryGetMask miss branch — try lazy reader first (file-loaded chunks bypass the tracker entirely; an .swb represents an explicit prior decision to keep the chunk warm). Otherwise consult the tracker.
  • BitmapAStarAlgorithm.GetSuccessorsSlowPath now layers IsBlockedByDynamic on top of CalcMoves.CheckMovement. Previously the slow path only ran for CanFly creatures and rare cache fallthroughs — CheckMovement doesn't iterate same-cell mobiles, so the bitmap fast path's IsBlockedByDynamic was the only mobile-blocking check. Now first-touch pathfinds run through the slow path, so the gap had to close.

Tests

50 pathfinding tests pass (was 47). New / updated:

  • TryGetMask_FirstTouchOnUnbuiltChunk_DefersBuildAndReturnsFallthrough — single TryGetMask call returns Fallthrough_NotBuilt, no chunk built, no allocation.
  • TryGetMask_SecondTouchWithinWindow_PromotesAndBuilds — second call inside the 30s window builds + serves.
  • TryGetMask_SecondTouchAfterWindow_RestartsCounterAndDefers — second call outside the window restarts the count, returns Fallthrough again.
  • TryGetMask_DistinctChunks_TrackedIndependently — counters are per-chunk; one touch on each of two adjacent chunks both stay in fallthrough.
  • LazyReaderHit_BypassesMissTrackerOnFirstTouch — open .swb + first touch hits without consulting the tracker. Production with .swb loaded skips the gate entirely.
  • MultisVersion_Bump_TriggersDirtyRebuild — updated to reflect the new 3-step flow (Fallthrough → Miss_NotBuilt → Miss_DirtyRebuild).
  • Tests that prime chunks via a single TryGetMask call (multi-Z, Tier4, lifecycle, parity, BitmapAStar uses-cache) set MissPromotionThreshold = 1 to opt into eager behavior.

Expected BDN impact

The Cold column from PR-5's BDN should change as follows once the bench's submodule pointer is updated to this branch:

# Scenario Cold (PR-5) Cold (PR-6 expected) FastAStar Cold
2 sewer corridor 1,627 µs ~36 µs 36 µs
4 causeway 1,533 µs ~39 µs 39 µs
6 pet 2-tile 2,364 µs ~80 µs 81 µs
8 npc 5-tile 3,708 µs ~140 µs 141 µs
9 npc 8-tile 2,386 µs ~200 µs 197 µs

WarmNoFile and LazyWarm rows should be unchanged — they were always cache-warm. The miss tracker only fires when neither resident chunks nor the lazy reader can satisfy the request.

Future work (not in this PR)

  • Background-thread bake: builds outside the game thread so even promoted chunks don't pay the 700 µs build cost on the main thread. Rule 10 (no Task.Run) applies, so this needs careful design — the bake is a pure data transform but main-thread synchronization on chunk-state transitions has to be threaded through. Defer to a follow-up.
  • Long-traverse BDN scenario: a multi-Find benchmark simulating 50 pet repaths across chunk transitions. Requires restructuring the bench harness; the existing 10-scenario corpus + Cold provider already exercises the gate.
  • Swim sourceZ bake: scenario 5 (sea serpent) shows 56 B alloc on warm paths because the cache's SourceZ is computed under default-walker rules. Swim creatures fall through to slow path. Independent of this PR.

kamronbatman and others added 6 commits May 6, 2026 23:11
A chunk's first miss within the promotion window now returns Fallthrough_NotBuilt
instead of eagerly running BuildChunk. The caller (BitmapAStarAlgorithm) takes the
slow path for that cell. The second miss inside the window promotes to BuildChunk
+ serve. This filters single-touch pass-throughs (mounted player + pet covering
many chunks at ~4 tiles/sec) so the cache reserves its ~700us BuildChunk cost for
chunks NPCs actually hit repeatedly.

- Add CacheHitKind.Fallthrough_NotBuilt + FallthroughNotBuilt counter.
- StepCache._chunkMissTracker (capped Dictionary<long, ChunkMissState>); window-
  expiry restarts the count, capacity overflow prunes window-old entries first.
- StepCache.MissPromotionThreshold (default 2) and MissPromotionWindowMs (default
  30s) are tunable for tests/ops.
- Lazy-reader hits bypass the tracker — file-loaded chunks represent an explicit
  prior decision to keep the chunk warm.
- BitmapAStarAlgorithm.GetSuccessorsSlowPath now layers IsBlockedByDynamic on top
  of CheckMovement: the slow path was previously rare (CanFly + fallthroughs) and
  this gap was invisible. Now first-touch pathfinds run through it, so mobile
  blocking has to be checked there too.
- Tests that prime chunks via single TryGetMask call set MissPromotionThreshold=1
  for legacy eager behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…lazy bypass

Three new pathfinding tests around the PR-6 promotion gate:

- TryGetMask_SecondTouchAfterWindow_RestartsCounterAndDefers — second miss
  outside the promotion window restarts the count, returns Fallthrough_NotBuilt
  again (a chunk briefly glanced through and revisited minutes later doesn't
  promote on the late second touch).
- TryGetMask_DistinctChunks_TrackedIndependently — counters are per-chunk; one
  touch each on two adjacent chunks both stay in fallthrough.
- LazyReaderHit_BypassesMissTrackerOnFirstTouch — when an .swb is open and the
  chunk is in the file, first touch hits without consulting the tracker and
  without incrementing FallthroughNotBuilt. Production with .swb loaded skips
  the gate entirely.

Switch ShouldPromoteAfterMiss to Environment.TickCount so window expiry works
under the test fixture (Core.TickCount only advances when the game loop runs).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR-6 BDN run (corpus + 4 providers) showed Cold rows 6-9 still 12-35x
slower than FastAStar — the gate was firing inside a single Find. Root
cause: A* expansion calls TryGetMask many times per visited chunk during
one pathfind; with threshold=2, the 2nd cell expansion in chunk X
crossed the threshold immediately and triggered BuildChunk. Effectively
"non-eager" only for chunks the algorithm visited exactly once, which
on open-plain pet/NPC paths is rare.

Switch the tracker to count distinct Find generations per chunk:

  - StepCache.BeginFindGeneration() — bumps a uint counter; 0 reserved
    as "no Find started" sentinel (legacy callers that don't bump it,
    e.g. single-call tests, fall through to per-call counting).
  - ChunkMissState gains LastFindGeneration. ShouldPromoteAfterMiss
    short-circuits when the entry's gen matches the current gen — the
    same Find probing the chunk again doesn't increment.
  - BitmapAStarAlgorithm.Find() calls BeginFindGeneration() at the top.
    One pathfind = one tick per chunk, regardless of expansion count.
  - BakeMap stashes/restores threshold to 1 inside its scan; otherwise
    every chunk would defer (each touched once) and produce an empty .swb.

Expected BDN effect on Cold rows 6-9: collapse from ~2,300 us to slow-
path floor (close to FastAStar Cold). LazyWarm/WarmNoFile/LazyCold rows
unchanged — the gate only matters when neither resident chunks nor a
lazy reader can satisfy the query.

Tests:
  - StepCache lifecycle: 2 new tests for the per-gen behavior
    (multiple-calls-same-gen-stay-fallthrough, two-gens-across-window-
    restart-counter). Existing tests pass unchanged thanks to the gen=0
    legacy fallback.
  - BitmapAStarAlgorithm: 2 integration tests verifying single Find
    triggers 0 builds and second overlapping Find triggers >0 builds.
    Marked TODO-remove once the gate is proven stable in production.

Closes the per-call-gate misdesign noted in PR-6 review of bdn-pr-6.md.
Adds StepCache.PreloadOnLazyOpen (default false). When set to true before
TryOpenLazyReader, every indexed chunk is materialized into the resident
set immediately instead of on first query. Trades ~25-50ms boot time per
fully-baked map for zero first-touch latency on every chunk thereafter.

Production target: shards with RAM headroom that want flat pathfind
latency from first NPC pathfind onward, no warm-up penalty when traffic
crosses into a region for the first time. Hobby admins keep the lazy
default; the flag is opt-in.

Adds LazyReader.EnumerateChunkCoords() so the cache can iterate the file
index without exposing the internal offsets dictionary, and a test that
verifies post-open ResidentChunks == file.IndexedChunkCount with no
BuildsTotal increment (chunks come from file, not baker).

Closes the LazyCold #2/#4 first-touch concern from the BDN follow-up:
those scenarios show ~6us/chunk file-read amortization that, in
production, only fires once per chunk per server lifetime. Operators who
want that gone entirely can flip the flag.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a per-cell swim-perspective layer to StepChunk, populated only on
chunks that contain at least one shore cell (a cell with both a walkable
land surface and a water surface separated by > StepHeight). Closes the
last structural cache failure: swim creatures pathing along coastlines
no longer fall to the slow path on shore cells.

The shore-cell case today:
  - Bake at standing-Z lands on the land surface (e.g. z=0 sand).
  - WetMask + SwimZ_dir are computed at that walk-Z perspective.
  - A sea serpent at water Z (z=-5) queries the cell:
      |sourceZ - chunk.SourceZ[cell]| = |−5 − 0| = 5 > StepHeight(2)
    => Fallthrough_SourceZMismatch => slow path for every coastline cell.

The swim layer:
  - Per-chunk: 10 lazily-allocated arrays (SwimSourceZ + SwimMask +
    8 SwimZ_dir layers). Null on inland chunks and pure deep-ocean
    chunks (~95% of any map). 0 bytes overhead in the common case.
  - Per shore cell: SwimSourceZ = water surface Z, SwimMask + SwimZ_dir
    computed from a second ComputeMaskAt(swimZ) pass. Cells in the same
    chunk that aren't shore cells get the NoSwimLayerCell sentinel
    (sbyte.MinValue) in SwimSourceZ.

Runtime:
  - Source-Z guard miss now consults the swim layer before returning
    Fallthrough_SourceZMismatch. If chunk.HasSwimLayer && |sourceZ -
    chunk.SwimSourceZ[cell]| <= StepHeight (and SwimSourceZ != sentinel),
    serve from the swim layer with walkMask=0.
  - Walker queries on shore cells unchanged: they hit the primary
    source-Z guard at the walk Z and serve from the walk layer.

File format bump 2 → 3:
  - HasSwimLayer byte joins HasStrata in the chunk header.
  - When set, swim layer trailer (SwimSourceZ + SwimMask + 8 swim Z
    arrays = 2,560 bytes) precedes the strata trailer.
  - MinSupportedVersion bumped to 3 — operators upgrading from PR-5
    bakes see a one-time re-bake on first boot under v3.

Storage cost: ~2.5 KB per coastline chunk × ~50-100 coast chunks per
map = ~150-250 KB extra per fully-baked map. Inland chunks add 1 byte
(the HasSwimLayer flag). Trivial.

Tests (59/59 pass, +4):
  - SwimLayer_NotInjected_StaysFallthroughOnSourceZMismatch — chunks
    without the layer behave exactly as before.
  - SwimLayer_InjectedMatchingZ_ReturnsHitFromSwimLayer — synthetic
    swim-layer injection produces walkMask=0 + swim mask + swim Zs at
    the swim source Z.
  - SwimLayer_InjectedButCellHasNoSentinel_FallsThrough — cells with
    the NoSwimLayerCell sentinel must NOT match queries at -128.
  - SwimLayer_RoundTrips_ThroughLazyReader — Save/Clear/LazyOpen
    preserves the swim layer end-to-end through file v3.

Bench validation (next): need a coastline scenario in the bench corpus
(sea serpent rounding a peninsula) to measure the win — that lands as a
follow-up bench commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ement)

Tightens shore-cell bake: if ComputeMaskAt at swim source Z produces
wetMask=0 (i.e. StaticsBlockAt rejected the swim creature due to a
low-clearance bridge/dock/pier above the water), skip populating the
swim layer for that cell. The NoSwimLayerCell sentinel stays in
SwimSourceZ for the cell, so runtime queries fall through to slow path
(which independently produces the same "no movement" answer).

Two reasons:
  - Storage: don't waste 18 bytes per cell on a stratum that always
    returns 0 mask.
  - Semantic clarity: HasSwimLayer means "swim is possible somewhere
    in this chunk," not "we baked a stratum here regardless of whether
    swim creatures can actually fit."

Matches the principle that strata represent USABLE surfaces. The slow
path for these cells produces wetMask=0 too — parity preserved.

Tests still 59/59 (no test exercised the wetMask=0 skip; that path is
specific to map cells with bridges/docks blocking vertical clearance,
which the synthetic injection tests don't reproduce).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant