Feature/prefix cache with redis reqcnt by penfree · Pull Request #2089 · vllm-project/aibrix

penfree · 2026-04-08T07:05:09Z

Summary

This PR includes critical bugfixes for KV sync prefix cache routing and adds
Redis-based request counting support for multi-instance gateway deployments.

Changes

✨ Feature: Redis-Based Request Counter for Multi-Instance Gateway Deployments

Motivation:
When multiple AIBrix gateway instances are deployed, each instance only tracks local
request counts. This causes incorrect load imbalance detection since no single
instance has the complete view of pod request distribution.

Implementation:

Added RequestCounter interface with two implementations:
- localRequestCounter: Uses local metrics (existing behavior)
- redisRequestCounter: Uses Redis hash to track distributed request counts
Modified prefix cache router to accept request counter via dependency injection
Refactored load balancing functions to accept map[string]int instead of
cache.Cache
Environment variable AIBRIX_PREFIX_CACHE_USE_REDIS_REQCNT=true enables
Redis-based counting

Key Features:

Redis key design: aibrix:prefix-cache-reqcnt:{model_name} (hashmap: pod_key →
count)
Supports Redis Cluster via hashtag {model_name} for same-slot operations
Automatic cleanup when request count reaches zero
Timeout mechanism to stop tracking inactive models (5 minutes)
Returns all ready pods with count=0 for pods not in Redis

Files Changed:

pkg/plugins/gateway/algorithms/request_counter.go (new file): RequestCounter
interface and implementations
pkg/plugins/gateway/algorithms/prefix_cache.go: Integration with request counter
pkg/plugins/gateway/algorithms/power_of_two.go: Updated podServerKey.String()
to include namespace

Testing

✅ All existing tests pass
✅ Updated test fixtures to use requestCounter
✅ Fixed test expectations for namespace-aware pod keys

Breaking Changes

None. Changes are backward compatible:

Default behavior remains unchanged (uses local request counter)
Redis-based counting is opt-in via environment variable

Related Issues

Closes: [Issue number if applicable]

Deployment Notes

To enable Redis-based request counting:

Ensure Redis client is configured
Set environment variable: AIBRIX_PREFIX_CACHE_USE_REDIS_REQCNT=true
Multiple gateway instances will now share request count information

gemini-code-assist

Code Review

This pull request introduces a RequestCounter interface and a Redis-based implementation to track pod request counts for load balancing in the prefix cache router. It also refactors the routing logic to use a shared sync indexer and improves pod identification by including namespaces in keys. The review feedback identifies several critical bugs: incorrect key handling in both the KV sync and standard routing paths which would break pod selection, an interface violation in the local request counter regarding namespace support, and potential Redis memory leaks due to missing TTLs. Additionally, logic was flagged that could lead to counter drift if request completion is tracked without a corresponding successful initiation.

pkg/plugins/gateway/algorithms/prefix_cache.go

pkg/plugins/gateway/algorithms/request_counter.go

penfree · 2026-04-08T09:37:11Z

@Jeffwan hi, please review this pr

varungup90 · 2026-04-08T17:18:21Z

@penfree Thank you for the PR, is it possible to split the PR into two. Remove the redis counter into different PR, also I am started a PR to sync in-memory state across gateway instances (#1989) -- in that PR, it syncs in-memory prefix cache stats, but can be extended to share request counter as well.

varungup90 · 2026-04-08T17:20:30Z

Summary
This PR fixes a significant bug in kvSyncPrefixCacheRouter (it was querying an empty indexer instead of the shared one from the cache store) and adds a RequestCounter interface with Redis-backed distributed request counting across multiple gateway instances.

Critical Issues

KV sync key format mismatch — pkg/plugins/gateway/algorithms/prefix_cache.go ~line 715

GetRequestCounts() now returns keys in namespace/name format, but the loop that builds podRequestCountWithKeys looks up podRequestCount[pod.Name] (bare name). This silently fails — every lookup misses, podRequestCountWithKeys is always empty, and the router falls back to least-request selection with no valid input.

filterPodsByKey is dead code — pkg/plugins/gateway/algorithms/prefix_cache.go ~lines 635–643

The function is defined but never called. This suggests the refactoring was left incomplete. Go won't fail to compile on an unused function, but the routing code that should call it likely has the bug described in #1.

KV sync nil-indexer fallback doesn't update metrics — pkg/plugins/gateway/algorithms/prefix_cache.go ~lines 254–285

When c.GetSyncPrefixIndexer() returns nil, the code logs a warning and skips creating kvSyncRouter, but prefixCacheIndexerStatus metrics are never set. The fallback to local-indexer behavior goes unrecorded.

High Priority
4. Breaking Redis key format change for Power-of-Two router — pkg/plugins/gateway/algorithms/utils.go, power_of_two.go

podServerKey was moved to utils.go and its format changed from name_port → namespace/name/port. Existing in-flight Redis keys under the old format will be silently ignored on deploy, causing a transient load spike. This needs deployment notes.

lastModelRouteTime map grows unbounded — pkg/plugins/gateway/algorithms/request_counter.go ~line 1291

Updated on every GetRequestCountsWithPort call but never pruned. Long-running instances routing many different models will leak memory.

Medium Priority
6. Lua script re-evaluated on every call — request_counter.go ~line 1521

The Lua decrement script is a string literal inside DoneRequestCount(). In high-traffic environments, define it as a package-level redis.NewScript(...) and use EvalSha for Redis script caching.

Expire resets TTL on every increment — request_counter.go ~line 1467

A single ongoing request continuously refreshes the TTL. Consider ExpireNX (set only if no TTL exists) or ExpireGT to avoid preventing cleanup of active keys.

SyncIndexerProvider returns a concrete type — pkg/cache/cache_api.go ~line 71

GetSyncPrefixIndexer() *syncindexer.SyncPrefixHashTable tightly couples the interface to the implementation. An interface type would enable test injection and better separation of concerns.

redismock in production require block — go.mod

github.com/go-redis/redismock/v9 is a test-only library but placed in the main require block, adding it as a production binary dependency.

penfree · 2026-04-09T01:59:24Z

@varungup90 Thanks for your patience。

Summary This PR fixes a significant bug in kvSyncPrefixCacheRouter (it was querying an empty indexer instead of the shared one from the cache store) and adds a RequestCounter interface with Redis-backed distributed request counting across multiple gateway instances.

Critical Issues

KV sync key format mismatch — pkg/plugins/gateway/algorithms/prefix_cache.go ~line 715

GetRequestCounts() now returns keys in namespace/name format, but the loop that builds podRequestCountWithKeys looks up podRequestCount[pod.Name] (bare name). This silently fails — every lookup misses, podRequestCountWithKeys is always empty, and the router falls back to least-request selection with no valid input.

i haven't found any code that used bare pod name to look up podRequestCount

(.venv) ➜  pkg git:(feature/prefix-cache-with-redis-reqcnt) ack podRequestCount | grep Name           
plugins/gateway/algorithms/pd_disaggregation_benchmark_test.go:122:                             podRequestCounts[pod.Name] = requestCount
plugins/gateway/algorithms/request_counter.go:137:                      podRequestCount[keyName] = count
plugins/gateway/algorithms/prefix_cache.go:719:         // For KV sync router, we need podRequestCount to use pod keys (namespace/podName)
plugins/gateway/algorithms/pd_disaggregation.go:425:            podRequestCounts[pod.Name] = requestCount
plugins/gateway/algorithms/pd_disaggregation.go:483:            score := podRequestCounts[pod.Name] / math.Max(drainRate.GetSimpleValue(), defaultDrainRateEpsilon)
plugins/gateway/algorithms/pd_disaggregation.go:618:            reqCnt := float64(podRequestCount[pod.Name])
plugins/gateway/algorithms/pd_disaggregation.go:650:            normalizedRunningReqs := podRequestCounts[pod.Name] / maxRequestCount
plugins/gateway/algorithms/pd_disaggregation.go:667:                    "running_reqs", podRequestCounts[pod.Name], "max_running_reqs", maxRequestCount,
plugins/gateway/algorithms/pd_disaggregation.go:1168:   countInterface, _ := t.podRequestCounts.LoadOrStore(podName, &atomic.Int32{})
plugins/gateway/algorithms/pd_disaggregation.go:1191:   countInterface, exists := t.podRequestCounts.Load(podName)
plugins/gateway/algorithms/pd_disaggregation.go:1215:           countInterface, exists := t.podRequestCounts.Load(pod.Name)
plugins/gateway/algorithms/pd_disaggregation.go:1242:   countInterface, _ := t.podRequestCounts.LoadOrStore(podName, &atomic.Int32{})
plugins/gateway/algorithms/pd_disaggregation.go:1257:   countInterface, exists := t.podRequestCounts.Load(podName)
plugins/gateway/algorithms/pd_disaggregation.go:1281:   countInterface, exists := t.podRequestCounts.Load(podName)
plugins/gateway/algorithms/least_request.go:186:                podRequestCount[pod.Name] = int(runningReq.GetSimpleValue())
plugins/gateway/algorithms/least_request.go:217:                        podRequestCount[keyName] = count

filterPodsByKey is dead code — pkg/plugins/gateway/algorithms/prefix_cache.go ~lines 635–643

The function is defined but never called. This suggests the refactoring was left incomplete. Go won't fail to compile on an unused function, but the routing code that should call it likely has the bug described in #1.

removed

KV sync nil-indexer fallback doesn't update metrics — pkg/plugins/gateway/algorithms/prefix_cache.go ~lines 254–285

When c.GetSyncPrefixIndexer() returns nil, the code logs a warning and skips creating kvSyncRouter, but prefixCacheIndexerStatus metrics are never set. The fallback to local-indexer behavior goes unrecorded.

Fixed

High Priority 4. Breaking Redis key format change for Power-of-Two router — pkg/plugins/gateway/algorithms/utils.go, power_of_two.go

podServerKey was moved to utils.go and its format changed from name_port → namespace/name/port. Existing in-flight Redis keys under the old format will be silently ignored on deploy, causing a transient load spike. This needs deployment notes.

Since redis key in power of two will rotate every hour, i think this is not a problem

lastModelRouteTime map grows unbounded — pkg/plugins/gateway/algorithms/request_counter.go ~line 1291

Updated on every GetRequestCountsWithPort call but never pruned. Long-running instances routing many different models will leak memory.

the max size of this map is the amount of model, i don't think there can have so many models that cause a memory issue, this every entry has only a string and a int64. however, pruning it needs many code

Medium Priority 6. Lua script re-evaluated on every call — request_counter.go ~line 1521

The Lua decrement script is a string literal inside DoneRequestCount(). In high-traffic environments, define it as a package-level redis.NewScript(...) and use EvalSha for Redis script caching.

Fixed

Expire resets TTL on every increment — request_counter.go ~line 1467

A single ongoing request continuously refreshes the TTL. Consider ExpireNX (set only if no TTL exists) or ExpireGT to avoid preventing cleanup of active keys.

i've added key rotation, and set ttl in lua only when the key does not exist

SyncIndexerProvider returns a concrete type — pkg/cache/cache_api.go ~line 71

GetSyncPrefixIndexer() *syncindexer.SyncPrefixHashTable tightly couples the interface to the implementation. An interface type would enable test injection and better separation of concerns.

**like PrefixHashTable, a GetSharedSyncPrefixHashTable is added to get singleton of SyncPrefixHashTable, GetSyncPrefixIndexer is removed from cache_api interface.

redismock in production require block — go.mod

github.com/go-redis/redismock/v9 is a test-only library but placed in the main require block, adding it as a production binary dependency.

seems this is not an issue, the redismock is only used in test, so it's not built into the production binary

varungup90 · 2026-04-10T17:19:42Z

pkg/plugins/gateway/algorithms/prefix_cache.go

+	tokenizerPool  TokenizerPoolInterface           // Add TokenizerPool reference
+	syncIndexer    *syncindexer.SyncPrefixHashTable // Use concrete type
 	metricsEnabled bool
+	requestCounter RequestCounter // Request counter for load balancing


Local counter is not needed. This local counter is only incrementing requests for prefix-cache, compare to RealtimeNumRequestsRunning keep track of requests across all routing policies.

RealtimeNumRequestsRunning: is calculated via gateway and it is not fetched from the pod.

metricName = metrics.RealtimeNumRequestsRunning if val, err := cache.GetMetricValueByPod(pod.Name, pod.Namespace, metricName); err == nil && val != nil { count = int(val.GetSimpleValue()) }

@varungup90 i don't think so, localCounter implements the same interface as redisCounter, so kvSyncPrefixCacheRouter can use either of them as user's need.
When only one gateway instance is deployed， they don't need redisCounter with extra latency cost. localCounter can be a dropin replacement in such cases, without using many local util functions.

varungup90 · 2026-04-10T17:26:40Z

pkg/plugins/gateway/algorithms/prefix_cache.go

 	kvSyncRouter *kvSyncPrefixCacheRouter
+
+	// Request counter for load balancing
+	requestCounter RequestCounter


Lets simplify the PR, split the request counter into a different PR. This one to be more focussed on fixing bug with kv event sync.

@varungup90 already split the bugfix to #2096 , i'll rebase this pr to main branch after it's merged

@varungup90 rebased, PTAL

Signed-off-by: penfree <pengfei.qiu@gmail.com> Change-Id: I77d872965fcaebf0c455d93646a4b29e34f1d8d0

Signed-off-by: penfree <pengfei.qiu@gmail.com> Change-Id: Ifa2c860c0f41021dcb02c84aacd8829393e1b390

Signed-off-by: penfree <pengfei.qiu@gmail.com> Change-Id: I0e83f041ba96a471c1fb6931f89c717863ae6b67

Signed-off-by: penfree <pengfei.qiu@gmail.com> Change-Id: I42451439b67ebb96d01262a0f2dfe5a20240aad4

Signed-off-by: penfree <pengfei.qiu@gmail.com> Change-Id: I1e38be1fe495ab422fb7dd923dcbdf97b208e58f

Signed-off-by: penfree <pengfei.qiu@gmail.com> Change-Id: I25205a00bef2aa181d0816c7d9e3e2eb11233e79

Signed-off-by: penfree <pengfei.qiu@gmail.com> Change-Id: I459b8d61300755f2982155a4ec7763e47fb16f0e

Signed-off-by: penfree <penfree.qiu@gmail.com> Change-Id: I061386e56cb0c468fb1d687295f8502974423f21

Signed-off-by: penfree <penfree.qiu@gmail.com> Change-Id: I08a605b32965077bb8d16407e6d908ba430b6dcd

Signed-off-by: penfree <penfree.qiu@gmail.com> Change-Id: I49584b893d6e9b7db8b46a9264a92d9f53116a5a

Signed-off-by: penfree <penfree.qiu@gmail.com> Change-Id: I82543a8c72c5d2bcf05579f7da6d03968c118314

Signed-off-by: penfree <penfree.qiu@gmail.com> Change-Id: I0a4c9128d40ccbe96537bc35576f8bc0b1903f2a

Signed-off-by: penfree <penfree.qiu@gmail.com> Change-Id: Ie2274aae8b9af5cf70b9fe7d35a347cb1f113b44

…cPrefixHashTable Signed-off-by: penfree <penfree.qiu@gmail.com> Change-Id: I832cad5cd99b64f2a509eb51f9e87c61dfc6bf5f

…y happens to crash Signed-off-by: penfree <penfree.qiu@gmail.com> Change-Id: I80ff6cce0e18ab7b8303f559c9b6de17947b5242

Signed-off-by: penfree <penfree.qiu@gmail.com> Change-Id: I89dd144e36682b838499f85efd5b07c518c4da0d

gemini-code-assist bot reviewed Apr 8, 2026

View reviewed changes

penfree marked this pull request as ready for review April 8, 2026 09:36

penfree mentioned this pull request Apr 8, 2026

Feat: make kvSyncPrefixCacheRouter distributable with redis imbalance… #2051

Closed

Jeffwan requested a review from varungup90 April 10, 2026 04:26

varungup90 reviewed Apr 10, 2026

View reviewed changes

pengfei913 and others added 16 commits April 13, 2026 16:50

Feat: add redis request counter

352abde

Signed-off-by: penfree <pengfei.qiu@gmail.com> Change-Id: I77d872965fcaebf0c455d93646a4b29e34f1d8d0

Feat: make kvSyncPrefixCacheRouter distributable & fix syncIndexer bug

4e86e3e

Signed-off-by: penfree <pengfei.qiu@gmail.com> Change-Id: Ifa2c860c0f41021dcb02c84aacd8829393e1b390

Feat: make kvSyncPrefixCacheRouter distributable & fix syncIndexer bug

c841cd6

Signed-off-by: penfree <pengfei.qiu@gmail.com> Change-Id: I0e83f041ba96a471c1fb6931f89c717863ae6b67

Fix: cr suggestion

4a0d3b0

Signed-off-by: penfree <pengfei.qiu@gmail.com> Change-Id: I42451439b67ebb96d01262a0f2dfe5a20240aad4

Fix: add licence

ade5907

Signed-off-by: penfree <pengfei.qiu@gmail.com> Change-Id: I1e38be1fe495ab422fb7dd923dcbdf97b208e58f

Fix: gofmt issues

6fa4130

Signed-off-by: penfree <pengfei.qiu@gmail.com> Change-Id: I25205a00bef2aa181d0816c7d9e3e2eb11233e79

Fix: gofmt issues

769bcdb

Signed-off-by: penfree <pengfei.qiu@gmail.com> Change-Id: I459b8d61300755f2982155a4ec7763e47fb16f0e

Fix: add unittest for RequestCounter & fix some naming issue

3dcd2cb

Signed-off-by: penfree <penfree.qiu@gmail.com> Change-Id: I061386e56cb0c468fb1d687295f8502974423f21

Fix: add unittest for RequestCounter & fix some naming issue

d8ae316

Signed-off-by: penfree <penfree.qiu@gmail.com> Change-Id: I08a605b32965077bb8d16407e6d908ba430b6dcd

Fix: add licence

9884359

Signed-off-by: penfree <penfree.qiu@gmail.com> Change-Id: I49584b893d6e9b7db8b46a9264a92d9f53116a5a

Fix: fix naming issues

5def71c

Signed-off-by: penfree <penfree.qiu@gmail.com> Change-Id: I82543a8c72c5d2bcf05579f7da6d03968c118314

Fix: unify env reader & add condition check on registering

b00722d

Signed-off-by: penfree <penfree.qiu@gmail.com> Change-Id: I0a4c9128d40ccbe96537bc35576f8bc0b1903f2a

Fix: cr suggestions

0f3ce2f

Signed-off-by: penfree <penfree.qiu@gmail.com> Change-Id: Ie2274aae8b9af5cf70b9fe7d35a347cb1f113b44

Fix: add GetSharedSyncPrefixHashTable to ensure there is only one Syn…

fd845ba

…cPrefixHashTable Signed-off-by: penfree <penfree.qiu@gmail.com> Change-Id: I832cad5cd99b64f2a509eb51f9e87c61dfc6bf5f

Feat: add redis key rotation to correct the request count when gatewa…

f93a75b

…y happens to crash Signed-off-by: penfree <penfree.qiu@gmail.com> Change-Id: I80ff6cce0e18ab7b8303f559c9b6de17947b5242

Feat: set ttl by lua, to avoid expire on every request

9b1f5ea

Signed-off-by: penfree <penfree.qiu@gmail.com> Change-Id: I89dd144e36682b838499f85efd5b07c518c4da0d

penfree force-pushed the feature/prefix-cache-with-redis-reqcnt branch from 0a6b874 to 9b1f5ea Compare April 13, 2026 09:12

Conversation

penfree commented Apr 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

✨ Feature: Redis-Based Request Counter for Multi-Instance Gateway Deployments

Testing

Breaking Changes

Related Issues

Deployment Notes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

penfree commented Apr 8, 2026

Uh oh!

varungup90 commented Apr 8, 2026

Uh oh!

varungup90 commented Apr 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

penfree commented Apr 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

varungup90 Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

penfree Apr 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

varungup90 Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

penfree Apr 11, 2026

Choose a reason for hiding this comment

Uh oh!

penfree Apr 13, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

penfree commented Apr 8, 2026 •

edited

Loading

varungup90 commented Apr 8, 2026 •

edited

Loading

penfree commented Apr 9, 2026 •

edited

Loading

penfree Apr 11, 2026 •

edited

Loading