Feature/prefix cache with redis reqcnt#2089
Feature/prefix cache with redis reqcnt#2089penfree wants to merge 16 commits intovllm-project:mainfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a RequestCounter interface and a Redis-based implementation to track pod request counts for load balancing in the prefix cache router. It also refactors the routing logic to use a shared sync indexer and improves pod identification by including namespaces in keys. The review feedback identifies several critical bugs: incorrect key handling in both the KV sync and standard routing paths which would break pod selection, an interface violation in the local request counter regarding namespace support, and potential Redis memory leaks due to missing TTLs. Additionally, logic was flagged that could lead to counter drift if request completion is tracked without a corresponding successful initiation.
|
@Jeffwan hi, please review this pr |
|
@penfree Thank you for the PR, is it possible to split the PR into two. Remove the redis counter into different PR, also I am started a PR to sync in-memory state across gateway instances (#1989) -- in that PR, it syncs in-memory prefix cache stats, but can be extended to share request counter as well. |
|
Summary Critical Issues
GetRequestCounts() now returns keys in namespace/name format, but the loop that builds podRequestCountWithKeys looks up podRequestCount[pod.Name] (bare name). This silently fails — every lookup misses, podRequestCountWithKeys is always empty, and the router falls back to least-request selection with no valid input.
The function is defined but never called. This suggests the refactoring was left incomplete. Go won't fail to compile on an unused function, but the routing code that should call it likely has the bug described in #1.
When c.GetSyncPrefixIndexer() returns nil, the code logs a warning and skips creating kvSyncRouter, but prefixCacheIndexerStatus metrics are never set. The fallback to local-indexer behavior goes unrecorded. High Priority podServerKey was moved to utils.go and its format changed from name_port → namespace/name/port. Existing in-flight Redis keys under the old format will be silently ignored on deploy, causing a transient load spike. This needs deployment notes.
Updated on every GetRequestCountsWithPort call but never pruned. Long-running instances routing many different models will leak memory. Medium Priority The Lua decrement script is a string literal inside DoneRequestCount(). In high-traffic environments, define it as a package-level redis.NewScript(...) and use EvalSha for Redis script caching.
A single ongoing request continuously refreshes the TTL. Consider ExpireNX (set only if no TTL exists) or ExpireGT to avoid preventing cleanup of active keys.
GetSyncPrefixIndexer() *syncindexer.SyncPrefixHashTable tightly couples the interface to the implementation. An interface type would enable test injection and better separation of concerns.
github.com/go-redis/redismock/v9 is a test-only library but placed in the main require block, adding it as a production binary dependency. |
|
@varungup90 Thanks for your patience。
i haven't found any code that used bare pod name to look up podRequestCount (.venv) ➜ pkg git:(feature/prefix-cache-with-redis-reqcnt) ack podRequestCount | grep Name
plugins/gateway/algorithms/pd_disaggregation_benchmark_test.go:122: podRequestCounts[pod.Name] = requestCount
plugins/gateway/algorithms/request_counter.go:137: podRequestCount[keyName] = count
plugins/gateway/algorithms/prefix_cache.go:719: // For KV sync router, we need podRequestCount to use pod keys (namespace/podName)
plugins/gateway/algorithms/pd_disaggregation.go:425: podRequestCounts[pod.Name] = requestCount
plugins/gateway/algorithms/pd_disaggregation.go:483: score := podRequestCounts[pod.Name] / math.Max(drainRate.GetSimpleValue(), defaultDrainRateEpsilon)
plugins/gateway/algorithms/pd_disaggregation.go:618: reqCnt := float64(podRequestCount[pod.Name])
plugins/gateway/algorithms/pd_disaggregation.go:650: normalizedRunningReqs := podRequestCounts[pod.Name] / maxRequestCount
plugins/gateway/algorithms/pd_disaggregation.go:667: "running_reqs", podRequestCounts[pod.Name], "max_running_reqs", maxRequestCount,
plugins/gateway/algorithms/pd_disaggregation.go:1168: countInterface, _ := t.podRequestCounts.LoadOrStore(podName, &atomic.Int32{})
plugins/gateway/algorithms/pd_disaggregation.go:1191: countInterface, exists := t.podRequestCounts.Load(podName)
plugins/gateway/algorithms/pd_disaggregation.go:1215: countInterface, exists := t.podRequestCounts.Load(pod.Name)
plugins/gateway/algorithms/pd_disaggregation.go:1242: countInterface, _ := t.podRequestCounts.LoadOrStore(podName, &atomic.Int32{})
plugins/gateway/algorithms/pd_disaggregation.go:1257: countInterface, exists := t.podRequestCounts.Load(podName)
plugins/gateway/algorithms/pd_disaggregation.go:1281: countInterface, exists := t.podRequestCounts.Load(podName)
plugins/gateway/algorithms/least_request.go:186: podRequestCount[pod.Name] = int(runningReq.GetSimpleValue())
plugins/gateway/algorithms/least_request.go:217: podRequestCount[keyName] = count
removed
Fixed
Since redis key in power of two will rotate every hour, i think this is not a problem
the max size of this map is the amount of model, i don't think there can have so many models that cause a memory issue, this every entry has only a string and a int64. however, pruning it needs many code
Fixed
i've added key rotation, and set ttl in lua only when the key does not exist
**like PrefixHashTable, a GetSharedSyncPrefixHashTable is added to get singleton of SyncPrefixHashTable, GetSyncPrefixIndexer is removed from cache_api interface.
seems this is not an issue, the redismock is only used in test, so it's not built into the production binary |
| tokenizerPool TokenizerPoolInterface // Add TokenizerPool reference | ||
| syncIndexer *syncindexer.SyncPrefixHashTable // Use concrete type | ||
| metricsEnabled bool | ||
| requestCounter RequestCounter // Request counter for load balancing |
There was a problem hiding this comment.
Local counter is not needed. This local counter is only incrementing requests for prefix-cache, compare to RealtimeNumRequestsRunning keep track of requests across all routing policies.
RealtimeNumRequestsRunning: is calculated via gateway and it is not fetched from the pod.
metricName = metrics.RealtimeNumRequestsRunning
if val, err := cache.GetMetricValueByPod(pod.Name, pod.Namespace, metricName); err == nil && val != nil {
count = int(val.GetSimpleValue())
}
There was a problem hiding this comment.
@varungup90 i don't think so, localCounter implements the same interface as redisCounter, so kvSyncPrefixCacheRouter can use either of them as user's need.
When only one gateway instance is deployed, they don't need redisCounter with extra latency cost. localCounter can be a dropin replacement in such cases, without using many local util functions.
| kvSyncRouter *kvSyncPrefixCacheRouter | ||
|
|
||
| // Request counter for load balancing | ||
| requestCounter RequestCounter |
There was a problem hiding this comment.
Lets simplify the PR, split the request counter into a different PR. This one to be more focussed on fixing bug with kv event sync.
There was a problem hiding this comment.
@varungup90 already split the bugfix to #2096 , i'll rebase this pr to main branch after it's merged
Signed-off-by: penfree <pengfei.qiu@gmail.com> Change-Id: I77d872965fcaebf0c455d93646a4b29e34f1d8d0
Signed-off-by: penfree <pengfei.qiu@gmail.com> Change-Id: Ifa2c860c0f41021dcb02c84aacd8829393e1b390
Signed-off-by: penfree <pengfei.qiu@gmail.com> Change-Id: I0e83f041ba96a471c1fb6931f89c717863ae6b67
Signed-off-by: penfree <pengfei.qiu@gmail.com> Change-Id: I42451439b67ebb96d01262a0f2dfe5a20240aad4
Signed-off-by: penfree <pengfei.qiu@gmail.com> Change-Id: I1e38be1fe495ab422fb7dd923dcbdf97b208e58f
Signed-off-by: penfree <pengfei.qiu@gmail.com> Change-Id: I25205a00bef2aa181d0816c7d9e3e2eb11233e79
Signed-off-by: penfree <pengfei.qiu@gmail.com> Change-Id: I459b8d61300755f2982155a4ec7763e47fb16f0e
Signed-off-by: penfree <penfree.qiu@gmail.com> Change-Id: I061386e56cb0c468fb1d687295f8502974423f21
Signed-off-by: penfree <penfree.qiu@gmail.com> Change-Id: I08a605b32965077bb8d16407e6d908ba430b6dcd
Signed-off-by: penfree <penfree.qiu@gmail.com> Change-Id: I49584b893d6e9b7db8b46a9264a92d9f53116a5a
Signed-off-by: penfree <penfree.qiu@gmail.com> Change-Id: I82543a8c72c5d2bcf05579f7da6d03968c118314
Signed-off-by: penfree <penfree.qiu@gmail.com> Change-Id: I0a4c9128d40ccbe96537bc35576f8bc0b1903f2a
Signed-off-by: penfree <penfree.qiu@gmail.com> Change-Id: Ie2274aae8b9af5cf70b9fe7d35a347cb1f113b44
…cPrefixHashTable Signed-off-by: penfree <penfree.qiu@gmail.com> Change-Id: I832cad5cd99b64f2a509eb51f9e87c61dfc6bf5f
…y happens to crash Signed-off-by: penfree <penfree.qiu@gmail.com> Change-Id: I80ff6cce0e18ab7b8303f559c9b6de17947b5242
Signed-off-by: penfree <penfree.qiu@gmail.com> Change-Id: I89dd144e36682b838499f85efd5b07c518c4da0d
0a6b874 to
9b1f5ea
Compare
Summary
This PR includes critical bugfixes for KV sync prefix cache routing and adds
Redis-based request counting support for multi-instance gateway deployments.
Changes
✨ Feature: Redis-Based Request Counter for Multi-Instance Gateway Deployments
Motivation:
When multiple AIBrix gateway instances are deployed, each instance only tracks local
request counts. This causes incorrect load imbalance detection since no single
instance has the complete view of pod request distribution.
Implementation:
RequestCounterinterface with two implementations:localRequestCounter: Uses local metrics (existing behavior)redisRequestCounter: Uses Redis hash to track distributed request countsmap[string]intinstead ofcache.CacheAIBRIX_PREFIX_CACHE_USE_REDIS_REQCNT=trueenablesRedis-based counting
Key Features:
aibrix:prefix-cache-reqcnt:{model_name}(hashmap: pod_key →count)
{model_name}for same-slot operationsFiles Changed:
pkg/plugins/gateway/algorithms/request_counter.go(new file): RequestCounterinterface and implementations
pkg/plugins/gateway/algorithms/prefix_cache.go: Integration with request counterpkg/plugins/gateway/algorithms/power_of_two.go: UpdatedpodServerKey.String()to include namespace
Testing
requestCounterBreaking Changes
None. Changes are backward compatible:
Related Issues
Closes: [Issue number if applicable]
Deployment Notes
To enable Redis-based request counting:
AIBRIX_PREFIX_CACHE_USE_REDIS_REQCNT=true