🚀 Feature Description and Motivation
AIBrix needs the ability to combine multiple routing algorithms to balance different optimization goals (such as load balancing, KV cache efficiency, and latency). AIBrix currently supports only a single routing algorithm, which limits our ability to fine-tune routing decisions for complex workloads.
Use Case
In large-scale clusters, we may need a multi-routing strategy. The current problem is that using only least-request routing in AIBrix ensures load balancing but leaves the other goals, such as KV cache efficiency, out of the routing decision.
Proposed Solution
The inspiration comes from multi-level sorting. I think we can implement a chained routing feature that addresses this by allowing users to specify multiple comma-separated routing algorithms (e.g., least-request,least-kv-cache). These algorithms are applied sequentially, with each algorithm narrowing down the candidate pod list until one pod remains or all algorithms have been applied.
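To make the chaining concrete, here is a minimal Python sketch of the idea, not the AIBrix implementation. Everything in it is an assumption for illustration: the pod fields `running_requests` and `kv_cache_usage`, the `ALGORITHMS` registry, the helper names, the rule that each algorithm keeps all pods tied for the lowest value of its metric, and the choice of the first remaining pod when the chain ends with a tie.

```python
from typing import Any, Callable, Dict, List

Pod = Dict[str, Any]                      # hypothetical pod record
Filter = Callable[[List[Pod]], List[Pod]]


def keep_min(metric: str) -> Filter:
    """Build a filter that keeps every pod tied for the lowest `metric`."""
    def apply(pods: List[Pod]) -> List[Pod]:
        best = min(p[metric] for p in pods)
        return [p for p in pods if p[metric] == best]
    return apply


# Hypothetical registry mapping header names to narrowing functions.
ALGORITHMS: Dict[str, Filter] = {
    "least-request": keep_min("running_requests"),
    "least-kv-cache": keep_min("kv_cache_usage"),
}


def parse_strategy(header_value: str) -> List[str]:
    """Split a comma-separated header value, tolerating stray spaces."""
    return [name.strip() for name in header_value.split(",") if name.strip()]


def chained_route(header_value: str, pods: List[Pod]) -> Pod:
    """Apply each algorithm in order until one pod remains or the chain ends."""
    if not pods:
        raise ValueError("no candidate pods")
    candidates = list(pods)
    for name in parse_strategy(header_value):
        if len(candidates) <= 1:
            break                         # already narrowed to a single pod
        candidates = ALGORITHMS[name](candidates)
    return candidates[0]                  # assumed tie-break: first remaining


pods = [
    {"name": "pod-a", "running_requests": 2, "kv_cache_usage": 0.7},
    {"name": "pod-b", "running_requests": 2, "kv_cache_usage": 0.3},
    {"name": "pod-c", "running_requests": 5, "kv_cache_usage": 0.1},
]
print(chained_route("least-request,least-kv-cache", pods)["name"])  # pod-b
```

In this sketch a single-name value such as `least-request` goes through the same code path as a chain of length one, which is one way the existing single-algorithm behavior could be preserved.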
The feature should maintain full backward compatibility with existing routing configurations. A chained request would look like:
curl -H "routing-strategy: least-request,least-kv-cache" ...
It should also provide greater flexibility to tailor routing strategies to specific use cases.