Description:
Most of my org's infrastructure resides in on-premise datacenters with many racks, each of which is treated as an individual locality zone in the context of Envoy. For our large services we use least-request load balancing with zone-aware routing to balance traffic while preferring the local rack. This works well when incoming load is roughly uniform between racks/zones, but when it isn't, the upstream cluster develops hotspots.
The cause here is that zone-aware routing decides what percentage of traffic to keep local vs remote based on healthy host counts (default) or host weights. It has no visibility into actual backend load. Least-request helps within a zone but can't redistribute across zones.
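For comparison, the existing split can be approximated as follows. This is a simplified Python sketch of the behavior described in Envoy's zone-aware routing documentation under the default healthy-host basis, not Envoy's C++ implementation. The function name and the plain host-count dictionaries are illustrative assumptions. Neither input carries any signal about backend utilization.

```python
from typing import Dict


def zone_aware_split(
    upstream_hosts: Dict[str, int],
    local_cluster_hosts: Dict[str, int],
    local_zone: str,
) -> Dict[str, float]:
    """Approximate zone-aware routing split using healthy host counts only.

    upstream_hosts: healthy upstream hosts per zone (must include local_zone).
    local_cluster_hosts: hosts of the originating (local) cluster per zone.
    Assumes both clusters have at least one host.
    Returns the fraction of this proxy's requests routed to each zone.
    """
    up_total = sum(upstream_hosts.values())
    lc_total = sum(local_cluster_hosts.values())
    up_pct = {z: n / up_total for z, n in upstream_hosts.items()}
    lc_pct = {z: local_cluster_hosts.get(z, 0) / lc_total for z in upstream_hosts}

    # Enough upstream hosts locally (relative to callers): keep everything local.
    if up_pct[local_zone] >= lc_pct[local_zone]:
        return {local_zone: 1.0}

    # Otherwise keep a proportional share local...
    keep = up_pct[local_zone] / lc_pct[local_zone]
    # ...and spread the rest over zones with residual capacity (by host count).
    residual = {
        z: max(0.0, up_pct[z] - lc_pct[z]) for z in upstream_hosts if z != local_zone
    }
    total = sum(residual.values())
    split = {local_zone: keep}
    for z, r in residual.items():
        split[z] = (1.0 - keep) * r / total if total > 0 else 0.0
    return split
```

With ten upstream hosts in each of three racks but two thirds of callers in rack a, a proxy in rack a keeps half its traffic local, whether those upstream hosts are idle or saturated.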
Proposed solution: load-aware locality_basis mode for zone-aware routing
Add an ORCA_UTILIZATION (open to better names) locality_basis mode to ZoneAwareLbConfig that uses ORCA load reports to drive the local-vs-remote split:
- Aggregate per-zone average utilization from ORCA reports on each host.
- Compare local zone utilization to the global average.
- If the local zone is overloaded, spill excess traffic to underloaded remote zones proportional to their available headroom (1 - utilization).
- (optional) Apply exponential moving average (EMA) dampening to smooth out ORCA data and prevent oscillation.
- Support a utilization variance threshold knob: when all zones are within the threshold of the global average, snap to local-only routing to avoid unnecessary cross-zone traffic.
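The steps above could look roughly like the following Python sketch. It is illustrative only: Host, zone_utilization, orca_utilization_split and variance_threshold are hypothetical names, the global average is taken as the unweighted mean of zone averages, and "excess" is defined so that the local zone keeps global_avg / local_util of its traffic. These are assumptions chosen for the sketch, not the prototype's actual formulas. EMA dampening is left out here.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class Host:
    zone: str
    # Latest ORCA-reported utilization in [0, 1]; None if no report yet.
    # Which ORCA field feeds this value is an assumption left open here.
    utilization: Optional[float]


def zone_utilization(hosts: Iterable[Host]) -> Dict[str, float]:
    """Average ORCA utilization per zone, ignoring hosts without reports."""
    sums: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for h in hosts:
        if h.utilization is None:
            continue
        sums[h.zone] = sums.get(h.zone, 0.0) + h.utilization
        counts[h.zone] = counts.get(h.zone, 0) + 1
    return {z: sums[z] / counts[z] for z in sums}


def orca_utilization_split(
    zone_util: Dict[str, float], local_zone: str, variance_threshold: float = 0.05
) -> Dict[str, float]:
    """Fraction of this proxy's traffic to send to each zone."""
    if local_zone not in zone_util:
        return {local_zone: 1.0}  # No data for the local zone: stay local.
    global_avg = sum(zone_util.values()) / len(zone_util)

    # All zones close to the global average: snap to local-only routing.
    if all(abs(u - global_avg) <= variance_threshold for u in zone_util.values()):
        return {local_zone: 1.0}

    local_util = zone_util[local_zone]
    if local_util <= global_avg:
        return {local_zone: 1.0}  # Local zone is not overloaded.

    # Keep enough traffic local to pull the local zone toward the average
    # (assumes utilization scales roughly linearly with request rate).
    keep = global_avg / local_util
    spill = 1.0 - keep

    # Spill only to underloaded zones, proportional to headroom (1 - utilization).
    headroom = {
        z: 1.0 - u for z, u in zone_util.items() if z != local_zone and u < global_avg
    }
    total = sum(headroom.values())
    if total <= 0:
        return {local_zone: 1.0}
    split = {local_zone: keep}
    for z, h in headroom.items():
        split[z] = spill * h / total
    return split
```

With racks at 80%, 40% and 30% utilization, a proxy in the 80% rack keeps about 62.5% of its traffic local and splits the remainder between the other two racks in a 6:7 ratio, matching their headroom.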
This approach converges toward utilization equilibrium across zones while also preserving local preference as much as possible.
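The optional EMA dampening step could be applied to the per-zone averages before computing the split. A minimal sketch, assuming a hypothetical ema_update helper and smoothing factor alpha (a smaller alpha reacts more slowly to ORCA spikes):

```python
from typing import Dict


def ema_update(
    previous: Dict[str, float], sample: Dict[str, float], alpha: float = 0.3
) -> Dict[str, float]:
    """Blend the latest per-zone utilization sample into the smoothed values.

    Zones seen for the first time start at their raw sample; zones absent
    from the latest sample are dropped.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed: Dict[str, float] = {}
    for zone, value in sample.items():
        old = previous.get(zone)
        smoothed[zone] = value if old is None else alpha * value + (1.0 - alpha) * old
    return smoothed


# Usage: a single spike from 50% to 90% only moves the smoothed value to about 62%.
state: Dict[str, float] = {}
for sample in ({"a": 0.5}, {"a": 0.9}):
    state = ema_update(state, sample)
```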
Prerequisites / related work:
This mode requires an LB policy that populates ORCA data and accepts localityLbConfig or zoneAwareLbConfig directly. I've started on this in #43623 by:
- Extracting code from client-side weighted round robin into an OrcaWeightManager class that handles ORCA report processing, per-host weight calculation (weight = qps / utilization), periodic updates, and median-weight fallback for hosts without ORCA data. This is nearly identical to the existing code and is intended to be reusable across ORCA-aware LB policies.
- Rewriting wrr_locality as a standalone ThreadAwareLoadBalancer that accepts full localityLbConfig and creates an OrcaWeightManager, rather than wrapping client-side weighted round robin.
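The per-host weighting and median fallback described for OrcaWeightManager can be sketched in Python as below. This is a simplified illustration, not the C++ class from #43623. compute_host_weights and the (qps, utilization) tuples are hypothetical. The sketch omits the error-rate penalty and the blackout/expiration timing that client-side weighted round robin also applies, and falling back to equal weights when no host has reported is an assumption.

```python
import statistics
from typing import Dict, Optional, Tuple

# (qps, utilization) from a host's latest ORCA report, or None if none yet.
Report = Optional[Tuple[float, float]]


def compute_host_weights(reports: Dict[str, Report]) -> Dict[str, float]:
    """weight = qps / utilization, with median fallback for unreported hosts."""
    weights: Dict[str, float] = {}
    for host, report in reports.items():
        if report is None:
            continue
        qps, utilization = report
        if qps > 0 and utilization > 0:
            weights[host] = qps / utilization
    if not weights:
        # Assumption: with no usable reports, treat all hosts equally.
        return {host: 1.0 for host in reports}
    median = statistics.median(weights.values())
    return {host: weights.get(host, median) for host in reports}
```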
Once the above is added I have a prototype ready for the locality_basis mode and will open a follow-up PR.
Why a new locality basis mode vs HEALTHY_HOSTS_WEIGHT?
The OrcaWeightManager extraction and localityLbConfig integration mean that the HEALTHY_HOSTS_WEIGHT locality basis would use ORCA-derived host weights for the zone split instead, but that solves a different problem. The ORCA weight formula is weight = qps / utilization, which measures host capacity: how much throughput a host can deliver per unit of utilization. For well-behaved backends where utilization scales linearly with QPS, this simplifies to a constant regardless of current load. A host at 80% utilization and a host at 30% utilization on the same hardware get the same weight.
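The constant-weight claim follows directly if we assume utilization grows linearly with request rate q up to a per-host capacity C, so that u = q / C:

```math
w = \frac{q}{u} = \frac{q}{q / C} = C
```

For example, with C = 1000 QPS, a host serving 800 QPS (u = 0.8) and one serving 300 QPS (u = 0.3) both get w = 1000.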
This means HEALTHY_HOSTS_WEIGHT is useful for capacity asymmetry (heterogeneous hardware, partially degraded hosts) but is blind to load asymmetry (uneven incoming traffic causing hotspots), which is the primary problem motivating this proposal. ORCA_UTILIZATION addresses this by routing based on available headroom (1 - utilization) directly rather than capacity-derived weights.
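To make the contrast concrete, here is a small Python comparison of zone shares computed from capacity-derived weights versus headroom. The numbers are hypothetical: two racks of ten identical hosts with a capacity of 1000 QPS each, where rack a receives 800 QPS per host and rack b 300 QPS per host, and utilization is assumed linear in QPS. This only compares the raw signals; it is not the full ORCA_UTILIZATION split, which also preserves local preference.

```python
from typing import Dict

CAPACITY_QPS = 1000.0  # hypothetical per-host capacity
HOSTS_PER_ZONE = 10
qps_per_host = {"a": 800.0, "b": 300.0}  # uneven incoming load
utilization = {z: q / CAPACITY_QPS for z, q in qps_per_host.items()}


def shares(values: Dict[str, float]) -> Dict[str, float]:
    """Normalize per-zone values into traffic fractions."""
    total = sum(values.values())
    return {z: v / total for z, v in values.items()}


# HEALTHY_HOSTS_WEIGHT with ORCA weights: sum of qps / utilization per zone.
capacity_shares = shares(
    {z: HOSTS_PER_ZONE * qps_per_host[z] / utilization[z] for z in qps_per_host}
)
# Headroom view used by the proposed mode: sum of (1 - utilization) per zone.
headroom_shares = shares(
    {z: HOSTS_PER_ZONE * (1.0 - utilization[z]) for z in qps_per_host}
)

print(capacity_shares)
print(headroom_shares)
```

The capacity-weight shares come out equal (0.5 each) even though rack a is far busier, while the headroom shares send roughly 78% (7/9) toward rack b.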
Alternative considered: new ORCA-based LB policy with locality selection
Rather than extending zone-aware routing, a dedicated ORCA-based policy could handle locality selection internally — potentially via a new locality_picking_policy alongside endpoint_picking_policy. This would give full control over the locality selection algorithm but wouldn't benefit from existing zone-aware routing infrastructure (residual routing, panic thresholds, etc.) and would be a larger design surface.
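For discussion, the alternative two-stage structure might look like the Python sketch below. LocalityPickingPolicy, WeightedLocalityPicker, random_endpoint and two_stage_pick are hypothetical names invented for illustration; they do not correspond to any existing Envoy interface, and the uniform random endpoint picker merely stands in for a real endpoint_picking_policy.

```python
import random
from typing import Callable, Dict, List, Protocol, Sequence


class LocalityPickingPolicy(Protocol):
    """Hypothetical first stage: choose a locality for the next request."""

    def pick(self, rng: random.Random) -> str: ...


class WeightedLocalityPicker:
    """Picks a locality with probability proportional to its share."""

    def __init__(self, split: Dict[str, float]) -> None:
        self.zones = list(split)
        self.weights = [split[z] for z in self.zones]

    def pick(self, rng: random.Random) -> str:
        return rng.choices(self.zones, weights=self.weights, k=1)[0]


# Second stage: any endpoint picker over the hosts of the chosen locality.
EndpointPicker = Callable[[Sequence[str], random.Random], str]


def random_endpoint(hosts: Sequence[str], rng: random.Random) -> str:
    return rng.choice(hosts)


def two_stage_pick(
    localities: LocalityPickingPolicy,
    hosts_by_zone: Dict[str, List[str]],
    endpoint_picker: EndpointPicker,
    rng: random.Random,
) -> str:
    """Pick a locality first, then an endpoint within it."""
    zone = localities.pick(rng)
    return endpoint_picker(hosts_by_zone[zone], rng)
```

Keeping the two stages behind separate interfaces is what would let a locality selection algorithm be reused with any endpoint picker, at the cost of re-implementing residual routing and panic handling outside zone-aware routing.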
I believe extending zone-aware routing is the better path since it builds on proven mechanics and may be easier for users to reason about, but I'd like additional input on whether a separate locality selection abstraction has broader value.
Relevant Links: