feat: allow InferencePool as a backend reference in AIServiceBackend #2166
isztldav wants to merge 9 commits into
Signed-off-by: David Isztl <isztl.david@gmail.com>
…ndler

Cover the three cases of validateBackendRef (non-InferencePool ref is a no-op, pool found returns nil, pool not found returns a clear error) and the two cases of inferencePoolEventHandler (matching backend in the same namespace, non-matching namespace returns empty).

Signed-off-by: David Isztl <isztl.david@gmail.com>
Force-pushed from f4dbf71 to ab329c8
Force-pushed from ab329c8 to 5224533
/gemini review
Code Review
This pull request adds support for referencing Gateway API Inference Extension InferencePool resources as backends in AIServiceBackend, updating CRD validation, controller validation, and event handling. Feedback highlights a namespace resolution bug in the inferencePoolEventHandler that could lead to missed cross-namespace reconciliations or false positives.
Force-pushed from 9a36ecd to 1d0d03c
inferencePoolEventHandler restricted its AIServiceBackend lookup to the pool's namespace and never compared the backend's referenced namespace against the pool, causing missed cross-namespace reconciles and false positives. List backends across all namespaces and match the resolved reference namespace against the pool's namespace.

Signed-off-by: David Isztl <isztl.david@gmail.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
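The matching rule in this commit message can be sketched in plain Go. This is a minimal illustration, not the PR's code: `Backend`, `BackendRef`, `NamespacedName`, `resolvedNamespace`, and `backendsForPool` are hypothetical stand-ins for the controller's real types, and the slice `all` stands in for a cluster-wide list of AIServiceBackend objects. It assumes an empty reference namespace defaults to the backend's own namespace.

```go
package main

// NamespacedName identifies an object by namespace and name (hypothetical stand-in).
type NamespacedName struct {
	Namespace string
	Name      string
}

// BackendRef is a simplified stand-in for the reference on an AIServiceBackend.
// An empty Namespace is assumed to mean "same namespace as the referring backend".
type BackendRef struct {
	Group     string
	Kind      string
	Name      string
	Namespace string
}

// Backend is a simplified stand-in for an AIServiceBackend object.
type Backend struct {
	Namespace string
	Name      string
	Ref       BackendRef
}

const (
	inferencePoolGroup = "inference.networking.k8s.io"
	inferencePoolKind  = "InferencePool"
)

// resolvedNamespace returns the namespace the backend's reference points at.
func resolvedNamespace(b Backend) string {
	if b.Ref.Namespace != "" {
		return b.Ref.Namespace
	}
	return b.Namespace
}

// backendsForPool scans backends from every namespace and returns the ones
// whose resolved InferencePool reference matches the changed pool.
func backendsForPool(all []Backend, pool NamespacedName) []NamespacedName {
	var out []NamespacedName
	for _, b := range all {
		if b.Ref.Group != inferencePoolGroup || b.Ref.Kind != inferencePoolKind {
			continue // not an InferencePool reference
		}
		if b.Ref.Name != pool.Name || resolvedNamespace(b) != pool.Namespace {
			continue // different pool name, or reference resolves to another namespace
		}
		out = append(out, NamespacedName{Namespace: b.Namespace, Name: b.Name})
	}
	return out
}
```

Under these assumptions, a backend in namespace `b` that explicitly references a pool in namespace `a` is enqueued when that pool changes (the previously missed cross-namespace case). A backend in `b` that references a same-named pool without a namespace is not enqueued (the previous false positive).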
Force-pushed from 1d0d03c to f897188
IIUC, there will now be different ways of leveraging the GAIE: by targeting an inference pool in an
Codecov Report
❌ Patch coverage is
❌ Your patch status has failed because the patch coverage (74.35%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2166      +/-   ##
==========================================
- Coverage   84.44%   84.40%   -0.05%
==========================================
  Files         134      134
  Lines       19142    19201      +59
==========================================
+ Hits        16165    16206      +41
- Misses       1992     2004      +12
- Partials      985      991       +6
Signed-off-by: David Isztl <isztl.david@gmail.com>
Thanks for pushing on this. Digging in actually changed my understanding, so let me lay out the reasoning:
I only just learned route-level The actual reason I ended up here is #2167. That one is essential regardless of which approach you pick. The EPP runs in the listener filter chain before the upstream ext_proc does schema translation, and it can only parse OpenAI bodies. So an Anthropic request to an InferencePool fails at the EPP before translation ever happens: returns
#2167 early-translates the body to OpenAI at the router-filter phase (before the EPP) so the EPP can parse it, while the upstream ext_proc still re-translates from the stored original body. After the fix, the same request succeeds: And I've confirmed #2167 fixes this for both So not sure if this PR is worth keeping. Either way #2167 is the piece that unblocks
Description
What
Relaxes the CEL validation rule on `AIServiceBackend.spec.backendRef` to accept `inference.networking.k8s.io/InferencePool` alongside the existing `gateway.envoyproxy.io/Backend`.

Controller changes:
- `validateBackendRef`: checks that the referenced `InferencePool` exists and returns a clear error (surfaced as a status condition) when it does not.
- `inferencePoolEventHandler`: maps `InferencePool` change events to reconcile requests for any `AIServiceBackend` that references the changed pool.
- `StartControllers`: conditionally wires up the `InferencePool` watch on the `AIServiceBackend` controller only when the `InferencePool` CRD is present in the cluster (safe in environments that do not run GIE).

The updated CRD manifest (`aigateway.envoyproxy.io_aiservicebackends.yaml`) is regenerated to match.
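As a rough illustration of the validation behaviour described above, here is a dependency-free Go sketch. It is not the PR's implementation: `Ref`, `PoolGetter`, `mapGetter`, `refKindAllowed`, `validateRef`, and `errNotFound` are hypothetical names. The real rule is written in CEL on the CRD, and the real lookup would go through the controller's Kubernetes client. The error text and the namespace defaulting are assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is what the hypothetical lookup returns when no pool exists.
var errNotFound = errors.New("not found")

// Ref is a simplified stand-in for AIServiceBackend.spec.backendRef.
type Ref struct {
	Group, Kind, Name, Namespace string
}

// PoolGetter abstracts the cluster lookup (assumption: a real controller
// would use its Kubernetes client here instead).
type PoolGetter interface {
	GetInferencePool(namespace, name string) error
}

// refKindAllowed mirrors the relaxed admission rule in plain Go:
// either the existing Backend kind or the newly accepted InferencePool kind.
func refKindAllowed(r Ref) bool {
	switch {
	case r.Group == "gateway.envoyproxy.io" && r.Kind == "Backend":
		return true
	case r.Group == "inference.networking.k8s.io" && r.Kind == "InferencePool":
		return true
	}
	return false
}

// validateRef is a no-op for non-InferencePool references and otherwise
// checks that the referenced pool exists, returning a descriptive error if not.
func validateRef(g PoolGetter, backendNamespace string, r Ref) error {
	if r.Group != "inference.networking.k8s.io" || r.Kind != "InferencePool" {
		return nil
	}
	ns := r.Namespace
	if ns == "" {
		ns = backendNamespace // assumed default: the backend's own namespace
	}
	if err := g.GetInferencePool(ns, r.Name); err != nil {
		if errors.Is(err, errNotFound) {
			return fmt.Errorf("InferencePool %s/%s not found", ns, r.Name)
		}
		return fmt.Errorf("failed to get InferencePool %s/%s: %w", ns, r.Name, err)
	}
	return nil
}

// mapGetter is an in-memory PoolGetter keyed by "namespace/name", for demos and tests.
type mapGetter map[string]bool

func (m mapGetter) GetInferencePool(namespace, name string) error {
	if m[namespace+"/"+name] {
		return nil
	}
	return errNotFound
}
```

Keeping the lookup behind a small interface makes the three cases (no-op, found, not found) easy to exercise without a cluster.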
Why
This unblocks the composition of AI Gateway schema translation with KV-cache-aware endpoint
selection provided by the GIE EndpointPicker (EPP). Without this change, users must choose
one or the other per request path.
Testing
- Tests for `validateBackendRef` (pool found / pool not found / non-InferencePool ref no-op) and `inferencePoolEventHandler` (matching backends returned / non-matching namespace skipped) added to `internal/controller/ai_service_backend_test.go`.
- `make precommit test` passes.

Related Issues/PRs (if applicable)
Special notes for reviewers (if applicable)
companion PR (feat: add EarlyTranslate to EndpointSpec so the EPP receives an OpenAI-format body for Anthropic requests #2167). This PR is self-contained on the control-plane side.
and is understood by the author.