
feat: allow InferencePool as a backend reference in AIServiceBackend #2166

Open
isztldav wants to merge 9 commits into
envoyproxy:main from
isztldav:pr/inferencepool-backendref

Conversation

@isztldav

@isztldav isztldav commented May 28, 2026


Description

What

Relaxes the CEL validation rule on AIServiceBackend.spec.backendRef to accept
inference.networking.k8s.io/InferencePool alongside the existing
gateway.envoyproxy.io/Backend.
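
The relaxed rule can be pictured as an allow-list of group/kind pairs. The sketch below is only an illustration in Go of that predicate; the actual CEL expression is not shown in this description, and the names `groupKind`, `allowedBackendRefs`, and `isAllowedBackendRef` are hypothetical.

```go
package main

import "fmt"

// groupKind is a hypothetical key type pairing an API group with a kind.
type groupKind struct{ Group, Kind string }

// allowedBackendRefs lists the two reference targets named in this PR.
// The real check lives in a CEL rule on the CRD; this map only mirrors it.
var allowedBackendRefs = map[groupKind]bool{
	{"gateway.envoyproxy.io", "Backend"}:             true,
	{"inference.networking.k8s.io", "InferencePool"}: true,
}

// isAllowedBackendRef reports whether a group/kind pair is an accepted target.
func isAllowedBackendRef(group, kind string) bool {
	return allowedBackendRefs[groupKind{group, kind}]
}

func main() {
	fmt.Println(isAllowedBackendRef("inference.networking.k8s.io", "InferencePool"))
	fmt.Println(isAllowedBackendRef("", "Service"))
}
```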

Controller changes:

  • validateBackendRef: checks that the referenced InferencePool exists and returns a clear
    error (surfaced as a status condition) when it does not.
  • inferencePoolEventHandler: maps InferencePool change events to reconcile requests for
    any AIServiceBackend that references the changed pool.
  • StartControllers: conditionally wires up the InferencePool watch on the
    AIServiceBackend controller only when the InferencePool CRD is present in the cluster
    (safe in environments that do not run GIE).
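
As a rough illustration of the first bullet, the existence check might look like the sketch below. It uses plain Go types instead of the controller's Kubernetes client, so `BackendRef`, `poolExists`, and `errPoolNotFound` are hypothetical stand-ins, and defaulting an empty reference namespace to the backend's own namespace is an assumption rather than something stated in the description.

```go
package main

import (
	"errors"
	"fmt"
)

// BackendRef is a simplified, hypothetical stand-in for the backend reference.
type BackendRef struct {
	Group, Kind, Name, Namespace string
}

const (
	inferencePoolGroup = "inference.networking.k8s.io"
	inferencePoolKind  = "InferencePool"
)

// poolExists is a hypothetical lookup; a real controller would query the API server.
type poolExists func(namespace, name string) bool

var errPoolNotFound = errors.New("InferencePool not found")

// validateBackendRef is a no-op for non-InferencePool references and returns
// an error naming the missing pool otherwise.
func validateBackendRef(ref BackendRef, backendNamespace string, exists poolExists) error {
	if ref.Group != inferencePoolGroup || ref.Kind != inferencePoolKind {
		return nil
	}
	ns := ref.Namespace
	if ns == "" {
		ns = backendNamespace // assumption: empty namespace means the backend's own
	}
	if !exists(ns, ref.Name) {
		return fmt.Errorf("%w: %s/%s", errPoolNotFound, ns, ref.Name)
	}
	return nil
}

func main() {
	pools := map[string]bool{"default/llama": true}
	exists := func(ns, name string) bool { return pools[ns+"/"+name] }
	ref := BackendRef{Group: inferencePoolGroup, Kind: inferencePoolKind, Name: "missing"}
	fmt.Println(validateBackendRef(ref, "default", exists))
}
```

In the real controller the returned error would be surfaced as a status condition rather than printed.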

The updated CRD manifest (aigateway.envoyproxy.io_aiservicebackends.yaml) is regenerated to
match.

Why

This unblocks the composition of AI Gateway schema translation with KV-cache-aware endpoint
selection provided by the GIE EndpointPicker (EPP). Without this change, users must choose
one or the other per request path.

Testing

  • Unit tests for validateBackendRef (pool found / pool not found / non-InferencePool ref
    no-op) and inferencePoolEventHandler (matching backends returned / non-matching namespace
    skipped) added to internal/controller/ai_service_backend_test.go.
  • make precommit test passes.

Related Issues/PRs (if applicable)

Special notes for reviewers (if applicable)

isztldav added 2 commits May 28, 2026 09:51
Signed-off-by: David Isztl <isztl.david@gmail.com>
…ndler

Cover the three cases of validateBackendRef (non-InferencePool ref is a
no-op, pool found returns nil, pool not found returns a clear error) and
the two cases of inferencePoolEventHandler (matching backend in the same
namespace, non-matching namespace returns empty).

Signed-off-by: David Isztl <isztl.david@gmail.com>
@isztldav isztldav requested a review from a team as a code owner May 28, 2026 08:47
@dosubot dosubot Bot added the size:M This PR changes 30-99 lines, ignoring generated files. label May 28, 2026
@dosubot

dosubot Bot commented May 28, 2026


Related Knowledge

2 documents with suggested updates are ready for review.

Envoy's Space

connect-providers /ai-gateway/blob/main/site/docs/capabilities/llm-integrations/connect-providers.md — ⏳ Awaiting Merge
resources /ai-gateway/blob/main/site/docs/concepts/resources.md — ⏳ Awaiting Merge


@isztldav isztldav force-pushed the pr/inferencepool-backendref branch from ab329c8 to 5224533 Compare May 28, 2026 09:02
@johnugeorge

Contributor

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request adds support for referencing Gateway API Inference Extension InferencePool resources as backends in AIServiceBackend, updating CRD validation, controller validation, and event handling. Feedback highlights a namespace resolution bug in the inferencePoolEventHandler that could lead to missed cross-namespace reconciliations or false positives.

Comment thread internal/controller/ai_service_backend.go
@isztldav isztldav force-pushed the pr/inferencepool-backendref branch from 9a36ecd to 1d0d03c Compare May 31, 2026 19:26
inferencePoolEventHandler restricted its AIServiceBackend lookup to the
pool's namespace and never compared the backend's referenced namespace
against the pool, causing missed cross-namespace reconciles and false
positives. List backends across all namespaces and match the resolved
reference namespace against the pool's namespace.

Signed-off-by: David Isztl <isztl.david@gmail.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
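
The matching logic described in the commit message above can be sketched roughly as follows. This is a simplified model with hypothetical types (`Backend`, `NamespacedName`, `backendsForPool`) rather than the controller's actual code, and it assumes an empty reference namespace resolves to the backend's own namespace.

```go
package main

import "fmt"

// NamespacedName identifies an object; a stand-in for the Kubernetes type.
type NamespacedName struct{ Namespace, Name string }

// Backend is a hypothetical, flattened view of an AIServiceBackend.
type Backend struct {
	Namespace, Name                string
	RefKind, RefName, RefNamespace string
}

// backendsForPool scans backends from all namespaces and returns those whose
// resolved reference points at the given pool.
func backendsForPool(pool NamespacedName, all []Backend) []NamespacedName {
	var out []NamespacedName
	for _, b := range all {
		if b.RefKind != "InferencePool" || b.RefName != pool.Name {
			continue
		}
		refNS := b.RefNamespace
		if refNS == "" {
			refNS = b.Namespace // assumption: default to the backend's namespace
		}
		if refNS == pool.Namespace {
			out = append(out, NamespacedName{b.Namespace, b.Name})
		}
	}
	return out
}

func main() {
	all := []Backend{
		{Namespace: "apps", Name: "cross", RefKind: "InferencePool", RefName: "llama", RefNamespace: "ml"},
		{Namespace: "ml", Name: "elsewhere", RefKind: "InferencePool", RefName: "llama", RefNamespace: "other"},
	}
	fmt.Println(backendsForPool(NamespacedName{"ml", "llama"}, all))
}
```

The first backend lives outside the pool's namespace but references it, so it is matched; the second lives in the pool's namespace but points elsewhere, so it is skipped.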
@isztldav isztldav force-pushed the pr/inferencepool-backendref branch from 1d0d03c to f897188 Compare May 31, 2026 19:28
@nacx

nacx commented Jun 2, 2026

Member

IIUC, there will now be different ways of leveraging the GAIE: by targeting an inference pool in an AIGatewayRoute, or by targeting the inference pool on an AIServiceBackend.
Can you elaborate on when to use each approach, what each approach enables, and the caveats and gotchas of each?

@codecov-commenter


Codecov Report

❌ Patch coverage is 74.35897% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.40%. Comparing base (3db0895) to head (f897188).
⚠️ Report is 5 commits behind head on main.

Files with missing lines Patch % Lines
internal/controller/ai_service_backend.go 74.35% 6 Missing and 4 partials ⚠️

❌ Your patch status has failed because the patch coverage (74.35%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2166      +/-   ##
==========================================
- Coverage   84.44%   84.40%   -0.05%     
==========================================
  Files         134      134              
  Lines       19142    19201      +59     
==========================================
+ Hits        16165    16206      +41     
- Misses       1992     2004      +12     
- Partials      985      991       +6     


@isztldav

isztldav commented Jun 2, 2026

Author

IIUC, there will now be different ways of leveraging the GAIE: by targeting an inference pool in an AIGatewayRoute, or by targeting the inference pool on an AIServiceBackend. Can you elaborate on when to use each approach, what each approach enables, and the caveats and gotchas of each?

Thanks for pushing on this. Digging in actually changed my understanding, so let me lay out the reasoning:

  1. Route-level — AIGatewayRoute → InferencePool (already supported on main). The gateway hardcodes the pool's schema to OpenAI and runs the upstream translation ext_proc against it.
  2. Backend-level — AIGatewayRoute → AIServiceBackend → InferencePool (this PR). The pool is wrapped in an AIServiceBackend, which carries an explicit schema plus per-backend mutations/overrides and multi-backend composition.

I only just learned that route-level AIGatewayRoute → InferencePool already exists. With that in mind, this PR is arguably not strictly required for the basic case: for an OpenAI-speaking pool, route-level gives you the same translation + EPP selection, since the control plane builds an identical backend entry (Schema = OpenAI) either way. The backend-level approach still makes sense when you want to declare schema: OpenAI explicitly (more transparent than the route-level "assume OpenAI" hardcode).

The actual reason I ended up here is #2167, which is essential regardless of which approach you pick. The EPP runs in the listener filter chain before the upstream ext_proc does schema translation, and it can only parse OpenAI bodies. So an Anthropic request to an InferencePool fails at the EPP before translation ever happens:

curl http://ai-gateway/anthropic/v1/messages \
    -H "Content-Type: application/json" -H "x-api-key: test-key" \
    -H "anthropic-version: 2023-06-01" \
    -d '{"model":"meta-llama/Llama-3.1-8B-Instruct","max_tokens":100,"messages":[{"role":"user","content":"Hello"}]}'

returns

inference gateway: BadRequest - failed to extract request data: invalid completions request: must have prompt field

#2167 early-translates the body to OpenAI at the router-filter phase (before the EPP) so the EPP can parse it, while the upstream ext_proc still re-translates from the stored original body. After the fix, the same request succeeds:

{"id":"chatcmpl-...","type":"message","role":"assistant",
"content":[{"type":"text","text":"Hello. How can I assist you today?"}],
"model":"meta-llama/Llama-3.1-8B-Instruct","stop_reason":"end_turn",
"usage":{"input_tokens":36,"output_tokens":10}}

And I've confirmed #2167 fixes this for both AIGatewayRoute → InferencePool and AIGatewayRoute → AIServiceBackend → InferencePool, since the early-translate is endpoint-driven (keyed on /v1/messages), not on how the pool is referenced.
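
To make the early-translate idea concrete, here is a deliberately minimal Go sketch of converting an Anthropic Messages body into an OpenAI-style chat body before the EPP sees it. It is not the code from #2167: the type and function names are hypothetical, and it assumes plain-string `content` and `system` fields only (content blocks, tools, and streaming options are ignored, and a body using them would fail to parse here).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// message covers only the plain-string content form (an assumption).
type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// anthropicRequest models a small subset of the Anthropic Messages body.
type anthropicRequest struct {
	Model     string    `json:"model"`
	MaxTokens int       `json:"max_tokens"`
	System    string    `json:"system,omitempty"`
	Messages  []message `json:"messages"`
}

// openAIRequest models a small subset of an OpenAI chat completions body.
type openAIRequest struct {
	Model     string    `json:"model"`
	MaxTokens int       `json:"max_tokens,omitempty"`
	Messages  []message `json:"messages"`
}

// toOpenAI rewrites the body so an OpenAI-only parser can read it.
// The caller is expected to keep the original body for later re-translation.
func toOpenAI(body []byte) ([]byte, error) {
	var in anthropicRequest
	if err := json.Unmarshal(body, &in); err != nil {
		return nil, fmt.Errorf("parse anthropic body: %w", err)
	}
	out := openAIRequest{Model: in.Model, MaxTokens: in.MaxTokens}
	if in.System != "" {
		// Anthropic's top-level system prompt becomes a leading system message.
		out.Messages = append(out.Messages, message{Role: "system", Content: in.System})
	}
	out.Messages = append(out.Messages, in.Messages...)
	return json.Marshal(out)
}

func main() {
	body := []byte(`{"model":"meta-llama/Llama-3.1-8B-Instruct","max_tokens":100,"messages":[{"role":"user","content":"Hello"}]}`)
	translated, err := toOpenAI(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(translated))
}
```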

So I'm not sure this PR is worth keeping. Either way, #2167 is the piece that unblocks Anthropic→EPP.


Labels

size:M This PR changes 30-99 lines, ignoring generated files.

4 participants