feat: allow InferencePool as a backend reference in AIServiceBackend by isztldav · Pull Request #2166 · envoyproxy/ai-gateway

isztldav · 2026-05-28T08:47:45Z

Description

What

Relaxes the CEL validation rule on AIServiceBackend.spec.backendRef to accept
inference.networking.k8s.io/InferencePool alongside the existing
gateway.envoyproxy.io/Backend.
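
As a rough illustration of what the relaxed rule accepts, the check can be mirrored as a plain Go predicate. This is only a sketch: the real rule is a CEL expression on the CRD, and `BackendRef`, `allowedBackendRefs`, and `isAllowedBackendRef` are hypothetical names introduced here. Only the two group/kind pairs come from the description above.

```go
package main

// BackendRef is a hypothetical, trimmed-down stand-in for the reference
// stored in AIServiceBackend.spec.backendRef.
type BackendRef struct {
	Group string
	Kind  string
	Name  string
}

// allowedBackendRefs holds the two group/kind pairs named in the PR description.
var allowedBackendRefs = map[[2]string]bool{
	{"gateway.envoyproxy.io", "Backend"}:             true,
	{"inference.networking.k8s.io", "InferencePool"}: true,
}

// isAllowedBackendRef reports whether the reference targets one of the
// accepted group/kind pairs. Any other combination is rejected.
func isAllowedBackendRef(ref BackendRef) bool {
	return allowedBackendRefs[[2]string{ref.Group, ref.Kind}]
}
```

Note that the group and kind are matched as a pair, so an `InferencePool` kind under the `gateway.envoyproxy.io` group is still rejected.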

Controller changes:

- validateBackendRef: checks that the referenced InferencePool exists and returns a clear
  error (surfaced as a status condition) when it does not.
- inferencePoolEventHandler: maps InferencePool change events to reconcile requests for
  any AIServiceBackend that references the changed pool.
- StartControllers: conditionally wires up the InferencePool watch on the
  AIServiceBackend controller only when the InferencePool CRD is present in the cluster
  (safe in environments that do not run GIE).
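
The existence check described for validateBackendRef could look roughly like the sketch below. It is not the PR's code: the `poolGetter` interface, the `errNotFound` sentinel, the `backendRef` struct, and the `fakePools` helper are hypothetical stand-ins for the Kubernetes client and API types, and defaulting an empty reference namespace to the backend's own namespace is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical sentinel standing in for a Kubernetes NotFound error.
var errNotFound = errors.New("not found")

// poolGetter is a hypothetical abstraction over the API client. It returns
// errNotFound when no InferencePool exists at namespace/name.
type poolGetter interface {
	GetInferencePool(namespace, name string) error
}

// backendRef is a simplified stand-in for the backend reference.
type backendRef struct {
	Kind      string
	Name      string
	Namespace string // empty: assumed to mean the AIServiceBackend's own namespace
}

// validateBackendRef is a no-op for non-InferencePool references and returns
// a descriptive error when the referenced pool does not exist.
func validateBackendRef(c poolGetter, backendNamespace string, ref backendRef) error {
	if ref.Kind != "InferencePool" {
		return nil
	}
	ns := ref.Namespace
	if ns == "" {
		ns = backendNamespace
	}
	if err := c.GetInferencePool(ns, ref.Name); err != nil {
		if errors.Is(err, errNotFound) {
			return fmt.Errorf("InferencePool %s/%s not found", ns, ref.Name)
		}
		return fmt.Errorf("failed to get InferencePool %s/%s: %w", ns, ref.Name, err)
	}
	return nil
}

// fakePools is an in-memory poolGetter keyed by "namespace/name".
type fakePools map[string]bool

func (f fakePools) GetInferencePool(namespace, name string) error {
	if f[namespace+"/"+name] {
		return nil
	}
	return errNotFound
}
```

With `fakePools{"default/llama": true}`, a reference to `llama` from a backend in `default` validates, a reference to a missing pool yields an error, and a `Backend` reference is skipped entirely.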

The updated CRD manifest (aigateway.envoyproxy.io_aiservicebackends.yaml) is regenerated to
match.

Why

This unblocks the composition of AI Gateway schema translation with KV-cache-aware endpoint
selection provided by the GIE EndpointPicker (EPP). Without this change, users must choose
one or the other per request path.

Testing

- Unit tests for validateBackendRef (pool found / pool not found / non-InferencePool ref
  no-op) and inferencePoolEventHandler (matching backends returned / non-matching namespace
  skipped) added to internal/controller/ai_service_backend_test.go.
- make precommit test passes.

Related Issues/PRs (if applicable)

Special notes for reviewers (if applicable)

The data-plane change that makes Anthropic requests work through EPP at runtime is in a
companion PR, #2167 ("feat: add EarlyTranslate to EndpointSpec so the EPP receives an
OpenAI-format body for Anthropic requests"). This PR is self-contained on the control-plane side.
Relates to #902 ("Bring back Kubernetes Service support in AIServiceBackend's ref").
AI assistant (Claude) was used for drafting and reviewing code; all logic has been reviewed
and is understood by the author.

Signed-off-by: David Isztl <isztl.david@gmail.com>

…ndler Cover the three cases of validateBackendRef (non-InferencePool ref is a no-op, pool found returns nil, pool not found returns a clear error) and the two cases of inferencePoolEventHandler (matching backend in the same namespace, non-matching namespace returns empty). Signed-off-by: David Isztl <isztl.david@gmail.com>

dosubot · 2026-05-28T08:50:47Z

Related Knowledge

2 documents with suggested updates are ready for review.

Envoy's Space

connect-providers `/ai-gateway/blob/main/site/docs/capabilities/llm-integrations/connect-providers.md` — ⏳ Awaiting Merge

resources `/ai-gateway/blob/main/site/docs/concepts/resources.md` — ⏳ Awaiting Merge


johnugeorge · 2026-05-31T18:50:11Z

/gemini review

gemini-code-assist

Code Review

This pull request adds support for referencing Gateway API Inference Extension InferencePool resources as backends in AIServiceBackend, updating CRD validation, controller validation, and event handling. Feedback highlights a namespace resolution bug in the inferencePoolEventHandler that could lead to missed cross-namespace reconciliations or false positives.

inferencePoolEventHandler restricted its AIServiceBackend lookup to the pool's namespace and never compared the backend's referenced namespace against the pool, causing missed cross-namespace reconciles and false positives. List backends across all namespaces and match the resolved reference namespace against the pool's namespace. Signed-off-by: David Isztl <isztl.david@gmail.com> Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
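
The matching logic this commit message describes can be sketched as a pure function over simplified types. The `backend` and `namespacedName` structs and the `backendsForPool` name are hypothetical, as is the rule that an empty reference namespace resolves to the backend's own namespace. The actual handler works on Kubernetes objects and emits reconcile requests.

```go
package main

// namespacedName identifies an object by namespace and name.
type namespacedName struct {
	Namespace string
	Name      string
}

// backend is a simplified, hypothetical view of an AIServiceBackend.
type backend struct {
	Namespace    string
	Name         string
	RefKind      string
	RefName      string
	RefNamespace string // empty: assumed to resolve to the backend's own namespace
}

// backendsForPool scans backends from all namespaces and returns those whose
// resolved InferencePool reference points at the given pool.
func backendsForPool(all []backend, pool namespacedName) []namespacedName {
	var out []namespacedName
	for _, b := range all {
		if b.RefKind != "InferencePool" || b.RefName != pool.Name {
			continue
		}
		refNS := b.RefNamespace
		if refNS == "" {
			refNS = b.Namespace
		}
		// Compare the resolved reference namespace, not the backend's
		// namespace, against the pool's namespace.
		if refNS == pool.Namespace {
			out = append(out, namespacedName{Namespace: b.Namespace, Name: b.Name})
		}
	}
	return out
}
```

Comparing the resolved namespace covers both failure modes called out in the review: a backend in another namespace that points at the pool is now found, and a backend in the pool's namespace that points at a same-named pool elsewhere is no longer matched.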

nacx · 2026-06-02T16:12:56Z

IIUC, there will now be different ways of leveraging the GAIE: by targeting an inference pool in an AIGatewayRoute, or by targeting the inference pool on an AIServiceBackend.
Can you elaborate on when to use each approach, what each approach enables, and the caveats and gotchas of each?

codecov-commenter · 2026-06-02T16:20:35Z

Codecov Report

❌ Patch coverage is 74.35897% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.40%. Comparing base (3db0895) to head (f897188).
⚠️ Report is 5 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| internal/controller/ai_service_backend.go | 74.35% | 6 Missing and 4 partials ⚠️ |

❌ Your patch status has failed because the patch coverage (74.35%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2166      +/-   ##
==========================================
- Coverage   84.44%   84.40%   -0.05%     
==========================================
  Files         134      134              
  Lines       19142    19201      +59     
==========================================
+ Hits        16165    16206      +41     
- Misses       1992     2004      +12     
- Partials      985      991       +6


Signed-off-by: David Isztl <isztl.david@gmail.com>

isztldav · 2026-06-02T20:37:06Z

> IIUC, there will now be different ways of leveraging the GAIE: by targeting an inference pool in an AIGatewayRoute, or by targeting the inference pool on an AIServiceBackend. Can you elaborate on when to use each approach, what each approach enables, and the caveats and gotchas of each?

Thanks for pushing on this. Digging in actually changed my understanding, so let me lay out the reasoning:

Route-level — AIGatewayRoute → InferencePool (already supported on main). The gateway hardcodes the pool's schema to OpenAI and runs the upstream translation ext_proc against it.
Backend-level — AIGatewayRoute → AIServiceBackend → InferencePool (this PR). The pool is wrapped in an AIServiceBackend, which carries an explicit schema plus per-backend mutations/overrides and multi-backend composition.

I only just learned that route-level AIGatewayRoute → InferencePool already exists. With that in mind, this PR is arguably not strictly required for the basic case: for an OpenAI-speaking pool, route-level gives you the same translation + EPP selection, since the control plane builds an identical backend entry (Schema = OpenAI) either way. The backend-level approach still makes sense when you want to declare schema: OpenAI explicitly (more transparent than the route-level "assume OpenAI" hardcode).

The actual reason I ended up here is #2167. That one is essential regardless of which approach you pick. The EPP runs in the listener filter chain before the upstream ext_proc does schema translation, and it can only parse OpenAI bodies. So an Anthropic request to an InferencePool fails at the EPP before translation ever happens:

curl http://ai-gateway/anthropic/v1/messages \
    -H "Content-Type: application/json" -H "x-api-key: test-key" \
    -H "anthropic-version: 2023-06-01" \
    -d '{"model":"meta-llama/Llama-3.1-8B-Instruct","max_tokens":100,"messages":[{"role":"user","content":"Hello"}]}'

returns

inference gateway: BadRequest - failed to extract request data: invalid completions request: must have prompt field

#2167 early-translates the body to OpenAI at the router-filter phase (before the EPP) so the EPP can parse it, while the upstream ext_proc still re-translates from the stored original body. After the fix, the same request succeeds:

{"id":"chatcmpl-...","type":"message","role":"assistant",
"content":[{"type":"text","text":"Hello. How can I assist you today?"}],
"model":"meta-llama/Llama-3.1-8B-Instruct","stop_reason":"end_turn",
"usage":{"input_tokens":36,"output_tokens":10}}

And I've confirmed #2167 fixes this for both AIGatewayRoute → InferencePool and AIGatewayRoute → AIServiceBackend → InferencePool, since the early-translate is endpoint-driven (keyed on /v1/messages), not on how the pool is referenced.

So I'm not sure whether this PR is worth keeping. Either way, #2167 is the piece that unblocks Anthropic→EPP.

isztldav added 2 commits May 28, 2026 09:51

added support for InferencePool

18928cd

Signed-off-by: David Isztl <isztl.david@gmail.com>

isztldav requested a review from a team as a code owner May 28, 2026 08:47

dosubot Bot added the size:M This PR changes 30-99 lines, ignoring generated files. label May 28, 2026

isztldav mentioned this pull request May 28, 2026

feat: add EarlyTranslate to EndpointSpec so the EPP receives an OpenAI-format body for Anthropic requests #2167

Open

isztldav force-pushed the pr/inferencepool-backendref branch from f4dbf71 to ab329c8 Compare May 28, 2026 08:59

Merge remote-tracking branch 'github/main' into temp-pr1

5224533

isztldav force-pushed the pr/inferencepool-backendref branch from ab329c8 to 5224533 Compare May 28, 2026 09:02

gemini-code-assist Bot reviewed May 31, 2026


Comment thread internal/controller/ai_service_backend.go

isztldav force-pushed the pr/inferencepool-backendref branch from 9a36ecd to 1d0d03c Compare May 31, 2026 19:26

isztldav force-pushed the pr/inferencepool-backendref branch from 1d0d03c to f897188 Compare May 31, 2026 19:28

isztldav added 2 commits June 2, 2026 21:04

Merge branch 'envoyproxy:main' into pr/inferencepool-backendref

14b93ae

changes from make precommit

06af4a8

Signed-off-by: David Isztl <isztl.david@gmail.com>

isztldav added 3 commits June 12, 2026 13:49

Merge branch 'envoyproxy:main' into pr/inferencepool-backendref

9e8f216

Merge branch 'envoyproxy:main' into pr/inferencepool-backendref

6425e9d

Merge branch 'envoyproxy:main' into pr/inferencepool-backendref

d2076ec

feat: allow InferencePool as a backend reference in AIServiceBackend#2166
isztldav wants to merge 9 commits into envoyproxy:main from isztldav:pr/inferencepool-backendref

4 participants
