feat: add stream idle timeout with fallover to next model by albe2669 · Pull Request #2169 · envoyproxy/ai-gateway

albe2669 · 2026-05-28T12:56:52Z

Description

Adds a new optional streamIdleTimeout field to AIGatewayRouteRule so callers can bound the time without receiving any upstream bytes on a streaming response.

Why

Today the only deadline on an LLM streaming call is Timeouts.Request, which bounds the whole response. If a model is slow to emit its first token, the client waits up to the overall budget before failing. We want to fail fast on idle/slow upstreams and (with retry configured) fall over to the next backend in the rule before any response headers reach the client.

How

The AI Gateway extension server sets route.retry_policy.per_try_idle_timeout on the generated xDS routes. In the PostTranslateModify hook it walks the final RouteConfigurations, identifies each route generated from an AIGatewayRoute by its route name, looks up the corresponding AIGatewayRoute, and, for any rule with streamIdleTimeout set, sets per_try_idle_timeout to that value. The timeout is merged into the route's existing retry policy, so retry config produced by a BackendTrafficPolicy (retryOn, numRetries) is preserved.

Behavior

A rule with streamIdleTimeout set has per_try_idle_timeout applied to its routes.
A rule without it is, of course, left untouched.
If the timer fires before the first response byte arrives, Envoy resets the upstream stream before any headers leave the gateway. A BackendTrafficPolicy with retryOn covering reset will transparently fall over to the next backend (model) in the rule.
If the timer fires mid-stream after bytes have already arrived, the stream is cut. Because the response has already started, the request cannot be retried on another backend.
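The cases listed above can be summarized as a small decision function. This is only a model of the behavior described in this PR, not Envoy's implementation; the `Outcome` type, its values, and `onIdleTimeout` are hypothetical names chosen for illustration.

```go
package main

import "fmt"

// Outcome describes, in this illustrative model, what happens when the
// per-try idle timer fires.
type Outcome string

const (
	// The upstream is reset before any headers leave the gateway and the
	// request is retried on the next backend in the rule.
	OutcomeRetryNextBackend Outcome = "retry on next backend"
	// The upstream is reset before any headers leave the gateway, but no
	// retry is possible, so the request fails.
	OutcomeFailWithoutRetry Outcome = "fail without retry"
	// Response bytes already reached the client, so the stream is cut.
	OutcomeStreamCut Outcome = "stream cut mid-response"
)

// onIdleTimeout models the outcome of the idle timer firing.
// responseStarted reports whether response bytes have already arrived,
// retryOnCoversReset whether the retry policy retries on upstream resets,
// and retriesLeft how many retry attempts remain.
func onIdleTimeout(responseStarted, retryOnCoversReset bool, retriesLeft int) Outcome {
	if responseStarted {
		return OutcomeStreamCut
	}
	if retryOnCoversReset && retriesLeft > 0 {
		return OutcomeRetryNextBackend
	}
	return OutcomeFailWithoutRetry
}

func main() {
	fmt.Println(onIdleTimeout(false, true, 1))
	fmt.Println(onIdleTimeout(false, false, 1))
	fmt.Println(onIdleTimeout(true, true, 1))
}
```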

Testing method

Unit tests for the route mutation (timeout applied, existing retry policy preserved, no-timeout/non-forwarding/missing-route cases) plus the existing suite.
Created an example with two tiny applications, one of which never responds. The unresponsive one is given first priority in the backend list and streamIdleTimeout is set to 5 seconds, so requests should fail over to the other application. This ran successfully. I can add it as an example if that makes sense.
Verified end-to-end on a local cluster: Envoy's route config dump shows per_try_idle_timeout=5s on the rule (with retry_on/num_retries intact), and a streaming request fails over to the healthy backend after ~5s.

Notes

Generative AI was used to assist in writing this change.
The route-name prefix encodes Envoy Gateway's internal xDS naming convention. If that schema changes the lookup will no longer match the rule — but unlike the JSONPatch approach this is a normal code path that can log/observe the miss rather than failing silently.
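A lookup of this kind might look like the sketch below. The route-name layout used here (`httproute/<namespace>/<name>/rule/<index>/...`) is an assumption made for illustration, and `routeRef` and `parseRouteName` are hypothetical names, not the helpers in this PR. The point is that a name that does not match is reported to the caller, who can log it, instead of being skipped silently.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// routeRef identifies the rule a generated route came from.
type routeRef struct {
	Namespace string
	Name      string
	RuleIndex int
}

// parseRouteName extracts the namespace, name, and rule index from a route
// name. The layout "httproute/<namespace>/<name>/rule/<index>/..." is an
// assumption for this sketch. The boolean is false when the name does not
// match, so the caller can log or count the miss.
func parseRouteName(routeName string) (routeRef, bool) {
	parts := strings.Split(routeName, "/")
	if len(parts) < 5 || parts[0] != "httproute" || parts[3] != "rule" {
		return routeRef{}, false
	}
	// Guard against empty namespace or name segments.
	if parts[1] == "" || parts[2] == "" {
		return routeRef{}, false
	}
	idx, err := strconv.Atoi(parts[4])
	if err != nil || idx < 0 {
		return routeRef{}, false
	}
	return routeRef{Namespace: parts[1], Name: parts[2], RuleIndex: idx}, true
}

func main() {
	for _, name := range []string{
		"httproute/default/my-route/rule/1/match/0/*",
		"some-other-route",
	} {
		ref, ok := parseRouteName(name)
		if !ok {
			fmt.Printf("route %q does not match the expected naming scheme\n", name)
			continue
		}
		fmt.Printf("route %q -> %s/%s rule %d\n", name, ref.Namespace, ref.Name, ref.RuleIndex)
	}
}
```

A caller would still need to check the parsed index against the number of rules in the AIGatewayRoute it fetches before using it.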

Signed-off-by: albe2669 <albert@risenielsen.dk>

nacx · 2026-06-02T15:59:58Z

Can you provide a full example config that leverages this use case?

Also, using the EnvoyPatchPolicy is quite error-prone and could lead to issues when upgrading, etc. AIGW uses the extension server instead to patch configurations before sending them via xDS to the data plane. Could you update the proposal to use the extension server, as it would be more reliable?

codecov-commenter · 2026-06-02T16:10:52Z

Codecov Report

❌ Patch coverage is 96.61017% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.73%. Comparing base (67e4926) to head (6b361bd).

Files with missing lines	Patch %	Lines
internal/extensionserver/post_translate_modify.go	95.55%	1 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2169      +/-   ##
==========================================
+ Coverage   84.71%   84.73%   +0.02%     
==========================================
  Files         144      144              
  Lines       21204    21263      +59     
==========================================
+ Hits        17962    18017      +55     
- Misses       2161     2163       +2     
- Partials     1081     1083       +2



albe2669 · 2026-06-03T14:05:03Z

> Can you provide a full example config that leverages this use case?
>
> Also, using the EnvoyPatchPolicy is quite error-prone and could lead to issues when upgrading, etc. AIGW uses the extension server instead to patch configurations before sending them via xDS to the data plane. Could you update the proposal to use the extension server, as it would be more reliable?

Yeah of course. That makes much more sense, much cleaner solution. Should probably have looked a little more into how the system works. I pushed an update and updated the PR description.

In terms of the example config, I have been running this to test it: https://github.com/albe2669/ai-gateway/tree/example/examples/first_token_timeout. I can push that to this branch too if you think it's valuable to have.

For context, what we want to do is use virtual models such that if one model fails it falls over to the next one. But since inference takes so long, a 90-120 second timeout is set on the responses, which means that if the model never responds, a request takes 90 seconds before falling over to the next model.
The logical next move is then to enable streaming in our client and try again if TTFT is over a certain value. But if we do that, we don't fail over in the virtual model list, because the retry happens on the client, so we just keep trying the same model, which may be unavailable or just extremely slow.

This then solves that issue pretty cleanly as we will failover fast if the model didn't start responding after X seconds.

nacx

Thanks!

nacx · 2026-06-03T18:01:25Z

+		}
+		return fmt.Errorf("failed to get AIGatewayRoute %s/%s: %w", parts[1], parts[2], err)
+	}
+	if ruleIndex >= len(aigwRoute.Spec.Rules) {


When can this happen? Worth adding a comment.

Added in: b4bbeaf

Most of the other functions in the same file simply say the rule index is out of range.

nacx · 2026-06-03T18:03:39Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a new FirstTokenTimeout field to AIGatewayRouteRule (in both v1alpha1 and v1beta1 APIs) to configure the maximum wait time for the first response byte before Envoy resets the upstream stream. The extension server is updated to apply this timeout as per_try_idle_timeout on generated routes, accompanied by new helper methods, deepcopy updates, CRD manifests, and unit tests. The review feedback suggests optimizing the implementation by caching retrieved AIGatewayRoute objects during the translation pass to avoid redundant Kubernetes API calls, adding defensive guards for route name parsing, and updating the unit tests accordingly.
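The caching suggestion can be illustrated with a small per-pass cache. This is a sketch under assumptions: `AIGatewayRoute` here is a trimmed stand-in type, and `routeGetter` and `routeCache` are hypothetical names; the real lookup would go through the Kubernetes client.

```go
package main

import "fmt"

// AIGatewayRoute is a trimmed, hypothetical stand-in for the real resource.
type AIGatewayRoute struct {
	Namespace string
	Name      string
}

// routeGetter fetches a route, for example from the Kubernetes API.
type routeGetter func(namespace, name string) (*AIGatewayRoute, error)

// routeCache remembers routes fetched during a single translation pass so
// that several xDS routes generated from the same AIGatewayRoute trigger
// only one lookup.
type routeCache struct {
	get  routeGetter
	seen map[string]*AIGatewayRoute
}

func newRouteCache(get routeGetter) *routeCache {
	return &routeCache{get: get, seen: make(map[string]*AIGatewayRoute)}
}

// Get returns the cached route if present and otherwise fetches and stores
// it. Errors are not cached, so a later call retries the lookup.
func (c *routeCache) Get(namespace, name string) (*AIGatewayRoute, error) {
	key := namespace + "/" + name
	if r, ok := c.seen[key]; ok {
		return r, nil
	}
	r, err := c.get(namespace, name)
	if err != nil {
		return nil, err
	}
	c.seen[key] = r
	return r, nil
}

func main() {
	calls := 0
	cache := newRouteCache(func(namespace, name string) (*AIGatewayRoute, error) {
		calls++
		return &AIGatewayRoute{Namespace: namespace, Name: name}, nil
	})
	for i := 0; i < 3; i++ {
		if _, err := cache.Get("default", "my-route"); err != nil {
			fmt.Println("lookup failed:", err)
		}
	}
	fmt.Println("backend lookups:", calls)
}
```

Creating a fresh cache for each translation pass avoids serving stale routes across passes.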


albe2669 · 2026-06-04T09:02:07Z

What is your policy on AI reviews? Shall I resolve them, does it do that itself, or do you check and do it?


albe2669 · 2026-06-10T09:27:34Z

FYI: I have renamed the parameter to the more appropriate streamIdleTimeout, as it will also fail the request if the stream is idle for too long between tokens.

Also, sorry for the force push. I forgot the sign-off.

nacx

Thanks!

Overall LGTM. One last thing left:

> Can you provide a full example config that leverages this use case?

Can you add some example for this use case?

nacx · 2026-06-10T17:02:01Z

> What is your policy on AI reviews? Shall I resolve them, does it do that itself, or do you check and do it?

AI reviews are just reviews, but they will not get auto-resolved. Thanks for addressing the comments!


albe2669 · 2026-06-19T08:10:53Z

> Thanks!
>
> Overall LGTM. One last thing left:
>
> > Can you provide a full example config that leverages this use case?
>
> Can you add some example for this use case?

Sorry, I've been away.

Examples added!

nacx · 2026-06-22T15:40:58Z

This is great. Thanks for the example!
I have a final ask. We want examples to always work and not break, so we usually have e2e tests for them. This feature adds a config to the API that should be e2e-tested as well. Could you add an end-to-end test for this example here? https://github.com/envoyproxy/ai-gateway/tree/main/tests/e2e
You'll see that many e2e tests there use the example files to test the functionality, which keeps the examples up to date.

albe2669 added 2 commits May 28, 2026 14:51

feat: add timeout to first token with rollover to new model

1d71c72

Signed-off-by: albe2669 <albert@risenielsen.dk>

fix: use per_try_idle_timeout instead of idleTimout

93ea692

Signed-off-by: albe2669 <albert@risenielsen.dk>

albe2669 force-pushed the main branch from ec5f86d to 93ea692 Compare May 28, 2026 13:36

albe2669 marked this pull request as ready for review May 29, 2026 06:55

albe2669 requested a review from a team as a code owner May 29, 2026 06:55

dosubot Bot added the size:L label (This PR changes 100-499 lines, ignoring generated files.) May 29, 2026

Merge branch 'main' into main

6d959b2

fix: use extentionserver instead of the patch policy

e54711e

Signed-off-by: albe2669 <albert@risenielsen.dk>

nacx reviewed Jun 3, 2026


gemini-code-assist Bot reviewed Jun 3, 2026


Comment thread internal/extensionserver/post_translate_modify.go Outdated

Comment thread internal/extensionserver/post_translate_modify.go Outdated

Comment thread internal/extensionserver/extensionserver_test.go

albe2669 added 4 commits June 4, 2026 10:52

feat: cache route retrievals

2be6b33

Signed-off-by: albe2669 <albert@risenielsen.dk>

docs: add a comment describing the ruleiIndex check

b4bbeaf

Signed-off-by: albe2669 <albert@risenielsen.dk>

fix: ensure part 1 and 2 arent empty

71e2a1e

Signed-off-by: albe2669 <albert@risenielsen.dk>

test: expand test coverage

558eb2f

Signed-off-by: albe2669 <albert@risenielsen.dk>

albe2669 requested a review from nacx June 4, 2026 09:01

fix: rename firstTokenTimeout to the more accurate streamIdleTimeout

918b8b6

Signed-off-by: albe2669 <albert@risenielsen.dk>

albe2669 force-pushed the main branch from b337344 to 918b8b6 Compare June 10, 2026 09:19

albe2669 changed the title from "feat: add timeout to first token with rollover to new model" to "feat: add stream idle timeout with fallover to next model" Jun 10, 2026

nacx reviewed Jun 10, 2026


albe2669 added 3 commits June 19, 2026 09:34

Merge remote-tracking branch 'origin/main'

70aa250

fix: use new New syntax in tests

90944dc

Signed-off-by: albe2669 <albert@risenielsen.dk>

docs: add streamIdleTimeout example

27c4aca

Signed-off-by: albe2669 <albert@risenielsen.dk>

Merge branch 'main' into main

6b361bd
