Weekly Update — May 19 – June 1, 2026 #2178
missBerg announced in Weekly Updates
Two weeks of post-v0.6 polish — streaming/SSE robustness on the Responses and MCP paths, translation-coverage edges (audio/video content, Anthropic beta headers, typeless assistant turns, custom Anthropic prefix), and a couple of stability fixes against ext-proc panics. Thanks to everyone who shipped, reviewed, and weighed in.
✨ What's new
Provider & translation coverage
audio_url and video_url content types in OpenAI schema (#2136 by @cjackal). OpenAI-shape clients can now send multimodal inputs to backends that consume audio/video (phi-4-mm, qwen3.5, and other OpenAI-compatible multimodal servers). Schema-only change, no API churn. Fixes #2035.
anthropic-beta header mapped on the AWSAnthropic backend (#2148 by @CodePrometheus). Bedrock takes Anthropic beta opt-ins in the body, not the header, so the translator now lifts anthropic-beta: context-1m-2025-08-07 into anthropic_beta: ["context-1m-2025-08-07"] (comma-separated values are split and trimmed). Fixes #2147.
prefix field honored on the Anthropic backend (#2108 by @ajac-zero). VersionedAPISchema.prefix had been silently dropped for Anthropic (the request path was hardcoded to /v1/messages); it's now threaded through schemaToFilterAPI, the endpoint spec, and the anthropicToAnthropicTranslator so operators can target Anthropic-compatible backends served under a custom prefix like /gateway/v1/messages. Default behavior is unchanged: prefix defaults to "v1" when unset. AWSAnthropic and GCPAnthropic still ignore the field.
/v1/responses (#2172 by @dzibma). Multi-turn Responses inputs from clients like OpenCode (assistant turns serialized without an explicit type: "message") now parse as output messages instead of failing the EasyInputMessageParam unmarshal on output_text.
Routing
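Going back to the anthropic-beta mapping in #2148 under Provider & translation coverage: the header-to-body lift can be sketched roughly as below. This is a Python illustration of the described behavior, not the gateway's Go translator. The function name lift_anthropic_beta, the lowercase header key, and the plain-dict request body are all assumptions made for the example.

```python
def lift_anthropic_beta(headers: dict, body: dict) -> dict:
    """Return a copy of body with anthropic-beta header values as a list.

    Hypothetical helper: assumes header names are already lowercased and
    that the request body is a plain dict.
    """
    raw = headers.get("anthropic-beta", "")
    # Split on commas, trim whitespace, and drop empty entries.
    betas = [value.strip() for value in raw.split(",") if value.strip()]
    out = dict(body)
    if betas:
        out["anthropic_beta"] = betas
    return out
```

The sketch leaves the body untouched when the header is absent or empty. How the real translator treats an anthropic_beta field that is already present in the body is not covered here.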
/v1/models listing) to a specific hostname like xx.api.aieg.com, so different host-fronted model groups stop bleeding into each other's listings. Lands the rebased version of the long-running #1987; discussed in #1646.
Streaming & MCP
STREAMED body mode ext_proc can split a single SSE event across ResponseBody calls; the translator now holds incomplete bytes and parses only complete \n\n-terminated events, so a split response.completed no longer silently drops token usage. Fixes #2162.
parseEvent was matching on data: / event: with a mandatory space after the colon and silently dropping lines from backends that emit data:{...} per spec (e.g. Spring Boot's SseEmitter), so tools/list and prompts/list aggregations were coming back empty. Both forms now parse the same way. Fixes #2149.
Bug fixes & stability
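The two SSE fixes in the Streaming & MCP items above (#2162 and #2149) come down to buffering until an event is complete and treating the space after the colon as optional. A minimal Python sketch of that idea follows; it is not the gateway's Go code. SSEBuffer and parse_event are hypothetical names, and the sketch assumes \n\n terminators only, with no \r\n handling.

```python
class SSEBuffer:
    """Accumulates streamed bytes and returns only complete SSE events."""

    def __init__(self) -> None:
        self._pending = b""

    def feed(self, chunk: bytes) -> list:
        self._pending += chunk
        # Everything before the last blank line is complete; the tail
        # (possibly empty) waits for the next chunk.
        *complete, self._pending = self._pending.split(b"\n\n")
        return [parse_event(raw) for raw in complete if raw.strip()]


def parse_event(raw: bytes) -> dict:
    """Parse one complete event into a field dict."""
    fields = {}
    for line in raw.decode("utf-8").split("\n"):
        if not line or line.startswith(":"):
            continue  # blank line or comment line
        name, _, value = line.partition(":")
        # The single space after the colon is optional in the SSE format.
        if value.startswith(" "):
            value = value[1:]
        if name == "data" and "data" in fields:
            fields["data"] += "\n" + value  # multiple data lines are joined
        else:
            fields[name] = value
    return fields
```

Feeding a response.completed event in two chunks returns nothing for the first chunk and the whole event for the second, which is the property that keeps token usage from being dropped.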
Output (#2157 by @siddharth1036). A 200 response without output.message (AWS Coral routing errors when the cluster points at bedrock.<region> instead of bedrock-runtime.<region>, guardrail interventions, empty bodies) was dereferencing Output.Message.Role and panicking the ext-proc; it now returns a descriptive error including the Bedrock stopReason instead of taking the pod into a panic loop.
Developer experience
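For the Bedrock fix in #2157 under Bug fixes & stability, the guard amounts to checking for output.message before reading the role. The Python sketch below only illustrates that shape. The field names output, message, role, and stopReason come from the description above, while extract_role and the error wording are hypothetical.

```python
def extract_role(resp: dict) -> str:
    """Return the assistant role, or raise a descriptive error.

    Hypothetical helper: resp is the decoded JSON body of a 200 response.
    """
    output = resp.get("output") or {}
    message = output.get("message")
    if message is None:
        # Surface the stop reason instead of failing on a missing field.
        stop_reason = resp.get("stopReason")
        raise ValueError(
            f"bedrock response has no output.message (stopReason={stop_reason!r})"
        )
    return message["role"]
```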
http2.initialStreamWindowSize and initialConnectionWindowSize on ClientTrafficPolicy, the separate flow-control cap that keeps causing 413 request_payload_too_large even after operators bump bufferLimit. Fixes #2130.
💡 New design proposals & feature requests
A few new threads worth a read if any of these are in your wheelhouse:
/v1beta/models/<m>:generateContent client schema (#2165 by @jaimeluengo). Adds a GoogleGenAI client schema so the gateway can accept Gemini-native requests (path-extracted model, x-goog-api-key auth, SSE :streamGenerateContent) and translate to any backend, the missing counterpart to today's Anthropic and OpenAI client schemas. Unlocks @google/genai, gemini-cli, and langchain-google clients, which otherwise can't talk to the gateway at all. Author has offered to drive the PR in scoped increments.
MCPRoute controller currently mints one placeholder Backend (all pointing at the same dummy IP, all rewritten by the extension server to the same in-pod 127.0.0.1:9856) per route, so 86 routes mean 86 redundant CDS clusters, each with its own connection-pool and stats state. Proposes a single Gateway-owned shared Backend instead; author has offered a PR.
namespace and tool_search tool types (#2164 by @jaimeluengo). The strict ResponseToolUnion unmarshaller bails on two type values that OpenAI Codex emits today (namespace for tool grouping, tool_search for dynamic-tool discovery); requests are hard-rejected with a 400 before the model header is even extracted, breaking retry/spillover paths. Proposes two struct additions to the lenient list; author has a PR ready to send.
InferencePool backend ignores cross-namespace refs (#2173 by @ammarasyad). AIGatewayRoute docs advertise cross-namespace backendRefs with ReferenceGrant, but for an InferencePool backend the generated HTTPRoute silently rewrites the namespace to the AIGatewayRoute's own while still reporting Accepted: True. Includes a precise root-cause pointer into internal/controller/ai_gateway_route.go.
error.code (#2151 by @fdaforno). vLLM (and other OpenAI-compatible backends) emit code as a JSON number; the translator's error struct expects a string, the unmarshal fails, and the client sees an empty HTTP 500 instead of the actual 4xx diagnostic. Two proposed paths: relax the type to accept both, or fall back to pass-through when the error body can't be parsed.
👀 PRs looking for review
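On the error.code thread (#2151) in the proposals above, the first suggested path, accepting both a number and a string, might look something like the following Python sketch. It is an illustration of the idea only, not code from the issue or the gateway. parse_backend_error is a hypothetical name, and returning None stands in for the second path (falling back to pass-through).

```python
import json
from typing import Optional


def parse_backend_error(body: bytes) -> Optional[dict]:
    """Return message and code (normalized to str), or None if unparseable.

    Hypothetical helper: None signals that the caller should pass the
    original error body through unchanged.
    """
    try:
        payload = json.loads(body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return None
    err = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(err, dict):
        return None
    code = err.get("code")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(code, bool) or not isinstance(code, (int, str, type(None))):
        return None
    return {
        "message": str(err.get("message", "")),
        "code": None if code is None else str(code),
    }
```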
If you have time to help review, these are open and waiting:
InferencePool as a backend reference on AIServiceBackend by @isztldav: lets the inference-pool backend flow through the same AIServiceBackend surface as other providers.
QuotaPolicy by @yuzisun: still the 0.7 quota work; unit-transformation comments between the API and the rate-limit service are the blocker.
🙏 Thanks to this week's contributors
@cjackal, @CodePrometheus, @dzibma, @xianml, @sriyer, @siddharth1036, @taiman724, @ajac-zero, @aishwaryaraimule21, @walsm232, @isztldav, @albe2669, @nacx, @yuzisun, @hustxiayang, @nuthalapativarun, @ChrisJBurns, @anurags25, @sc7565, @mtparet, @mturac, @arpitjain099, @immanuwell, @PatilHrushikesh, @sivanantha321, @jaimeluengo, @ammarasyad, @kanurag94, @fdaforno, @missBerg — and everyone who showed up to triage, review, and discuss.
See you next week!