docs: proposal for OAuth 2.0 Token Exchange as Upstream Auth for MCP Backends by nacx · Pull Request #2052 · envoyproxy/ai-gateway

nacx · 2026-04-15T11:08:11Z

Description

This PR adds a proposal to implement OAuth 2.0 Token Exchange as Upstream Auth for MCP Backends. The proposal focuses on the APIs to enable token exchange for selected backends.

Related Issues/PRs (if applicable)

Proposal for #2036

Special notes for reviewers (if applicable)

N/A

…Backends Signed-off-by: Ignasi Barrera <nacx@apache.org>

Signed-off-by: Ignasi Barrera <nacx@apache.org>

codecov-commenter · 2026-04-15T18:11:49Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.49%. Comparing base (5d1305d) to head (0aba6dc).
⚠️ Report is 21 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2052      +/-   ##
==========================================
+ Coverage   84.38%   84.49%   +0.11%     
==========================================
  Files         130      133       +3     
  Lines       18112    18468     +356     
==========================================
+ Hits        15283    15605     +322     
- Misses       1883     1904      +21     
- Partials      946      959      +13


Flgado · 2026-04-15T21:23:38Z

+//
+// +kubebuilder:validation:XValidation:rule="has(self.clientID)",message="clientID is required"
+// +kubebuilder:validation:XValidation:rule="has(self.clientSecretRef)",message="clientSecretRef is required"
+type MCPTokenExchangeClientAuth struct {


Should we also support:

private_key_jwt

mTLS (this may be tricky here)

I’m wondering whether all STS services support these authentication methods.

Yes, we could, although I think having an API with a type that can incrementally accommodate new methods is a good start. I'd start by providing one, and the API should be backwards-compatible when we want to add more.

Designing the ClientAuth in a generic way is a great idea.
I think authenticating with Kubernetes service account JWTs could be quite a common use case;
see https://datatracker.ietf.org/doc/html/rfc7523#section-2.2
That requires client_assertion_type and client_assertion, which could optionally integrate directly with Kubernetes service account token Secrets/mounts?

That sounds good as an addition once we have the basic implementation, but I'd rather keep the API open and backward-compatible and focus on one first use case. WDYT?

@nacx makes sense, thank you.
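To make the RFC 7523 §2.2 idea above concrete, here is a minimal sketch of the two form parameters a JWT client assertion adds to a token request. The function name `clientAssertionParams` is hypothetical and not part of the proposal; how the JWT is obtained (for example from a mounted service account token) is left out on purpose.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// jwtBearerAssertionType is the client_assertion_type value defined in RFC 7523 §2.2.
const jwtBearerAssertionType = "urn:ietf:params:oauth:client-assertion-type:jwt-bearer"

// clientAssertionParams (hypothetical helper) returns the form parameters
// that authenticate the client with a JWT assertion instead of a client secret.
func clientAssertionParams(assertion string) (url.Values, error) {
	if assertion == "" {
		return nil, errors.New("client assertion must not be empty")
	}
	params := url.Values{}
	params.Set("client_assertion_type", jwtBearerAssertionType)
	params.Set("client_assertion", assertion)
	return params, nil
}

func main() {
	// The assertion value is a placeholder, not a real JWT.
	params, err := clientAssertionParams("example.jwt.value")
	if err != nil {
		panic(err)
	}
	fmt.Println(params.Encode())
}
```

These parameters would be merged into the token exchange request body alongside the grant parameters.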

I think it's also worth foreseeing a token-type-agnostic exchange, so that you could in theory get back an OAuth Transaction Token.

Sounds good. IIUC, the current proposal should also cover that, as it allows for configuring the subject token type and requested token type. Is there something you see that's missing in this regard?
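As an illustration of why configurable token types already cover this, here is a rough sketch of how an RFC 8693 §2.1 request body could be assembled from per-backend settings. The `exchangeConfig` type and `buildExchangeForm` function are hypothetical simplifications, not the proposal's actual API; only the form parameter names and the grant type URI come from the RFC.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// grantTypeTokenExchange is the grant_type value defined in RFC 8693 §2.1.
const grantTypeTokenExchange = "urn:ietf:params:oauth:grant-type:token-exchange"

// exchangeConfig is a hypothetical, trimmed-down view of the per-backend settings.
type exchangeConfig struct {
	SubjectTokenType   string
	RequestedTokenType string // optional; any token type URI the STS understands
	Audience           string // optional
	Scopes             []string
}

// buildExchangeForm assembles the token exchange request body.
// Optional parameters are only sent when configured.
func buildExchangeForm(cfg exchangeConfig, subjectToken string) url.Values {
	form := url.Values{}
	form.Set("grant_type", grantTypeTokenExchange)
	form.Set("subject_token", subjectToken)
	form.Set("subject_token_type", cfg.SubjectTokenType)
	if cfg.RequestedTokenType != "" {
		form.Set("requested_token_type", cfg.RequestedTokenType)
	}
	if cfg.Audience != "" {
		form.Set("audience", cfg.Audience)
	}
	if len(cfg.Scopes) > 0 {
		// OAuth scopes are sent as a single space-delimited string.
		form.Set("scope", strings.Join(cfg.Scopes, " "))
	}
	return form
}

func main() {
	cfg := exchangeConfig{
		SubjectTokenType:   "urn:ietf:params:oauth:token-type:jwt",
		RequestedTokenType: "urn:ietf:params:oauth:token-type:access_token",
		Scopes:             []string{"read", "write"},
	}
	fmt.Println(buildExchangeForm(cfg, "incoming-user-jwt").Encode())
}
```

Because the requested type is passed through as an opaque URI, the gateway does not need to know what kind of token comes back.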

Flgado · 2026-04-15T21:28:37Z

+- This proposal does **not** implement a full STS within AIGW itself - it delegates token issuance to an external OAuth 2.0 Authorization Server
+- This proposal does **not** address non-MCP backends (e.g., LLM backends use `BackendSecurityPolicy`; a separate extension could adapt this pattern there)
+- Rich Authorization Requests (RFC 9396) are **not** in scope for this proposal and may be addressed in a future iteration
+- Refresh token management is explicitly out of scope for the initial proposal (see [Caveats](#9-caveats-and-tradeoffs))


Hmm, what will happen when tokens expire? Will clients need to reconnect for now?
Shouldn't we take the expires_in returned by the STS server and try to get a new token before this TTL elapses?

Yes, we should probably use that returned value to configure the TTL in the cache, to min(expires_in, user_config)
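The min(expires_in, user_config) rule could look roughly like the sketch below. The function name `cacheTTL` and the fallback behavior when the STS omits expires_in (which RFC 8693 only marks as RECOMMENDED) are assumptions, not something the proposal specifies.

```go
package main

import (
	"fmt"
	"time"
)

// cacheTTL returns how long an exchanged token may stay in the cache:
// the smaller of the STS-reported lifetime and the configured maximum.
// Assumption: a non-positive value means "not provided" on either side.
func cacheTTL(expiresInSeconds int64, configuredMax time.Duration) time.Duration {
	if expiresInSeconds <= 0 {
		// The STS did not report a lifetime; fall back to the configured value.
		return configuredMax
	}
	fromSTS := time.Duration(expiresInSeconds) * time.Second
	if configuredMax <= 0 || fromSTS < configuredMax {
		return fromSTS
	}
	return configuredMax
}

func main() {
	fmt.Println(cacheTTL(3600, 5*time.Minute)) // STS lifetime longer than the configured cap
	fmt.Println(cacheTTL(60, 5*time.Minute))   // STS lifetime shorter than the configured cap
	fmt.Println(cacheTTL(0, 5*time.Minute))    // no expires_in in the response
}
```

A real implementation would probably also subtract a small safety margin so a token is not served right at its expiry.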

The upstream access token is cached, keyed by user token + backend + scope:

The upstream token is cached (keyed by user token + backend + scope) and set on the outgoing request.

If we replace the upstream access token in the cache after refreshing it, the cache is probably useless and never hit, because user tokens are access tokens with short validity windows.
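For reference while discussing hit rates, a cache key over user token + backend + scope might be derived as in this sketch. Hashing the inputs (so the raw user token is not kept as a map key) and sorting the scopes are assumptions of this example, not something the proposal states.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// cacheKey (hypothetical) derives a fixed-size key from the user token,
// the backend name, and the requested scopes.
func cacheKey(userToken, backend string, scopes []string) string {
	// Copy before sorting so the caller's slice is left untouched.
	sorted := append([]string(nil), scopes...)
	sort.Strings(sorted)

	h := sha256.New()
	for _, part := range []string{userToken, backend, strings.Join(sorted, " ")} {
		// Length-prefix each part so different splits cannot collide.
		fmt.Fprintf(h, "%d:%s|", len(part), part)
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	k1 := cacheKey("user-token-1", "gcp-mcp", []string{"b", "a"})
	k2 := cacheKey("user-token-2", "gcp-mcp", []string{"a", "b"})
	fmt.Println(k1 != k2) // a new user token always produces a new key
}
```

With this shape, every refresh of the user's token yields a different key, which is exactly the effect the comment above is pointing out.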

Is the cache in memory? Which means all of its content will be lost when we deploy / upgrade the gateway? I think it's fine, just want to call it out.

In an initial implementation, yes. AIGW already has a persistent store (Redis) that is used for rate limiting, so a persistent cache could be added if we wanted one, although I don't think persistence is necessary for the exchanged tokens.

Hritik003 · 2026-04-23T18:16:01Z

+
+// MCPTokenExchangeSemantics controls whether token exchange uses delegation
+// or impersonation semantics as defined in RFC-8693 §1.1.
+// +kubebuilder:validation:Enum=Delegation;Impersonation
+type MCPTokenExchangeSemantics string
+
+const (


Can you shed some light on which upstream providers support these?

What do you mean? Do you want a list of all existing IdPs that support OAuth token exchange?

Hritik003 · 2026-04-23T18:27:53Z

+
+The `MCPRoute.spec.securityPolicy.oauth` section (which configures the gateway-facing JWT validation) MUST be configured when `tokenExchange` is used as upstream auth. A CRD validation rule should enforce this.


Is this hard constraint because token exchange requires a trusted “subject token” (user identity), which API key auth does not provide?

Right, you need the incoming token to exchange for the upstream one.

Hritik003 · 2026-04-23T18:43:24Z

+	// PrivateKeyRef references a Kubernetes Secret containing the private key
+	// used to sign the JWT. The secret must have a key "privateKey" containing
+	// a PEM-encoded RSA or EC private key.
+	//
+	// +kubebuilder:validation:Required
+	PrivateKeyRef gwapiv1.SecretObjectReference `json:"privateKeyRef"`


Should this key be rotated? If it is compromised, an attacker could impersonate the Gateway and request tokens for any user from the STS.

That's a good point. The implementation could watch for changes to this PK and update the configuration if it changes.
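Whatever mechanism notices the change, the reload step itself amounts to re-parsing the PEM data from the Secret's "privateKey" entry. A minimal sketch of that parsing with the Go standard library follows; the function names are hypothetical, and the Secret watch is deliberately left out.

```go
package main

import (
	"crypto"
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/rsa"
	"crypto/x509"
	"encoding/pem"
	"errors"
	"fmt"
)

// parseSigningKey accepts a PEM-encoded RSA or EC private key in PKCS#8,
// PKCS#1 (RSA), or SEC 1 (EC) form and returns it as a crypto.Signer.
func parseSigningKey(pemBytes []byte) (crypto.Signer, error) {
	block, _ := pem.Decode(pemBytes)
	if block == nil {
		return nil, errors.New("no PEM block found")
	}
	if key, err := x509.ParsePKCS8PrivateKey(block.Bytes); err == nil {
		switch k := key.(type) {
		case *rsa.PrivateKey:
			return k, nil
		case *ecdsa.PrivateKey:
			return k, nil
		default:
			return nil, fmt.Errorf("unsupported PKCS#8 key type %T", key)
		}
	}
	if key, err := x509.ParsePKCS1PrivateKey(block.Bytes); err == nil {
		return key, nil
	}
	if key, err := x509.ParseECPrivateKey(block.Bytes); err == nil {
		return key, nil
	}
	return nil, errors.New("not an RSA or EC private key")
}

// exampleECKeyPEM generates a throwaway P-256 key so the sketch can run
// without a real Secret.
func exampleECKeyPEM() ([]byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	der, err := x509.MarshalECPrivateKey(key)
	if err != nil {
		return nil, err
	}
	return pem.EncodeToMemory(&pem.Block{Type: "EC PRIVATE KEY", Bytes: der}), nil
}

func main() {
	pemBytes, err := exampleECKeyPEM()
	if err != nil {
		panic(err)
	}
	signer, err := parseSigningKey(pemBytes)
	if err != nil {
		panic(err)
	}
	fmt.Printf("loaded key of type %T\n", signer)
}
```

On rotation, the gateway would swap the in-memory signer for the newly parsed one and keep the old one only if parsing fails.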

edan-binshtok · 2026-04-27T19:45:58Z

hi @nacx and rest of the team

The proposal models a single token exchange. A lot of real-world federation against major clouds (GCP, AWS, Azure) needs two hops — would love to see chained exchange as a first-class concept before the API shape locks in. Also flagging two smaller papercuts at the bottom.

Thanks for driving this — the per-backend tokenExchange and the IdP/STS separation are exactly right.

The problem: single-hop is rarely enough

The first STS call validates the user's IdP-issued JWT and returns a federated token, but that federated token is usually not the credential the upstream backend actually wants. A second exchange/impersonation step is typically needed. Three concrete examples:

Keycloak → GCP

1. sts.googleapis.com/v1/token with the Keycloak JWT → federated access token (principal is principal://iam.../subject/<user>).
2. iamcredentials.googleapis.com/.../serviceAccounts/{sa}:generateAccessToken → normal SA access token.

Why hop 2: GCP IAM policies, audit pipelines, VPC-SC perimeters, and older client libraries are built around serviceAccount: principals. Federated principals force an N×M IAM-binding explosion (every user × every resource) instead of the standard "bind permissions to an SA, grant users tokenCreator on the SA" pattern.

OIDC IdP → AWS (cross-account)

1. sts:AssumeRoleWithWebIdentity with the OIDC JWT → temp creds for an entry role in account A.
2. sts:AssumeRole from account A into the target role in account B.

Why hop 2: AWS OIDC federation lands you in a single trust-account role. Cross-account access (the common org-wide-IdP + per-team-account topology) requires a second AssumeRole.

Entra ID On-Behalf-Of

1. Validate the user's Entra-issued JWT.
2. OBO call to /oauth2/v2.0/token with requested_token_use=on_behalf_of to mint a token for the downstream API's audience.

Why hop 2: the user's token is audience-bound to the gateway's app registration; the downstream API rejects it.

(Same shape applies to Vault as a token broker: JWT auth → dynamic-credentials engine.)

In all cases, the proposal as written can express only the first hop.

Suggested API shape

A minimal change that unblocks all of these without breaking the existing single-hop config:

type MCPBackendTokenExchange struct {
      // ... all existing fields unchanged ...

      // Then optionally chains a follow-up exchange. The token returned by *this*
      // step is used as the subject_token for the Then step. The token injected
      // into the upstream request is the result of the final step in the chain.
      //
      // Use cases: GCP federated token → SA impersonation, AWS AssumeRoleWithWebIdentity
      // → cross-account AssumeRole, Entra OBO after initial validation.
      //
      // Chain depth SHOULD be capped (suggest: 3) to prevent runaway recursion.
      // +optional
      Then *MCPBackendTokenExchange `json:"then,omitempty"`

      // AdditionalParams adds vendor-specific parameters to the token exchange
      // request body (RFC-8693 §2.1 permits extension parameters). Standard
      // parameters (grant_type, subject_token, audience, scope, ...) MUST NOT
      // be overridden via this map.
      //
      // Examples: GCP STS uses a WIF provider URI as `audience`; AWS STS uses
      // `RoleArn`/`RoleSessionName`; Entra OBO uses `requested_token_use=on_behalf_of`.
      // +optional
      AdditionalParams map[string]string `json:"additionalParams,omitempty"`
}

Alternative if recursion is unappealing: replace Then with Steps []MCPBackendTokenExchangeStep on MCPBackendSecurityPolicy. Recursive is less invasive; a slice is easier to validate length on.
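To show that the recursive shape is still easy to validate, here is a sketch of the two rules from the field comments: the chain-depth cap and the ban on overriding standard parameters. `exchangeStep` is a hypothetical stand-in for MCPBackendTokenExchange with only the relevant fields, and the reserved list is taken from the request parameters in RFC 8693 §2.1.

```go
package main

import "fmt"

// exchangeStep is a trimmed-down, hypothetical stand-in for MCPBackendTokenExchange.
type exchangeStep struct {
	STSEndpoint      string
	AdditionalParams map[string]string
	Then             *exchangeStep
}

// maxChainDepth follows the cap suggested in the comment above.
const maxChainDepth = 3

// reservedParams are the standard request parameters from RFC 8693 §2.1
// that AdditionalParams must not override.
var reservedParams = map[string]struct{}{
	"grant_type": {}, "resource": {}, "audience": {}, "scope": {},
	"requested_token_type": {}, "subject_token": {}, "subject_token_type": {},
	"actor_token": {}, "actor_token_type": {},
}

// validateChain walks the Then links iteratively, so an overly long chain
// is rejected as soon as the cap is exceeded.
func validateChain(first *exchangeStep) error {
	depth := 0
	for step := first; step != nil; step = step.Then {
		depth++
		if depth > maxChainDepth {
			return fmt.Errorf("token exchange chain exceeds %d steps", maxChainDepth)
		}
		for name := range step.AdditionalParams {
			if _, reserved := reservedParams[name]; reserved {
				return fmt.Errorf("step %d: additionalParams must not override %q", depth, name)
			}
		}
	}
	return nil
}

func main() {
	chain := &exchangeStep{
		STSEndpoint: "https://sts.googleapis.com/v1/token",
		Then: &exchangeStep{
			STSEndpoint:      "https://iamcredentials.googleapis.com/...",
			AdditionalParams: map[string]string{"lifetime": "3600s"},
		},
	}
	fmt.Println(validateChain(chain))
}
```

The same loop would work unchanged over a Steps slice, where the length check becomes a single comparison.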

Worked example — Keycloak → GCP, end to end

backendRefs:
  - name: gcp-mcp
    securityPolicy:
      tokenExchange:
        # Step 1: Keycloak JWT → GCP federated token
        stsEndpoint: "https://sts.googleapis.com/v1/token"
        subjectTokenType: "urn:ietf:params:oauth:token-type:jwt"
        audience: "//iam.googleapis.com/projects/123/locations/global/workloadIdentityPools/keycloak/providers/keycloak"
        requestedTokenType: "urn:ietf:params:oauth:token-type:access_token"
        scopes: ["https://www.googleapis.com/auth/cloud-platform"]
        semantics: "Impersonation"   # GCP STS rejects actor_token

        then:
          # Step 2: federated token → impersonated SA token
          stsEndpoint: "https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/user-mcp-sa@proj.iam.gserviceaccount.com:generateAccessToken"
          subjectTokenType: "urn:ietf:params:oauth:token-type:access_token"
          requestedTokenType: "urn:ietf:params:oauth:token-type:access_token"
          scopes: ["https://www.googleapis.com/auth/cloud-platform"]
          semantics: "Impersonation"
          additionalParams:
            lifetime: "3600s"

AWS cross-account version:

tokenExchange:
  stsEndpoint: "https://sts.amazonaws.com/?Action=AssumeRoleWithWebIdentity&Version=2011-06-15"
  subjectTokenType: "urn:ietf:params:oauth:token-type:jwt"
  semantics: "Impersonation"
  additionalParams:
    RoleArn: "arn:aws:iam::111111111111:role/keycloak-entry"
    RoleSessionName: "aigw-${user.sub}"
  then:
    stsEndpoint: "https://sts.amazonaws.com/?Action=AssumeRole&Version=2011-06-15"
    subjectTokenType: "urn:ietf:params:oauth:token-type:access_token"
    semantics: "Impersonation"
    additionalParams:
      RoleArn: "arn:aws:iam::222222222222:role/mcp-target"
      RoleSessionName: "aigw-${user.sub}"

The cache-key definition in §6.9 would need to become per-step (each step caches its own output keyed by its inputs) — worth one sentence in the doc.

Two smaller papercuts (independent of the above)

1. Drop the "actorToken required when Delegation" validation rule. GCP STS, AWS STS, and Entra OBO all reject actor_token. Forcing every Delegation config to fabricate one means those STSes can't use Delegation semantics at all: operators are pushed to Impersonation, losing the act claim and audit trail. Suggest letting actorToken be genuinely optional under Delegation (just no act claim if absent).
2. Soften the clientAuth "NOT RECOMMENDED" copy. Current doc treats omission as production-unsafe, but for GCP STS / AWS STS omission is the only correct configuration (those endpoints don't accept OAuth client auth; they authenticate via the subject token itself). Worth distinguishing the two cases so users targeting cloud STSes don't get told they're misconfigured.

Happy to put up a follow-up PR with the schema changes once the proposal lands, if this direction sounds reasonable. Thanks again!

nacx · 2026-04-29T14:06:56Z

The problem: single-hop is rarely enough

I'd like to focus first on having an implementation that supports the single hop, and then add the ability to chain exchanges to enable the proposed use cases. It would be good to add a section to the proposal describing the chained approach, or an additional document in the same directory, as a "v2" of this. That could be done in this PR or in a separate one. WDYT?

Two smaller papercuts (independent of the above)

I've opened a base implementation for this spec here: #2092
There I've removed the "semantics" field, as it is redundant and can be derived from the presence of the actorToken bits, and made the actorToken fully optional.
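A sketch of what "derived from the presence of the actorToken bits" could mean in practice is shown below. The names are hypothetical and are not taken from #2092; the pairing rule for actor_token_type comes from RFC 8693 §2.1, which requires it when actor_token is present and forbids it otherwise.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// exchangeSemantics mirrors the two modes described in RFC 8693 §1.1.
type exchangeSemantics string

const (
	Delegation    exchangeSemantics = "Delegation"
	Impersonation exchangeSemantics = "Impersonation"
)

// actorParams (hypothetical) returns the actor-related form parameters and
// the semantics they imply: an actor token means delegation, none means
// impersonation.
func actorParams(actorToken, actorTokenType string) (url.Values, exchangeSemantics, error) {
	params := url.Values{}
	if actorToken == "" {
		if actorTokenType != "" {
			return nil, "", errors.New("actor_token_type must not be set without actor_token")
		}
		return params, Impersonation, nil
	}
	if actorTokenType == "" {
		return nil, "", errors.New("actor_token_type is required when actor_token is set")
	}
	params.Set("actor_token", actorToken)
	params.Set("actor_token_type", actorTokenType)
	return params, Delegation, nil
}

func main() {
	_, sem, err := actorParams("", "")
	fmt.Println(sem, err)

	_, sem, err = actorParams("gateway-jwt", "urn:ietf:params:oauth:token-type:jwt")
	fmt.Println(sem, err)
}
```

With this shape, STS endpoints that reject actor_token simply leave both actor fields unset.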

Signed-off-by: Ignasi Barrera <nacx@apache.org>

missBerg

Approving this proposal. This is a proposal; refinements in approach may emerge during implementation.

**Description** New Features Multi-Tenant Hostname Routing - AIGatewayRoute gains a hostnames field enabling hostname-based model scoping — serve different model catalogs per tenant from a single Gateway - /v1/models automatically scopes its response to models matching the request's Host header - Wildcard hostnames (*.ai.example.com) supported via Gateway API hostname matching rules Provider Translation - Anthropic → AWS Bedrock Converse — new translator path lets Anthropic-native clients reach Bedrock without switching protocols (text, images, tool use, thinking, streaming) - Anthropic → OpenAI reasoning & image support — thinking/reasoning blocks and image content no longer silently dropped during translation - Claude Opus 4.7 reasoning — display parameter (summarized/omitted), xhigh effort tier, and claude-mythos-preview model recognition - Anthropic prefix support — VersionedAPISchema.prefix now works for Anthropic backends (e.g., /{prefix}/messages) - anthropic-beta header forwarding — mapped into anthropic_beta body field for AWSAnthropic backends OpenAI API Compatibility - Audio transcription & translation — full data-plane support for /v1/audio/transcriptions and /v1/audio/translations (Whisper endpoints, multipart/form-data) - Azure OpenAI Responses API — /v1/responses routes to Azure's /openai/responses?api-version=... 
path - audio_url and video_url content types — multimodal audio/video inputs for compatible backends (vLLM, phi-4-mm, Qwen 3.5) Quota-Aware Routing - Backend rate limit filter injection for QuotaPolicy — first runtime enforcement: controller injects a backend rate limit filter when a QuotaPolicy is attached to an AIServiceBackend MCP Gateway - Authorization-filtered tools/list — omits tools the caller isn't authorized to invoke, preventing tool discovery leaks Observability - Smarter log redaction — developer-authored metadata (tool descriptions, function names, JSON schemas) visible in debug logs; user content and AI-generated text remain redacted API Changes - AIGatewayRoute.spec.hostnames — new optional field for hostname-based request filtering - AIGatewayRoute.spec.rules capped at 15 (down from 128) to match Gateway API HTTPRoute limits - VersionedAPISchema.prefix extended to Anthropic backends - QuotaPolicy now has runtime enforcement (backend rate limit filter injection) Bug Fixes - SSE parser handles data:{json} (no space after colon) - Responses API streaming: buffer incomplete SSE events across response body chunks - Responses API: capture token usage from response.incomplete and response.failed events - Nil-pointer guard in AWS Bedrock response translator (HTTP 200 with no output field) - Comprehensive Gemini finish-reason mapping (previously everything fell through to content_filter) - GCP Vertex AI streaming: emit empty delta object instead of omitting it - Responses API: handle typeless assistant output messages (e.g., from OpenCode) Docs - Proposal: MCPBackend CRD (#2144) - Proposal: OAuth 2.0 Token Exchange as Upstream Auth for MCP Backends (#2052) CI - contents:write permission declared on Release workflow's release job (#2139) --------- Signed-off-by: achoo30 <achoo30@bloomberg.net> Signed-off-by: Erica Hughberg <erica.sundberg.90@gmail.com> Co-authored-by: Erica Hughberg <erica.sundberg.90@gmail.com> Co-authored-by: Claude Opus 4.8 
<noreply@anthropic.com>


docs: proposal for OAuth 2.0 Token Exchange as Upstream Auth for MCP …

456967e

…Backends Signed-off-by: Ignasi Barrera <nacx@apache.org>

nacx requested a review from a team as a code owner April 15, 2026 11:08

dosubot Bot added the size:XL This PR changes 500-999 lines, ignoring generated files. label Apr 15, 2026

format markdown

9ce2a57

Signed-off-by: Ignasi Barrera <nacx@apache.org>

Flgado reviewed Apr 15, 2026


Hritik003 reviewed Apr 23, 2026


nacx mentioned this pull request Apr 29, 2026

mcp: initial implementation of OAuth Token Exchange for upstream auth #2092

Open

nacx and others added 4 commits April 29, 2026 16:12

address comments

2a99c11

Signed-off-by: Ignasi Barrera <nacx@apache.org>

format

0aba6dc

Signed-off-by: Ignasi Barrera <nacx@apache.org>

Merge branch 'main' into mcp-token-exchange

9165c05

Merge branch 'main' into mcp-token-exchange

d514ed0

missBerg approved these changes Jun 1, 2026


Merge branch 'main' into mcp-token-exchange

132ba3e

nacx enabled auto-merge (squash) June 1, 2026 18:32

nacx merged commit fd2038c into envoyproxy:main Jun 1, 2026
25 checks passed

aabchoo mentioned this pull request Jun 4, 2026

release: v0.7.0 release notes #2177

Merged

