
docs: proposal for OAuth 2.0 Token Exchange as Upstream Auth for MCP Backends#2052

Merged
nacx merged 7 commits into envoyproxy:main from nacx:mcp-token-exchange on Jun 1, 2026

nacx commented Apr 15, 2026
Member

Description

This PR adds a proposal to implement OAuth 2.0 Token Exchange as Upstream Auth for MCP Backends. The proposal focuses on the APIs to enable token exchange for selected backends.

Related Issues/PRs (if applicable)

Proposal for #2036

Special notes for reviewers (if applicable)

N/A

…Backends

Signed-off-by: Ignasi Barrera <nacx@apache.org>
nacx requested a review from a team as a code owner April 15, 2026 11:08
dosubot (bot) added the size:XL label (This PR changes 500-999 lines, ignoring generated files.) Apr 15, 2026
Signed-off-by: Ignasi Barrera <nacx@apache.org>
codecov-commenter commented Apr 15, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.49%. Comparing base (5d1305d) to head (0aba6dc).
⚠️ Report is 21 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2052      +/-   ##
==========================================
+ Coverage   84.38%   84.49%   +0.11%     
==========================================
  Files         130      133       +3     
  Lines       18112    18468     +356     
==========================================
+ Hits        15283    15605     +322     
- Misses       1883     1904      +21     
- Partials      946      959      +13     


//
// +kubebuilder:validation:XValidation:rule="has(self.clientID)",message="clientID is required"
// +kubebuilder:validation:XValidation:rule="has(self.clientSecretRef)",message="clientSecretRef is required"
type MCPTokenExchangeClientAuth struct {

Contributor

Should we also support:

  • private_key_jwt
  • mTLS (may be tricky here)

I’m wondering whether all STS services support these authentication methods.

Member Author

Yes, we could, although I think having an API with a type that can incrementally accommodate new methods is a good start. I'd start by providing one, and the API should be backwards-compatible when we want to add more.


Designing the ClientAuth in a generic way is a great idea.
I think authenticating with Kubernetes service account JWTs could be quite a common use case;
see https://datatracker.ietf.org/doc/html/rfc7523#section-2.2
That would require client_assertion_type and client_assertion. Could that optionally integrate directly with K8s service account token Secrets/mounts?

Member Author

That sounds good as an addition once we have the basic implementation, but I'd rather keep the API open and backward-compatible and focus on one first use case. WDYT?


@nacx makes sense, thank you.

I think it's also worth foreseeing a token-type-agnostic exchange, so that you could theoretically get back an OAuth Transaction Token.

Member Author

Sounds good. IIUC, the current proposal should also cover that, as it allows for configuring the subject token type and requested token type. Is there something you see that's missing in this regard?
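For reference, the RFC 8693 request body is a form in which `subject_token_type` and `requested_token_type` are free-form token type URIs, which is why a configurable requested type should already cover other token types. A minimal Go sketch, assuming a hypothetical `exchangeParams` struct rather than the proposal's actual CRD fields:

```go
package main

import (
	"net/url"
	"strings"
)

// Token exchange constants from RFC 8693.
const (
	tokenExchangeGrantType = "urn:ietf:params:oauth:grant-type:token-exchange"
	tokenTypeAccessToken   = "urn:ietf:params:oauth:token-type:access_token"
	tokenTypeJWT           = "urn:ietf:params:oauth:token-type:jwt"
)

// exchangeParams is a hypothetical, simplified view of per-backend settings;
// the field names are illustrative, not the proposal's API.
type exchangeParams struct {
	SubjectTokenType string
	// RequestedTokenType is optional and may be any token type URI the STS supports.
	RequestedTokenType string
	Audience           string
	Scopes             []string
}

// buildExchangeForm assembles the application/x-www-form-urlencoded body of
// an RFC 8693 §2.1 token exchange request.
func buildExchangeForm(subjectToken string, p exchangeParams) url.Values {
	form := url.Values{}
	form.Set("grant_type", tokenExchangeGrantType)
	form.Set("subject_token", subjectToken)
	form.Set("subject_token_type", p.SubjectTokenType)
	if p.RequestedTokenType != "" {
		form.Set("requested_token_type", p.RequestedTokenType)
	}
	if p.Audience != "" {
		form.Set("audience", p.Audience)
	}
	if len(p.Scopes) > 0 {
		// scope is a space-delimited list, as in RFC 6749 §3.3.
		form.Set("scope", strings.Join(p.Scopes, " "))
	}
	return form
}
```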

- This proposal does **not** implement a full STS within AIGW itself - it delegates token issuance to an external OAuth 2.0 Authorization Server
- This proposal does **not** address non-MCP backends (e.g., LLM backends use `BackendSecurityPolicy`; a separate extension could adapt this pattern there)
- Rich Authorization Requests (RFC 9396) are **not** in scope for this proposal and may be addressed in a future iteration
- Refresh token management is explicitly out of scope for the initial proposal (see [Caveats](#9-caveats-and-tradeoffs))

Contributor

Hmm, what will happen when tokens expire? Will clients need to reconnect for now?
Shouldn't we read expires_in from the STS response and try to get a new token before that TTL elapses?

Member Author

Yes, we should probably use that returned value to configure the TTL in the cache, to min(expires_in, user_config)


The upstream AccessToken is cached, keyed by user token + backend + scope

The upstream token is cached (keyed by user token + backend + scope) and set on the outgoing request.

If we replace the upstream AccessToken in the cache after refreshing it, the cache is probably useless and never hit, because user tokens are AccessTokens with short validity windows.
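A hypothetical sketch of the cache key described in the proposal (user token + backend + scope). Hashing the user token and sorting the scopes are assumptions, not something the proposal specifies. It also makes the concern above concrete: since the user token is part of the key, every new short-lived user token maps to a fresh cache entry.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"sort"
	"strings"
)

// exchangeCacheKey derives a cache key from the incoming user token, the
// backend name, and the requested scopes. Hashing avoids keeping the raw
// bearer token as a map key, and sorting makes scope order irrelevant.
func exchangeCacheKey(userToken, backend string, scopes []string) string {
	sorted := append([]string(nil), scopes...)
	sort.Strings(sorted)
	h := sha256.New()
	// A NUL separator keeps ("ab", "c") and ("a", "bc") from colliding.
	for _, part := range []string{userToken, backend, strings.Join(sorted, " ")} {
		h.Write([]byte(part))
		h.Write([]byte{0})
	}
	return hex.EncodeToString(h.Sum(nil))
}
```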


Is the cache in memory? If so, all of its content will be lost when we deploy or upgrade the gateway. I think that's fine; I just want to call it out.

Member Author

In an initial implementation, yes, although AIGW already has a persistent store (Redis) that is used for rate limiting. A persistent cache could be implemented if we wanted one, although I don't think persistence is necessary for the exchanged tokens.

Comment on lines +260 to +266

// MCPTokenExchangeSemantics controls whether token exchange uses delegation
// or impersonation semantics as defined in RFC-8693 §1.1.
// +kubebuilder:validation:Enum=Delegation;Impersonation
type MCPTokenExchangeSemantics string

const (

Contributor

Can you shed some light on which upstream providers support these?

Member Author

What do you mean? Do you want a list of all existing IdPs that support OAuth token exchange?

Comment on lines +483 to +484

The `MCPRoute.spec.securityPolicy.oauth` section (which configures the gateway-facing JWT validation) MUST be configured when `tokenExchange` is used as upstream auth. A CRD validation rule should enforce this.

Contributor

Is this hard constraint because token exchange requires a trusted “subject token” (user identity), which API key auth does not provide?

Member Author

Right, you need the incoming token to exchange for the upstream one.

Comment on lines +319 to +324
// PrivateKeyRef references a Kubernetes Secret containing the private key
// used to sign the JWT. The secret must have a key "privateKey" containing
// a PEM-encoded RSA or EC private key.
//
// +kubebuilder:validation:Required
PrivateKeyRef gwapiv1.SecretObjectReference `json:"privateKeyRef"`

Contributor

Should this key be rotatable? If it is ever compromised, an attacker could impersonate the Gateway and request tokens for any user from the STS.

Member Author

That's a good point. The implementation could watch for changes to this PK and update the configuration if it changes.
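One way the implementation could react to rotation, sketched in Go under the assumption that something (for example a controller watch on the Secret) calls `Update` with the latest `privateKey` bytes. `keyHolder` and its methods are hypothetical; the idea is that the key is swapped atomically and a bad update never replaces a working key.

```go
package main

import (
	"bytes"
	"errors"
	"sync/atomic"
)

var errEmptyKey = errors.New("empty key material")

// keyVersion pairs the raw Secret bytes with the key parsed from them.
type keyVersion[K any] struct {
	raw []byte
	key K
}

// keyHolder keeps the active signing key behind an atomic pointer so that
// request handlers can read it lock-free while a watcher swaps it.
type keyHolder[K any] struct {
	parse   func([]byte) (K, error)
	current atomic.Pointer[keyVersion[K]]
}

// Update is meant to be called by a single watcher whenever the referenced
// Secret changes. It re-parses only when the bytes differ; on any error the
// previously loaded key stays active. It reports whether a new key was
// swapped in.
func (h *keyHolder[K]) Update(raw []byte) (bool, error) {
	if len(raw) == 0 {
		return false, errEmptyKey
	}
	if cur := h.current.Load(); cur != nil && bytes.Equal(cur.raw, raw) {
		return false, nil
	}
	k, err := h.parse(raw)
	if err != nil {
		return false, err
	}
	h.current.Store(&keyVersion[K]{raw: append([]byte(nil), raw...), key: k})
	return true, nil
}

// Key returns the active key, or false if no key has been loaded yet.
func (h *keyHolder[K]) Key() (K, bool) {
	cur := h.current.Load()
	if cur == nil {
		var zero K
		return zero, false
	}
	return cur.key, true
}
```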

edan-binshtok commented Apr 27, 2026
Contributor

Hi @nacx and the rest of the team,

The proposal models a single token exchange. A lot of real-world federation against major clouds (GCP, AWS, Azure) needs two hops, so I would love to see chained exchange as a first-class concept before the API shape locks in. I'm also flagging two smaller papercuts at the bottom.

Thanks for driving this — the per-backend tokenExchange and the IdP/STS separation are exactly right.

The problem: single-hop is rarely enough

The first STS call validates the user's IdP-issued JWT and returns a federated token, but that federated token is usually not the credential the upstream backend actually wants. A second exchange/impersonation step is typically needed. Three concrete examples:

Keycloak → GCP

  1. sts.googleapis.com/v1/token with the Keycloak JWT → federated access token (principal is principal://iam.../subject/<user>).
  2. iamcredentials.googleapis.com/.../serviceAccounts/{sa}:generateAccessToken → normal SA access token.

Why hop 2: GCP IAM policies, audit pipelines, VPC-SC perimeters, and older client libraries are built around serviceAccount: principals. Federated principals force an N×M IAM-binding explosion (every user × every resource) instead of the standard "bind permissions to an SA, grant users tokenCreator on the SA" pattern.

OIDC IdP → AWS (cross-account)

  1. sts:AssumeRoleWithWebIdentity with the OIDC JWT → temp creds for an entry role in account A.
  2. sts:AssumeRole from account A into the target role in account B.

Why hop 2: AWS OIDC federation lands you in a single trust-account role. Cross-account access (the common org-wide-IdP + per-team-account topology) requires a second AssumeRole.

Entra ID On-Behalf-Of

  1. Validate user's Entra-issued JWT.
  2. OBO call to /oauth2/v2.0/token with requested_token_use=on_behalf_of to mint a token for the downstream API's audience.

Why hop 2: the user's token is audience-bound to the gateway's app registration; the downstream API rejects it.

(Same shape applies to Vault as a token broker: JWT auth → dynamic-credentials engine.)

In all cases, the proposal as written can express only the first hop.

Suggested API shape

A minimal change that unblocks all of these without breaking the existing single-hop config:

type MCPBackendTokenExchange struct {
      // ... all existing fields unchanged ...

      // Then optionally chains a follow-up exchange. The token returned by *this*
      // step is used as the subject_token for the Then step. The token injected
      // into the upstream request is the result of the final step in the chain.
      //
      // Use cases: GCP federated token → SA impersonation, AWS AssumeRoleWithWebIdentity
      // → cross-account AssumeRole, Entra OBO after initial validation.
      //
      // Chain depth SHOULD be capped (suggest: 3) to prevent runaway recursion.
      // +optional
      Then *MCPBackendTokenExchange `json:"then,omitempty"`

      // AdditionalParams adds vendor-specific parameters to the token exchange
      // request body (RFC-8693 §2.1 permits extension parameters). Standard
      // parameters (grant_type, subject_token, audience, scope, ...) MUST NOT
      // be overridden via this map.
      //
      // Examples: GCP STS uses a WIF provider URI as `audience`; AWS STS uses
      // `RoleArn`/`RoleSessionName`; Entra OBO uses `requested_token_use=on_behalf_of`.
      // +optional
      AdditionalParams map[string]string `json:"additionalParams,omitempty"`
}

Alternative if recursion is unappealing: replace Then with Steps []MCPBackendTokenExchangeStep on MCPBackendSecurityPolicy. The recursive form is less invasive; a slice makes it easier to validate the chain length.

Worked example — Keycloak → GCP, end to end

backendRefs:
  - name: gcp-mcp
    securityPolicy:
      tokenExchange:
        # Step 1: Keycloak JWT → GCP federated token
        stsEndpoint: "https://sts.googleapis.com/v1/token"
        subjectTokenType: "urn:ietf:params:oauth:token-type:jwt"
        audience: "//iam.googleapis.com/projects/123/locations/global/workloadIdentityPools/keycloak/providers/keycloak"
        requestedTokenType: "urn:ietf:params:oauth:token-type:access_token"
        scopes: ["https://www.googleapis.com/auth/cloud-platform"]
        semantics: "Impersonation"   # GCP STS rejects actor_token

        then:
          # Step 2: federated token → impersonated SA token
          stsEndpoint: "https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/user-mcp-sa@proj.iam.gserviceaccount.com:generateAccessToken"
          subjectTokenType: "urn:ietf:params:oauth:token-type:access_token"
          requestedTokenType: "urn:ietf:params:oauth:token-type:access_token"
          scopes: ["https://www.googleapis.com/auth/cloud-platform"]
          semantics: "Impersonation"
          additionalParams:
            lifetime: "3600s"

AWS cross-account version:

tokenExchange:
  stsEndpoint: "https://sts.amazonaws.com/?Action=AssumeRoleWithWebIdentity&Version=2011-06-15"
  subjectTokenType: "urn:ietf:params:oauth:token-type:jwt"
  semantics: "Impersonation"
  additionalParams:
    RoleArn: "arn:aws:iam::111111111111:role/keycloak-entry"
    RoleSessionName: "aigw-${user.sub}"
  then:
    stsEndpoint: "https://sts.amazonaws.com/?Action=AssumeRole&Version=2011-06-15"
    subjectTokenType: "urn:ietf:params:oauth:token-type:access_token"
    semantics: "Impersonation"
    additionalParams:
      RoleArn: "arn:aws:iam::222222222222:role/mcp-target"
      RoleSessionName: "aigw-${user.sub}"

The cache-key definition in §6.9 would need to become per-step (each step caches its own output keyed by its inputs) — worth one sentence in the doc.

Two smaller papercuts (independent of the above)

  1. Drop the "actorToken required when Delegation" validation rule. GCP STS, AWS STS, and Entra OBO all reject actor_token. Forcing every Delegation config to fabricate one means those STSes can't use Delegation semantics at all — operators are pushed to Impersonation, losing the act claim and audit trail. Suggest letting actorToken be genuinely optional under Delegation (just no act claim if absent).

  2. Soften the clientAuth "NOT RECOMMENDED" copy. Current doc treats omission as production-unsafe, but for GCP STS / AWS STS omission is the only correct configuration (those endpoints don't accept OAuth client auth — they authenticate via the subject token itself). Worth distinguishing the two cases so users targeting cloud STSes don't get told they're misconfigured.


Happy to put up a follow-up PR with the schema changes once the proposal lands, if this direction sounds reasonable. Thanks again!

nacx commented Apr 29, 2026
Member Author

The problem: single-hop is rarely enough

I'd like to focus first on having an implementation that supports the single hop, then add the possibility to chain exchanges to enable the proposed use cases. It would be good to add a section to the proposal describing the chained approach, or a separate document in the same directory referring to a "v2" of this (that could be done in this PR or in a separate one; WDYT?).

Two smaller papercuts (independent of the above)

I've opened a base implementation for this spec here: #2092
There I've removed the "semantics" field, as it is redundant and can be derived from the presence of the actorToken bits, and made the actorToken fully optional.

nacx and others added 4 commits April 29, 2026 16:12
Signed-off-by: Ignasi Barrera <nacx@apache.org>
Signed-off-by: Ignasi Barrera <nacx@apache.org>

missBerg left a comment

Contributor

Approving this proposal. This is a proposal; refinements in approach may emerge during implementation.

nacx enabled auto-merge (squash) June 1, 2026 18:32
nacx merged commit fd2038c into envoyproxy:main Jun 1, 2026
25 checks passed
aabchoo added a commit that referenced this pull request Jun 6, 2026
**Description**
New Features
                  
Multi-Tenant Hostname Routing
- AIGatewayRoute gains a hostnames field enabling hostname-based model
scoping — serve different model catalogs per tenant from a single
Gateway
- /v1/models automatically scopes its response to models matching the
request's Host header
- Wildcard hostnames (*.ai.example.com) supported via Gateway API
hostname matching rules
Provider Translation
- Anthropic → AWS Bedrock Converse — new translator path lets
Anthropic-native clients reach Bedrock without switching protocols
(text, images, tool use, thinking, streaming)
- Anthropic → OpenAI reasoning & image support — thinking/reasoning
blocks and image content no longer silently dropped during translation
- Claude Opus 4.7 reasoning — display parameter (summarized/omitted),
xhigh effort tier, and claude-mythos-preview model recognition
- Anthropic prefix support — VersionedAPISchema.prefix now works for
Anthropic backends (e.g., /{prefix}/messages)
- anthropic-beta header forwarding — mapped into anthropic_beta body
field for AWSAnthropic backends
OpenAI API Compatibility
- Audio transcription & translation — full data-plane support for
/v1/audio/transcriptions and /v1/audio/translations (Whisper endpoints,
multipart/form-data)
- Azure OpenAI Responses API — /v1/responses routes to Azure's
/openai/responses?api-version=... path
- audio_url and video_url content types — multimodal audio/video inputs
for compatible backends (vLLM, phi-4-mm, Qwen 3.5)
Quota-Aware Routing
- Backend rate limit filter injection for QuotaPolicy — first runtime
enforcement: controller injects a backend rate limit filter when a
QuotaPolicy is attached to an AIServiceBackend
MCP Gateway
- Authorization-filtered tools/list — omits tools the caller isn't
authorized to invoke, preventing tool discovery leaks
Observability
- Smarter log redaction — developer-authored metadata (tool
descriptions, function names, JSON schemas) visible in debug logs; user
content and AI-generated text remain redacted
API Changes
- AIGatewayRoute.spec.hostnames — new optional field for hostname-based
request filtering
- AIGatewayRoute.spec.rules capped at 15 (down from 128) to match
Gateway API HTTPRoute limits
- VersionedAPISchema.prefix extended to Anthropic backends
- QuotaPolicy now has runtime enforcement (backend rate limit filter
injection)
Bug Fixes
- SSE parser handles data:{json} (no space after colon)
- Responses API streaming: buffer incomplete SSE events across response
body chunks
- Responses API: capture token usage from response.incomplete and
response.failed events
- Nil-pointer guard in AWS Bedrock response translator (HTTP 200 with no
output field)
- Comprehensive Gemini finish-reason mapping (previously everything fell
through to content_filter)
- GCP Vertex AI streaming: emit empty delta object instead of omitting
it
- Responses API: handle typeless assistant output messages (e.g., from
OpenCode)
Docs
- Proposal: MCPBackend CRD (#2144)
- Proposal: OAuth 2.0 Token Exchange as Upstream Auth for MCP Backends
(#2052)
CI
- contents:write permission declared on Release workflow's release job
(#2139)

---------

Signed-off-by: achoo30 <achoo30@bloomberg.net>
Signed-off-by: Erica Hughberg <erica.sundberg.90@gmail.com>
Co-authored-by: Erica Hughberg <erica.sundberg.90@gmail.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
saixso pushed a commit to saixso/ai-gateway that referenced this pull request Jun 21, 2026
Signed-off-by: saixso <sai.soundararajan@spoton.com>

Labels

size:XL This PR changes 500-999 lines, ignoring generated files.


8 participants