docs: proposal for OAuth 2.0 Token Exchange as Upstream Auth for MCP Backends #2052
…Backends Signed-off-by: Ignasi Barrera <nacx@apache.org>
//
// +kubebuilder:validation:XValidation:rule="has(self.clientID)",message="clientID is required"
// +kubebuilder:validation:XValidation:rule="has(self.clientSecretRef)",message="clientSecretRef is required"
type MCPTokenExchangeClientAuth struct {
There was a problem hiding this comment.
Should we also support:
- private_key_jwt
- mTLS (this may be tricky here)
I’m wondering whether all STS services support these authentication methods.
There was a problem hiding this comment.
Yes, we could, although I think having an API with a type that can incrementally accommodate new methods is a good start. I'd start by providing one, and the API should be backwards-compatible when we want to add more.
There was a problem hiding this comment.
Designing the ClientAuth in a generic way is a great idea.
I think authenticating via K8s service account JWTs could be quite a common use case;
see https://datatracker.ietf.org/doc/html/rfc7523#section-2.2
That would require client_assertion_type and client_assertion. Could these optionally integrate directly with K8s service account token Secrets/mounts?
There was a problem hiding this comment.
That sounds good as an addition once we have the basic implementation, but I'd rather keep the API open and backward-compatible and focus on one first use case. WDYT?
There was a problem hiding this comment.
@nacx makes sense, thank you.
I think it's also worth foreseeing a token-type-agnostic exchange, so that you can theoretically get back an OAuth Transaction Token.
There was a problem hiding this comment.
Sounds good. IIUC, the current proposal should also cover that, as it allows for configuring the subject token type and requested token type. Is there something you see that's missing in this regard?
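To make the point about configurable token types concrete, here is a minimal sketch of how an RFC 8693 request body could be assembled from such a configuration. The `exchangeParams` type and `buildExchangeForm` function are hypothetical names used only for illustration and are not taken from the proposal; the parameter names and URIs are the ones defined by RFC 8693.

```go
package main

import (
	"net/url"
	"strings"
)

// Grant type and token type URIs defined by RFC 8693.
const (
	grantTypeTokenExchange = "urn:ietf:params:oauth:grant-type:token-exchange"
	tokenTypeJWT           = "urn:ietf:params:oauth:token-type:jwt"
	tokenTypeAccessToken   = "urn:ietf:params:oauth:token-type:access_token"
)

// exchangeParams is a hypothetical, simplified view of the per-backend
// token exchange configuration (not the proposal's actual API type).
type exchangeParams struct {
	SubjectTokenType   string
	RequestedTokenType string // optional
	Audience           string // optional
	Scopes             []string
}

// buildExchangeForm returns the form-encoded parameters of an RFC 8693
// token exchange request for the given subject token.
func buildExchangeForm(p exchangeParams, subjectToken string) url.Values {
	form := url.Values{}
	form.Set("grant_type", grantTypeTokenExchange)
	form.Set("subject_token", subjectToken)
	form.Set("subject_token_type", p.SubjectTokenType)
	if p.RequestedTokenType != "" {
		form.Set("requested_token_type", p.RequestedTokenType)
	}
	if p.Audience != "" {
		form.Set("audience", p.Audience)
	}
	if len(p.Scopes) > 0 {
		// RFC 8693 uses a space-delimited scope list.
		form.Set("scope", strings.Join(p.Scopes, " "))
	}
	return form
}
```

Because both token types are plain strings in this sketch, any token type URI the STS understands can be requested without changing the code.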
- This proposal does **not** implement a full STS within AIGW itself - it delegates token issuance to an external OAuth 2.0 Authorization Server
- This proposal does **not** address non-MCP backends (e.g., LLM backends use `BackendSecurityPolicy`; a separate extension could adapt this pattern there)
- Rich Authorization Requests (RFC 9396) are **not** in scope for this proposal and may be addressed in a future iteration
- Refresh token management is explicitly out of scope for the initial proposal (see [Caveats](#9-caveats-and-tradeoffs))
There was a problem hiding this comment.
Hmm, what will happen when tokens expire? Will clients need to reconnect for now?
Shouldn't we read expires_in from the STS response and try to get a new token before that TTL elapses?
There was a problem hiding this comment.
Yes, we should probably use that returned value to set the cache TTL to min(expires_in, user_config).
There was a problem hiding this comment.
The upstream access token is cached, keyed by user token + backend + scope:
The upstream token is cached (keyed by user token + backend + scope) and set on the outgoing request.
If we replace the upstream access token in the cache after refreshing it, the cache is probably useless and never hit, because user tokens are access tokens with short validity windows.
There was a problem hiding this comment.
Is the cache in memory? Which means all of its content will be lost when we deploy / upgrade the gateway? I think it's fine, just want to call it out.
There was a problem hiding this comment.
In an initial implementation, yes, although AIGW already has a persistent store (Redis) that is used for rate limiting. If we wanted a persistent cache, that could be done, although I don't think persistence would be necessary for the exchanged tokens.
// MCPTokenExchangeSemantics controls whether token exchange uses delegation
// or impersonation semantics as defined in RFC-8693 §1.1.
// +kubebuilder:validation:Enum=Delegation;Impersonation
type MCPTokenExchangeSemantics string

const (
There was a problem hiding this comment.
Can you shed some light on which upstream providers support these?
There was a problem hiding this comment.
What do you mean? Do you want a list of all existing IdPs that support OAuth token exchange?
The `MCPRoute.spec.securityPolicy.oauth` section (which configures the gateway-facing JWT validation) MUST be configured when `tokenExchange` is used as upstream auth. A CRD validation rule should enforce this.
There was a problem hiding this comment.
Is this hard constraint there because token exchange requires a trusted “subject token” (user identity), which API key auth does not provide?
There was a problem hiding this comment.
Right, you need the incoming token to exchange for the upstream one.
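The constraint could also be checked in controller code in addition to a CRD validation rule. The sketch below uses hypothetical, heavily simplified stand-ins (`routeSpec`, `backendRef`) for the real API types, so it only illustrates the shape of the check.

```go
package main

import "fmt"

// backendRef is a simplified stand-in for an MCPRoute backend reference.
type backendRef struct {
	Name          string
	TokenExchange bool // true when securityPolicy.tokenExchange is set
}

// routeSpec is a simplified stand-in for MCPRoute.spec.
type routeSpec struct {
	OAuthConfigured bool // true when spec.securityPolicy.oauth is set
	BackendRefs     []backendRef
}

// validateTokenExchangeRequiresOAuth rejects routes where a backend uses
// token exchange but no incoming JWT validation is configured, since there
// would be no subject token to exchange.
func validateTokenExchangeRequiresOAuth(r routeSpec) error {
	if r.OAuthConfigured {
		return nil
	}
	for _, b := range r.BackendRefs {
		if b.TokenExchange {
			return fmt.Errorf("backend %q uses tokenExchange but securityPolicy.oauth is not configured", b.Name)
		}
	}
	return nil
}
```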
// PrivateKeyRef references a Kubernetes Secret containing the private key
// used to sign the JWT. The secret must have a key "privateKey" containing
// a PEM-encoded RSA or EC private key.
//
// +kubebuilder:validation:Required
PrivateKeyRef gwapiv1.SecretObjectReference `json:"privateKeyRef"`
There was a problem hiding this comment.
Should this key be rotated? If it is compromised, an attacker can impersonate the Gateway and request tokens for any user from the STS.
There was a problem hiding this comment.
That's a good point. The implementation could watch for changes to this PK and update the configuration if it changes.
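One way the rotation idea could be sketched: parse the PEM-encoded RSA or EC key from the Secret and swap it into a holder whenever the Secret changes. `parseSigningKey` and `signerHolder` are hypothetical names, and the Secret watch itself (which would call `Update`) is assumed to exist elsewhere and is not shown.

```go
package main

import (
	"crypto"
	"crypto/ecdsa"
	"crypto/rsa"
	"crypto/x509"
	"encoding/pem"
	"errors"
	"sync"
)

// parseSigningKey decodes a PEM-encoded RSA or EC private key in PKCS#8,
// PKCS#1 (RSA) or SEC 1 (EC) form.
func parseSigningKey(pemData []byte) (crypto.Signer, error) {
	block, _ := pem.Decode(pemData)
	if block == nil {
		return nil, errors.New("no PEM block found")
	}
	if key, err := x509.ParsePKCS8PrivateKey(block.Bytes); err == nil {
		switch k := key.(type) {
		case *rsa.PrivateKey:
			return k, nil
		case *ecdsa.PrivateKey:
			return k, nil
		default:
			return nil, errors.New("only RSA and EC keys are supported")
		}
	}
	if k, err := x509.ParsePKCS1PrivateKey(block.Bytes); err == nil {
		return k, nil
	}
	if k, err := x509.ParseECPrivateKey(block.Bytes); err == nil {
		return k, nil
	}
	return nil, errors.New("unsupported private key encoding")
}

// signerHolder holds the key currently used to sign client assertions.
type signerHolder struct {
	mu     sync.RWMutex
	signer crypto.Signer
}

// Update replaces the current key; on a parse error the old key is kept.
func (h *signerHolder) Update(pemData []byte) error {
	s, err := parseSigningKey(pemData)
	if err != nil {
		return err
	}
	h.mu.Lock()
	h.signer = s
	h.mu.Unlock()
	return nil
}

// Current returns the key in use, or nil if none has been loaded yet.
func (h *signerHolder) Current() crypto.Signer {
	h.mu.RLock()
	defer h.mu.RUnlock()
	return h.signer
}
```

Keeping the previous key when an update fails to parse avoids breaking token exchange because of a malformed Secret update.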
Hi @nacx and rest of the team,

The proposal models a single token exchange. A lot of real-world federation against major clouds (GCP, AWS, Azure) needs two hops, so I would love to see chained exchange as a first-class concept before the API shape locks in. Also flagging two smaller papercuts at the bottom. Thanks for driving this, the per-backend

The problem: single-hop is rarely enough

The first STS call validates the user's IdP-issued JWT and returns a federated token, but that federated token is usually not the credential the upstream backend actually wants. A second exchange/impersonation step is typically needed. Three concrete examples:

Keycloak → GCP

Why hop 2: GCP IAM policies, audit pipelines, VPC-SC perimeters, and older client libraries are built around

OIDC IdP → AWS (cross-account)

Why hop 2: AWS OIDC federation lands you in a single trust-account role. Cross-account access (the common org-wide-IdP + per-team-account topology) requires a second

Entra ID On-Behalf-Of

Why hop 2: the user's token is audience-bound to the gateway's app registration; the downstream API rejects it. (Same shape applies to Vault as a token broker: JWT auth → dynamic-credentials engine.)

In all cases, the proposal as written can express only the first hop.

Suggested API shape

A minimal change that unblocks all of these without breaking the existing single-hop config:

type MCPBackendTokenExchange struct {
	// ... all existing fields unchanged ...

	// Then optionally chains a follow-up exchange. The token returned by *this*
	// step is used as the subject_token for the Then step. The token injected
	// into the upstream request is the result of the final step in the chain.
	//
	// Use cases: GCP federated token → SA impersonation, AWS AssumeRoleWithWebIdentity
	// → cross-account AssumeRole, Entra OBO after initial validation.
	//
	// Chain depth SHOULD be capped (suggest: 3) to prevent runaway recursion.
	// +optional
	Then *MCPBackendTokenExchange `json:"then,omitempty"`

	// AdditionalParams adds vendor-specific parameters to the token exchange
	// request body (RFC-8693 §2.1 permits extension parameters). Standard
	// parameters (grant_type, subject_token, audience, scope, ...) MUST NOT
	// be overridden via this map.
	//
	// Examples: GCP STS uses a WIF provider URI as `audience`; AWS STS uses
	// `RoleArn`/`RoleSessionName`; Entra OBO uses `requested_token_use=on_behalf_of`.
	// +optional
	AdditionalParams map[string]string `json:"additionalParams,omitempty"`
}

Alternative if recursion is unappealing: replace

Worked example: Keycloak → GCP, end to end

backendRefs:
  - name: gcp-mcp
    securityPolicy:
      tokenExchange:
        # Step 1: Keycloak JWT → GCP federated token
        stsEndpoint: "https://sts.googleapis.com/v1/token"
        subjectTokenType: "urn:ietf:params:oauth:token-type:jwt"
        audience: "//iam.googleapis.com/projects/123/locations/global/workloadIdentityPools/keycloak/providers/keycloak"
        requestedTokenType: "urn:ietf:params:oauth:token-type:access_token"
        scopes: ["https://www.googleapis.com/auth/cloud-platform"]
        semantics: "Impersonation" # GCP STS rejects actor_token
        then:
          # Step 2: federated token → impersonated SA token
          stsEndpoint: "https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/user-mcp-sa@proj.iam.gserviceaccount.com:generateAccessToken"
          subjectTokenType: "urn:ietf:params:oauth:token-type:access_token"
          requestedTokenType: "urn:ietf:params:oauth:token-type:access_token"
          scopes: ["https://www.googleapis.com/auth/cloud-platform"]
          semantics: "Impersonation"
          additionalParams:
            lifetime: "3600s"

AWS cross-account version:

tokenExchange:
  stsEndpoint: "https://sts.amazonaws.com/?Action=AssumeRoleWithWebIdentity&Version=2011-06-15"
  subjectTokenType: "urn:ietf:params:oauth:token-type:jwt"
  semantics: "Impersonation"
  additionalParams:
    RoleArn: "arn:aws:iam::111111111111:role/keycloak-entry"
    RoleSessionName: "aigw-${user.sub}"
  then:
    stsEndpoint: "https://sts.amazonaws.com/?Action=AssumeRole&Version=2011-06-15"
    subjectTokenType: "urn:ietf:params:oauth:token-type:access_token"
    semantics: "Impersonation"
    additionalParams:
      RoleArn: "arn:aws:iam::222222222222:role/mcp-target"
      RoleSessionName: "aigw-${user.sub}"

The cache-key definition in §6.9 would need to become per-step (each step caches its own output keyed by its inputs), which is worth one sentence in the doc.

Two smaller papercuts (independent of the above)
Happy to put up a follow-up PR with the schema changes once the proposal lands, if this direction sounds reasonable. Thanks again!
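To illustrate the chained-exchange idea from this comment, here is a minimal sketch of walking a `then` chain with a depth cap of 3 and rejecting `additionalParams` that collide with standard RFC 8693 request parameters. The `step` type is a hypothetical, trimmed-down stand-in for `MCPBackendTokenExchange`, and `exchangeFunc` abstracts the actual STS call, which is not shown.

```go
package main

import (
	"errors"
	"fmt"
)

// maxChainDepth is the cap suggested in the comment above.
const maxChainDepth = 3

// step is a trimmed-down stand-in for one exchange in the chain.
type step struct {
	STSEndpoint      string
	AdditionalParams map[string]string
	Then             *step
}

// reservedParams are the standard RFC 8693 request parameters that
// additionalParams must not override.
var reservedParams = map[string]bool{
	"grant_type": true, "resource": true, "audience": true, "scope": true,
	"requested_token_type": true, "subject_token": true,
	"subject_token_type": true, "actor_token": true, "actor_token_type": true,
}

// validateChain enforces the depth cap (which also stops pointer cycles)
// and the reserved-parameter rule for every step.
func validateChain(first *step) error {
	depth := 0
	for s := first; s != nil; s = s.Then {
		depth++
		if depth > maxChainDepth {
			return fmt.Errorf("token exchange chain exceeds max depth %d", maxChainDepth)
		}
		for name := range s.AdditionalParams {
			if reservedParams[name] {
				return fmt.Errorf("step %d: additionalParams must not override %q", depth, name)
			}
		}
	}
	return nil
}

// exchangeFunc performs a single exchange and returns the issued token.
type exchangeFunc func(s *step, subjectToken string) (string, error)

// runChain feeds the output of each step into the next one as its subject
// token and returns the token produced by the final step.
func runChain(first *step, userToken string, exchange exchangeFunc) (string, error) {
	if first == nil {
		return "", errors.New("no token exchange configured")
	}
	if err := validateChain(first); err != nil {
		return "", err
	}
	token := userToken
	for s := first; s != nil; s = s.Then {
		next, err := exchange(s, token)
		if err != nil {
			return "", fmt.Errorf("exchange at %s failed: %w", s.STSEndpoint, err)
		}
		token = next
	}
	return token, nil
}
```

An iterative walk is used here instead of recursion so that the depth cap is a simple counter; per-step caching, as mentioned for §6.9, would wrap the `exchange` call.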
I'd like to focus first on an implementation that supports the single hop, and then add the ability to chain exchanges to enable the proposed use cases. It would be good to add a section to the proposal describing the chained approach, or an additional document in the same directory, as a "v2" of this. That could be done in this PR or in an independent one. WDYT?
I've opened a base implementation for this spec here: #2092
Signed-off-by: Ignasi Barrera <nacx@apache.org>
missBerg left a comment
Approving this proposal. This is a proposal; refinements in approach may emerge during implementation.
**Description**
New Features
Multi-Tenant Hostname Routing
- AIGatewayRoute gains a hostnames field enabling hostname-based model
scoping — serve different model catalogs per tenant from a single
Gateway
- /v1/models automatically scopes its response to models matching the
request's Host header
- Wildcard hostnames (*.ai.example.com) supported via Gateway API
hostname matching rules
Provider Translation
- Anthropic → AWS Bedrock Converse — new translator path lets
Anthropic-native clients reach Bedrock without switching protocols
(text, images, tool use, thinking, streaming)
- Anthropic → OpenAI reasoning & image support — thinking/reasoning
blocks and image content no longer silently dropped during translation
- Claude Opus 4.7 reasoning — display parameter (summarized/omitted),
xhigh effort tier, and claude-mythos-preview model recognition
- Anthropic prefix support — VersionedAPISchema.prefix now works for
Anthropic backends (e.g., /{prefix}/messages)
- anthropic-beta header forwarding — mapped into anthropic_beta body
field for AWSAnthropic backends
OpenAI API Compatibility
- Audio transcription & translation — full data-plane support for
/v1/audio/transcriptions and /v1/audio/translations (Whisper endpoints,
multipart/form-data)
- Azure OpenAI Responses API — /v1/responses routes to Azure's
/openai/responses?api-version=... path
- audio_url and video_url content types — multimodal audio/video inputs
for compatible backends (vLLM, phi-4-mm, Qwen 3.5)
Quota-Aware Routing
- Backend rate limit filter injection for QuotaPolicy — first runtime
enforcement: controller injects a backend rate limit filter when a
QuotaPolicy is attached to an AIServiceBackend
MCP Gateway
- Authorization-filtered tools/list — omits tools the caller isn't
authorized to invoke, preventing tool discovery leaks
Observability
- Smarter log redaction — developer-authored metadata (tool
descriptions, function names, JSON schemas) visible in debug logs; user
content and AI-generated text remain redacted
API Changes
- AIGatewayRoute.spec.hostnames — new optional field for hostname-based
request filtering
- AIGatewayRoute.spec.rules capped at 15 (down from 128) to match
Gateway API HTTPRoute limits
- VersionedAPISchema.prefix extended to Anthropic backends
- QuotaPolicy now has runtime enforcement (backend rate limit filter
injection)
Bug Fixes
- SSE parser handles data:{json} (no space after colon)
- Responses API streaming: buffer incomplete SSE events across response
body chunks
- Responses API: capture token usage from response.incomplete and
response.failed events
- Nil-pointer guard in AWS Bedrock response translator (HTTP 200 with no
output field)
- Comprehensive Gemini finish-reason mapping (previously everything fell
through to content_filter)
- GCP Vertex AI streaming: emit empty delta object instead of omitting
it
- Responses API: handle typeless assistant output messages (e.g., from
OpenCode)
Docs
- Proposal: MCPBackend CRD (#2144)
- Proposal: OAuth 2.0 Token Exchange as Upstream Auth for MCP Backends
(#2052)
CI
- contents:write permission declared on Release workflow's release job
(#2139)
---------
Signed-off-by: achoo30 <achoo30@bloomberg.net>
Signed-off-by: Erica Hughberg <erica.sundberg.90@gmail.com>
Co-authored-by: Erica Hughberg <erica.sundberg.90@gmail.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: saixso <sai.soundararajan@spoton.com>
Description
This PR adds a proposal to implement OAuth 2.0 Token Exchange as Upstream Auth for MCP Backends. The proposal focuses on the APIs to enable token exchange for selected backends.
Related Issues/PRs (if applicable)
Proposal for #2036
Special notes for reviewers (if applicable)
N/A