docs: add QuotaPolicy capability guide #1947

Open
missBerg wants to merge 5 commits into envoyproxy:main from missBerg:docs/quota-policy
@missBerg
Contributor

Description

Adds comprehensive documentation for the QuotaPolicy CRD, which was fully implemented
but had no user-facing documentation. Covers service-wide quotas, per-model quotas,
CEL cost expressions, Exclusive vs. Shared bucket modes, client-selector-based bucket rules,
shadow mode, and duration format. Follows the structure of existing traffic capability guides.

Distinguishes QuotaPolicy (total consumption budgets) from rate limiting (request velocity)
to help adopters choose the right mechanism.
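
For reviewers who want the distinction in concrete terms, here is a minimal Python sketch. It is not the AI Gateway implementation and does not use the QuotaPolicy API. The class names `TokenBudget` and `RequestRateLimit`, the fixed-window reset, and the per-second counter are all assumptions made for illustration. It only shows that a budget counts cumulative tokens over a window, while a velocity limit counts requests regardless of their size.

```python
from dataclasses import dataclass


@dataclass
class TokenBudget:
    """Hypothetical fixed-window budget: caps total tokens consumed per window."""
    limit: int            # total tokens allowed in one window
    window_seconds: int   # length of the window in seconds
    window_start: float = 0.0
    used: int = 0

    def allow(self, now: float, tokens: int) -> bool:
        # Start a fresh window (and reset the counter) once the current one has elapsed.
        if now - self.window_start >= self.window_seconds:
            self.window_start = now
            self.used = 0
        # Reject the request if it would push cumulative usage over the budget.
        if self.used + tokens > self.limit:
            return False
        self.used += tokens
        return True


@dataclass
class RequestRateLimit:
    """Hypothetical velocity limit: caps requests per second, whatever their token cost."""
    max_per_second: int
    second: int = -1
    count: int = 0

    def allow(self, now: float) -> bool:
        current = int(now)
        # Reset the request counter at each new one-second bucket.
        if current != self.second:
            self.second = current
            self.count = 0
        if self.count >= self.max_per_second:
            return False
        self.count += 1
        return True


# Illustrative budget: 100,000 tokens per hour.
budget = TokenBudget(limit=100_000, window_seconds=3600)
first = budget.allow(now=0, tokens=60_000)         # fits within the budget
second = budget.allow(now=10, tokens=50_000)       # would exceed 100,000 in this window
next_window = budget.allow(now=3600, tokens=50_000)  # new window, counter reset

# Illustrative velocity limit: 2 requests per second.
rate = RequestRateLimit(max_per_second=2)
burst = [rate.allow(0.1), rate.allow(0.2), rate.allow(0.3)]
```

In this sketch a single large request can exhaust the budget without ever tripping the rate limit, and many tiny requests can trip the rate limit while barely touching the budget, which is why the two mechanisms complement each other.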

Related Issues/PRs (if applicable)

Related #1571

Special notes for reviewers (if applicable)

All field definitions and semantics sourced from api/v1alpha1/quota_policy.go GoDoc comments.
Includes complete, copy-pasteable YAML examples for every configuration option.

@missBerg missBerg requested a review from a team as a code owner March 14, 2026 21:09
@dosubot dosubot bot added the size:L label (This PR changes 100-499 lines, ignoring generated files.) Mar 14, 2026
@dosubot
dosubot bot commented Mar 14, 2026

Related Documentation

9 document(s) may need updating based on files changed in this PR:

Envoy's Space

index /ai-gateway/blob/main/site/versioned_docs/version-0.3/capabilities/index.md
@@ -31,6 +31,7 @@
 
 - **[Model Virtualization](./traffic/model-virtualization.md)**: Abstract and virtualize AI models
 - **[Provider Fallback](./traffic/provider-fallback.md)**: Automatic failover between AI providers
+- **[Quota Policy](./traffic/quota-policy.md)**: Manage total token consumption budgets across time windows
 - **[Usage-based Rate Limiting](./traffic/usage-based-ratelimiting.md)**: Token-aware rate limiting for AI workloads
 
 ## Security

index /ai-gateway/blob/main/site/versioned_docs/version-0.4/capabilities/traffic/index.md
@@ -3,4 +3,4 @@
 title: Traffic Handling
 ---
 
-This section provides information about traffic routing related capabilities in Envoy AI Gateway.
+This section provides information about traffic routing related capabilities in Envoy AI Gateway, including quota management for controlling total token consumption budgets across time windows.

index /ai-gateway/blob/main/site/docs/capabilities/index.md
@@ -35,6 +35,7 @@
 
 - **[Model Virtualization](./traffic/model-virtualization.md)**: Abstract and virtualize AI models
 - **[Provider Fallback](./traffic/provider-fallback.md)**: Automatic failover between AI providers
+- **[Quota Policy](./traffic/quota-policy.md)**: Manage total token consumption budgets with time-based limits
 - **[Usage-based Rate Limiting](./traffic/usage-based-ratelimiting.md)**: Token-aware rate limiting for AI workloads
 
 ## Security

index /ai-gateway/blob/main/site/docs/capabilities/traffic/index.md
@@ -3,4 +3,4 @@
 title: Traffic Handling
 ---
 
-This section provides information about traffic routing related capabilities in Envoy AI Gateway.
+This section provides information about traffic routing related capabilities in Envoy AI Gateway, including quota management for controlling total token consumption budgets across time windows.

index /ai-gateway/blob/main/site/versioned_docs/version-0.5/capabilities/index.md
@@ -35,6 +35,7 @@
 
 - **[Model Virtualization](./traffic/model-virtualization.md)**: Abstract and virtualize AI models
 - **[Provider Fallback](./traffic/provider-fallback.md)**: Automatic failover between AI providers
+- **[Quota Policy](./traffic/quota-policy.md)**: Manage total token consumption budgets with caps on cumulative token spend across time windows
 - **[Usage-based Rate Limiting](./traffic/usage-based-ratelimiting.md)**: Token-aware rate limiting for AI workloads
 
 ## Security

index /ai-gateway/blob/main/site/versioned_docs/version-0.3/capabilities/traffic/index.md
@@ -3,4 +3,4 @@
 title: Traffic Handling
 ---
 
-This section provides information about traffic routing related capabilities in Envoy AI Gateway.
+This section provides information about traffic routing related capabilities in Envoy AI Gateway, including quota management for controlling total token consumption budgets across time windows.

index /ai-gateway/blob/main/site/versioned_docs/version-0.5/capabilities/traffic/index.md
@@ -3,4 +3,6 @@
 title: Traffic Handling
 ---
 
-This section provides information about traffic routing related capabilities in Envoy AI Gateway.
+This section provides information about traffic routing related capabilities in Envoy AI Gateway, including:
+
+- **Quota Policy** -- token-based quota management for controlling total consumption budgets across time windows

index /ai-gateway/blob/main/site/versioned_docs/version-0.4/capabilities/index.md
@@ -31,6 +31,7 @@
 
 - **[Model Virtualization](./traffic/model-virtualization.md)**: Abstract and virtualize AI models
 - **[Provider Fallback](./traffic/provider-fallback.md)**: Automatic failover between AI providers
+- **[Quota Policy](./traffic/quota-policy.md)**: Manage total token consumption budgets with caps on cumulative token spend across time windows
 - **[Usage-based Rate Limiting](./traffic/usage-based-ratelimiting.md)**: Token-aware rate limiting for AI workloads
 
 ## Security

usage-based-ratelimiting /ai-gateway/blob/main/site/docs/capabilities/traffic/usage-based-ratelimiting.md
@@ -8,6 +8,17 @@
 import TabItem from '@theme/TabItem';
 
 This guide focuses on AI Gateway's specific capabilities for token-based rate limiting in LLM requests. For general rate limiting concepts and configurations, refer to [Envoy Gateway's Rate Limiting documentation](https://gateway.envoyproxy.io/docs/tasks/traffic/global-rate-limit/).
+
+:::info Choosing Between QuotaPolicy and Usage-Based Rate Limiting
+
+AI Gateway provides two complementary mechanisms for token control:
+
+- **[QuotaPolicy](./quota-policy.md)**: Manages **total consumption budgets** (e.g., "allow 100,000 tokens per hour"). Use this when you need to cap cumulative token spend across a time window to prevent cost overruns.
+
+- **Usage-Based Rate Limiting** (this page): Controls **request velocity** (e.g., "allow X requests per second"). Use this when you need to throttle the rate at which requests are processed.
+
+These capabilities can be used together to provide comprehensive cost and traffic control.
+:::
 
 ## Overview
 

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Erica Hughberg <erica.sundberg.90@gmail.com>
@missBerg missBerg force-pushed the docs/quota-policy branch from 39cd2e2 to d717bc6 Compare March 14, 2026 21:22
missBerg and others added 4 commits March 14, 2026 17:26
Add QuotaPolicy to capabilities index and cross-reference from the
usage-based rate limiting page to help users discover and distinguish
the two token management mechanisms.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Erica Hughberg <erica.sundberg.90@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Erica Hughberg <erica.sundberg.90@gmail.com>
Labels

size:L This PR changes 100-499 lines, ignoring generated files.

3 participants