# Aibrix Integration with Envoy AI Gateway Deployment Guide

This guide walks you through deploying a multi-model AI inference gateway using **Envoy AI Gateway**, **Gateway API Inference Extension**, and custom Aibrix-branded routing rules.

### Project Structure

```bash
samples/ai-gateway-integration
├── gateway.yaml                    # GatewayClass + Gateway
├── aigatewayroute.yaml             # Multi-model routing rules (llama2-7b, mistral-7b)
├── llama-7b-inferencepool.yaml     # InferencePool + EPP for Llama2-7B
├── mistral-7b-inferencepool.yaml   # InferencePool + EPP for Mistral-7B
├── llama-7b.yaml                   # Mock llama2-7b model deployment
└── mistral-7b.yaml                 # Mock mistral-7b model deployment
```
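
For orientation, here is a minimal sketch of what `gateway.yaml` could contain: a GatewayClass bound to the Envoy Gateway controller plus a Gateway with an HTTP listener. The Envoy pod name shown later in this guide (`envoy-default-aibrix-ai-gateway-588291e8-...`) suggests a Gateway named `aibrix-ai-gateway` in the `default` namespace; the listener details below are assumptions, not the shipped file.

```yaml
# Illustrative sketch only -- check against the actual gateway.yaml in this directory.
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: aibrix-ai-gateway
spec:
  # Standard controller name used by Envoy Gateway.
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: aibrix-ai-gateway
  namespace: default
spec:
  gatewayClassName: aibrix-ai-gateway
  listeners:
    - name: http       # listener name/port are assumptions
      protocol: HTTP
      port: 80
```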

### Prerequisites

- Kubernetes cluster (v1.24+)
- kubectl configured
- helm v3.8+
- Internet access to pull images from docker.io and GitHub

### Installation Steps

1. Install Aibrix Custom Application (Optional)

If you have an internal Aibrix [Helm chart](../../dist/chart):

```bash
helm install aibrix dist/chart -n aibrix-system --create-namespace
```

> **Note**: If you are using the internal Aibrix Helm chart, **you must set `gateway.enable: false`** in `values.yaml`.
> This is critical because **Steps 2–5 below install the AI Gateway controller and Envoy data plane independently**.
> Enabling the built-in gateway here would cause resource conflicts or duplicate deployments.

```yaml
...
gateway:
  enable: false  # ← Set this to false to skip internal gateway deployment
...
```

2. Install AI Gateway CRDs

```bash
helm upgrade -i aieg-crd oci://docker.io/envoyproxy/ai-gateway-crds-helm \
  --version v0.0.0-latest \
  --namespace envoy-ai-gateway-system \
  --create-namespace
```

> For more details, see the official [installation guide](https://aigateway.envoyproxy.io/docs/getting-started/installation#step-1-install-ai-gateway-crds) for AI Gateway CRDs.

3. Install AI Gateway Controller

```bash
helm upgrade -i aieg oci://docker.io/envoyproxy/ai-gateway-helm \
  --version v0.0.0-latest \
  --namespace envoy-ai-gateway-system \
  --create-namespace
```

> For more details, see the official [installation guide](https://aigateway.envoyproxy.io/docs/getting-started/installation#step-2-install-ai-gateway-resources) for AI Gateway Resources.

Wait for the controller to be ready:

```bash
kubectl wait --timeout=2m -n envoy-ai-gateway-system deployment/ai-gateway-controller --for=condition=Available
```

4. Install Gateway API Inference Extension (EPP Framework)

```bash
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/download/v1.0.1/manifests.yaml
```

> For more details, see the official [installation guide](https://aigateway.envoyproxy.io/docs/capabilities/inference/httproute-inferencepool#step-1-install-gateway-api-inference-extension) for Gateway API Inference Extension.

This deploys:
- CRDs (`InferencePool`, `InferenceObjective`)
- RBAC, webhooks, and core controllers

5. Install Envoy Gateway (Data Plane)

```bash
helm upgrade -i eg oci://docker.io/envoyproxy/gateway-helm \
  --version v0.0.0-latest \
  --namespace envoy-gateway-system \
  --create-namespace \
  -f https://raw.githubusercontent.com/envoyproxy/ai-gateway/main/manifests/envoy-gateway-values.yaml \
  -f https://raw.githubusercontent.com/envoyproxy/ai-gateway/main/examples/inference-pool/envoy-gateway-values-addon.yaml
```

> For more details, see the official [installation guide](https://aigateway.envoyproxy.io/docs/getting-started/prerequisites#additional-features-rate-limiting-inferencepool-etc) for Envoy Gateway.

6. Deploy Aibrix AI Gateway Resources

Apply your custom gateway and routing configuration:

```bash
cd samples/ai-gateway-integration

# Deploy the mock model servers
kubectl apply -f llama-7b.yaml
kubectl apply -f mistral-7b.yaml

# Deploy GatewayClass, Gateway, and AIGatewayRoute
kubectl apply -f gateway.yaml
kubectl apply -f aigatewayroute.yaml

# Deploy backend resources (InferencePool + EPP) for each model
kubectl apply -f llama-7b-inferencepool.yaml
kubectl apply -f mistral-7b-inferencepool.yaml
```
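
An InferencePool, such as the one defined in `llama-7b-inferencepool.yaml`, ties a set of model-server pods to a serving port and an endpoint-picker (EPP) service. The sketch below is illustrative only, written against the v1 schema from the extension's GA release; the selector labels and port are assumptions, and field names can differ between releases, so defer to the actual file.

```yaml
# Illustrative sketch; labels, port, and schema details are assumptions.
apiVersion: inference.networking.k8s.io/v1
kind: InferencePool
metadata:
  name: llama2-7b
spec:
  targetPorts:
    - number: 8000            # port the mock model pods serve on (assumed)
  selector:
    matchLabels:
      app: mock-llama2-7b     # assumed label on the mock deployment
  endpointPickerRef:
    name: llama2-7b-epp       # EPP service, matching the EPP pod shown below
```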

### Verify Deployment Status

After installation, you can verify that all components are running correctly. Below is example output from a successful deployment:

- Pods in `aibrix-system`

```bash
$ kubectl get pods -n aibrix-system
NAME                                         READY   STATUS    RESTARTS   AGE
aibrix-controller-manager-7dcf4b8d97-9mgw8   1/1     Running   0          3h35m
aibrix-gpu-optimizer-556d946fbb-gzh85        1/1     Running   0          3h35m
aibrix-metadata-service-bdfd4459d-678k5      1/1     Running   0          3h35m
aibrix-redis-master-74945dc65d-sr2sq         1/1     Running   0          3h35m
```

- Pods in `envoy-ai-gateway-system`

```bash
$ kubectl get pods -n envoy-ai-gateway-system
NAME                                     READY   STATUS    RESTARTS   AGE
ai-gateway-controller-5558c7cf7c-bzh65   1/1     Running   0          3h34m
```

- Pods in `envoy-gateway-system`

```bash
$ kubectl get pods -n envoy-gateway-system
NAME                                                       READY   STATUS    RESTARTS   AGE
envoy-default-aibrix-ai-gateway-588291e8-54d5f9b6f-2psp6   3/3     Running   0          128m
envoy-gateway-6dd8f9b8f-kjngn                              1/1     Running   0          3h33m
```

- AI Gateway CRDs

```bash
$ kubectl get InferencePool
NAME         AGE
llama2-7b    121m
mistral-7b   121m

$ kubectl get InferenceObjective
NAME         INFERENCE POOL   PRIORITY   AGE
llama2-7b    llama2-7b        10         121m
mistral-7b   mistral-7b       10         121m
```
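
Each `InferenceObjective` shown above attaches a priority to one pool. A minimal sketch consistent with that output follows; the apiVersion is an assumption (this resource has not graduated to v1 in all releases), so verify it against the CRDs you actually installed.

```yaml
# Illustrative sketch; confirm the apiVersion with `kubectl api-resources`.
apiVersion: inference.networking.k8s.io/v1alpha2
kind: InferenceObjective
metadata:
  name: llama2-7b
spec:
  priority: 10        # matches the PRIORITY column above
  poolRef:
    name: llama2-7b   # matches the INFERENCE POOL column above
```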

- Model and EPP Backend Pods (in default namespace)

```bash
$ kubectl get pods
NAME                               READY   STATUS    RESTARTS   AGE
llama2-7b-epp-6fb99fd7df-7xlxq     1/1     Running   0          121m
mistral-7b-epp-7c7f7fcb66-bw87d    1/1     Running   0          121m
mock-llama2-7b-6444f9b459-7gzmx    1/1     Running   0          131m
mock-llama2-7b-6444f9b459-92bsl    1/1     Running   0          131m
mock-llama2-7b-6444f9b459-krj8c    1/1     Running   0          131m
mock-mistral-7b-5fddcff595-5268f   1/1     Running   0          131m
mock-mistral-7b-5fddcff595-t65cp   1/1     Running   0          131m
```

### Test the Setup

Once all pods are ready, test routing via curl:

- Llama2-7B

```bash
curl -v http://<GATEWAY_IP>/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-ai-eg-model: llama2-7b" \
  -H "Authorization: Bearer test-key-1234567890" \
  -d '{
    "model": "llama2-7b",
    "messages": [{"role": "user", "content": "Say this is a test!"}],
    "temperature": 0.7
  }'
```

- Mistral-7B

```bash
curl -v http://<GATEWAY_IP>/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-ai-eg-model: mistral-7b" \
  -H "Authorization: Bearer test-key-0987654321" \
  -d '{
    "model": "mistral-7b",
    "messages": [{"role": "user", "content": "Say this is a test!"}],
    "temperature": 0.7
  }'
```

Replace `<GATEWAY_IP>` with:

- `localhost:8080`, if you forward a local port to the Envoy Service:

```bash
kubectl port-forward -n envoy-gateway-system svc/eg-envoy 8080:80
```

- the external IP of the `eg-envoy` Service, if it is exposed via a LoadBalancer.
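
To sanity-check a reply, you can pull the assistant message out of the JSON body. The snippet below parses a canned response in the OpenAI-compatible chat-completions shape the gateway is expected to return; the sample payload itself is fabricated for illustration.

```bash
# Canned response in the OpenAI chat-completions shape (made up for illustration).
RESPONSE='{"id":"cmpl-1","model":"llama2-7b","choices":[{"message":{"role":"assistant","content":"This is a test!"}}]}'

# Extract choices[0].message.content with python3 (avoids a jq dependency).
echo "$RESPONSE" | python3 -c 'import json,sys; print(json.load(sys.stdin)["choices"][0]["message"]["content"])'
```

In practice, capture the real payload instead, e.g. `RESPONSE=$(curl -s ...)` with one of the curl commands above, and run the same parser on it.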

### References

- [Envoy AI Gateway](https://github.com/envoyproxy/ai-gateway)
- [Gateway API Inference Extension](https://github.com/kubernetes-sigs/gateway-api-inference-extension)