Race Condition During Deployment: extProc sidecar not injected #1495

@nileger

Description:

We experience an interesting race condition during initial deployment of the envoy-ai-gateway Helm chart and a Gateway resource.

We have an umbrella chart that installs the envoy-ai-gateway Helm chart as a dependency and defines the Gateway resource as one of its own templates.

Most of the time (approximately 90%), the extProc sidecar is not injected into the AI Gateway pod after the initial installation of the Helm chart. After waiting for a while (approximately 30 minutes), the issue eventually resolves itself and the extProc sidecar is injected.

However, waiting for 30 minutes is, of course, not acceptable from a user's perspective.

Manually deleting the AI Gateway pod also resolves the issue. The extProc sidecar is injected when the pod is recreated, and we see the following logs, indicating a successful injection:

found routes for gateway
mutating gateway pod
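To tell affected pods apart from healthy ones without reading controller logs, one option is to inspect the pod specs directly. The following is a minimal sketch, not part of the project: it parses the output of `kubectl get pods -o json` and lists pods that lack the sidecar. The container name `ai-gateway-extproc` is an assumption and may differ in your version; both `containers` and `initContainers` are checked because a sidecar can be declared as a restartable init container.

```python
import json

# Assumption: the injected sidecar container is named "ai-gateway-extproc".
# Adjust this to the name used by your controller version.
EXTPROC_CONTAINER_NAME = "ai-gateway-extproc"


def has_extproc_sidecar(pod: dict, name: str = EXTPROC_CONTAINER_NAME) -> bool:
    """Return True if the pod spec declares a container with the given name."""
    spec = pod.get("spec") or {}
    # Look at regular containers and init containers alike.
    containers = (spec.get("containers") or []) + (spec.get("initContainers") or [])
    return any(c.get("name") == name for c in containers)


def pods_missing_sidecar(pod_list_json: str, name: str = EXTPROC_CONTAINER_NAME) -> list[str]:
    """Given `kubectl get pods -o json` output, return names of pods without the sidecar."""
    items = json.loads(pod_list_json).get("items") or []
    return [p["metadata"]["name"] for p in items if not has_extproc_sidecar(p, name)]
```

The pods returned by `pods_missing_sidecar` are the ones that would need to be deleted for the workaround described above to take effect.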

Initially, we thought that the issue was caused by the envoy-ai-gateway Helm chart being installed before the AIGatewayRoutes. However, during the initial installation we don't see any of the log messages that would confirm this (from the `mutatePod` method in `gateway_mutator.go`):

failed to list routes
AIGatewayRoutes or MCPRoutes found for gateway
failed to get filter config secret

None of those are shown.

We tried adding an init container to the AI Gateway that waits for the AI Gateway Controller to be up and running, but that doesn't help either.

Right after the deployment, we see several error messages like:

ERROR	controller.gateway	failed to get backend auth from backend security policy. Skipping this backend

from the method `reconcileFilterConfigSecret` (`gateway.go`).

But immediately afterwards, this "error" is resolved:

INFO	controller.backend-security-policy.azure-token-rotator	creating a new azure access token into secret
INFO	controller.backend-security-policy	successfully rotated credential for ...
INFO	controller.backend-security-policy	Syncing AIServiceBackend ...
INFO	controller.secret	Reconciling Secret ...
INFO	envoy-gateway-extension-server	inserting AI Gateway extproc filter into listener

Although the log says "inserting AI Gateway extproc filter into listener", the extProc sidecar isn't injected into the pod.

Do you have any idea what could be causing this interesting and inconsistent behavior and how we could resolve it? To me, this seems to be a bug.

Let us know if we should run any further tests or send you more logs.

Repro steps:

  1. Create a new Helm chart
  2. Add envoy-ai-gateway and gateway-helm as dependencies
  3. Configure both to only watch their own namespace and configure Envoy Gateway to be deployed in namespace mode
  4. Add a few AI Backend Routes and a Gateway
  5. Install and uninstall a few times to observe that the extProc sidecar is injected sometimes, and sometimes it isn't
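Since the failure is intermittent, step 5 can be automated to measure how often injection succeeds. The following is a rough sketch under stated assumptions: the release name, chart path, and namespace are hypothetical placeholders, and both the command runner and the injection check are passed in as callables so the loop itself does not depend on a cluster.

```python
from typing import Callable


def helm_install_cmd(release: str, chart: str, namespace: str) -> list[str]:
    """Build the argv for installing the umbrella chart and waiting for readiness."""
    return ["helm", "install", release, chart, "--namespace", namespace, "--wait"]


def helm_uninstall_cmd(release: str, namespace: str) -> list[str]:
    """Build the argv for removing the release again."""
    return ["helm", "uninstall", release, "--namespace", namespace, "--wait"]


def measure_injection(
    runs: int,
    run: Callable[[list[str]], object],
    is_injected: Callable[[], bool],
    release: str = "repro",            # hypothetical release name
    chart: str = "./umbrella-chart",   # hypothetical chart path
    namespace: str = "ai-gateway",     # hypothetical namespace
) -> tuple[int, int]:
    """Install and uninstall `runs` times; return (injected, not_injected)."""
    injected = 0
    for _ in range(runs):
        run(helm_install_cmd(release, chart, namespace))
        try:
            if is_injected():
                injected += 1
        finally:
            # Always clean up so the next iteration starts from a fresh install.
            run(helm_uninstall_cmd(release, namespace))
    return injected, runs - injected
```

In a real run, `run` could be `lambda cmd: subprocess.run(cmd, check=True)` and `is_injected` a function that inspects the gateway pod spec for the sidecar container.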

Environment:

gateway-helm: 1.5.1
ai-gateway-helm: 0.0.0-2fa6b27fe9bfdb0be4f108f9dec595735050cd1d (latest dev as of yesterday)

Logs:
See above.

Labels: bug, help wanted