Skip to content

fix(controller): use noCacheReader in webhook to avoid extProc injection cache race#1981

Open
KaveeshKhattar wants to merge 1 commit intoenvoyproxy:mainfrom
KaveeshKhattar:fix/admission-cache-race-extproc-injection
Open

fix(controller): use noCacheReader in webhook to avoid extProc injection cache race#1981
KaveeshKhattar wants to merge 1 commit intoenvoyproxy:mainfrom
KaveeshKhattar:fix/admission-cache-race-extproc-injection

Conversation

@KaveeshKhattar
Copy link
Contributor

Description

Adds a noCacheReader client.Reader field to gatewayMutator wired from mgr.GetAPIReader() at construction time.

Route lookups are extracted into two helper methods - listAIGatewayRoutesForGateway and listMCPRoutesForGateway,
Extract route lookups into two helper methods:

  • listAIGatewayRoutesForGateway
  • listMCPRoutesForGateway

Each following a cache-first, fallback-second pattern, trying the cached client with MatchingFields index lookup first; if the cache returns empty, fall back to noCacheReader.

The no-cache path cannot use MatchingFields since it has no access to in-memory indexes these exist only in the controller's cache, not on the API server.

Manual filtering via parentRefsMatchGateway replicates the same namespace resolution logic as the index functions. A comment marks this coupling explicitly so future changes to index logic are not missed.

Related Issues/PRs (if applicable)

Fixes #1495
Related PR: #1789

Validation

3 tests added to gateway_mutator_test.go, each using two separate fake client instances.
One empty (simulating stale cache) and one populated (simulating direct API server) to exercise the fallback path:

  • TestGatewayMutator_mutatePod_UsesNoCacheReader: route exists only in noCacheReader, verifies sidecar is still injected
  • TestGatewayMutator_listAIGatewayRoutesForGateway_NoCacheReaderFallback: verifies fallback filtering returns only matching routes
  • TestGatewayMutator_listMCPRoutesForGateway_NoCacheReaderFallback: same for MCP routes

make precommit test passed locally.

Additional Context

The Gateway controller has a corrective rollout mechanism where if it detects pods without the sidecar while effective routes exist, it triggers a rolling update to re-invoke the webhook.
However, this mechanism also uses the cached client for route lookups and is subject to the same cache race. This fix addresses the root cause directly at the webhook level, making the corrective rollout unnecessary for this scenario.

…ion cache race

Signed-off-by: Kaveesh Khattar <kaveeshkhattar@gmail.com>
@KaveeshKhattar KaveeshKhattar requested a review from a team as a code owner March 23, 2026 07:24
@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Mar 23, 2026
@codecov-commenter
Copy link

Codecov Report

❌ Patch coverage is 81.13208% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.33%. Comparing base (98200a4) to head (074a7f0).

Files with missing lines Patch % Lines
internal/controller/gateway_mutator.go 81.13% 5 Missing and 5 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1981      +/-   ##
==========================================
- Coverage   84.33%   84.33%   -0.01%     
==========================================
  Files         130      130              
  Lines       17987    18032      +45     
==========================================
+ Hits        15170    15207      +37     
- Misses       1873     1877       +4     
- Partials      944      948       +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@johnugeorge
Copy link
Contributor

Were you able to reproduce this bug? Can you provide the steps?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:L This PR changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Race Condition During Deployment: extProc sidecar not injected

3 participants