feat: rewarm cache from planner cache on schema change#2445
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds a ReloadPersistentState subsystem with an in-memory plan-cache fallback, exposes CDN and in-memory-fallback cache-warmup sources, threads reload-persistent state through router/supervisor/graph construction, extends the operation planner to optionally store operation content, and adds extensive tests for cache warmup and reload behavior. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
Router image scan passed✅ No security vulnerabilities found in image: |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #2445 +/- ##
===========================================
+ Coverage 43.25% 61.76% +18.50%
===========================================
Files 1015 231 -784
Lines 142011 24123 -117888
Branches 8900 0 -8900
===========================================
- Hits 61424 14899 -46525
+ Misses 78978 7979 -70999
+ Partials 1609 1245 -364
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
router/go.mod (2)
175-199: Fix inconsistent ristretto/v2 fork across multi-module repoThe router module pins a fork (
github.com/wundergraph/ristretto/v2 v2.0.0-20260113151003-f7231509ad4d), but router-tests, demo, and graphqlmetrics modules all require the upstream version (v2.1.0). This creates mixed implementations at build time. The replace directive in router/go.mod only affects that module's dependencies, not the workspace. Either apply the same replace directive to other go.mod files that depend on ristretto, or unify the version across all modules.
59-88: Add explanatory comment about the Ristretto fork in go.modThe fork
github.com/wundergraph/ristretto/v2at commit f7231509ad4d (branchmilinda/iterating_elements) adds iterator support for cache elements, which is not available in the upstream v2.0.0 release. Add a comment in the replace directive explaining this purpose. Since no other modules in the repo depend on ristretto v2.1.x features (the require block specifies only v2.0.0), there are no compatibility concerns with using the fork.
🤖 Fix all issues with AI agents
In `@router-tests/cache_warmup_test.go`:
- Around line 1100-1106: The test uses require.EventuallyWithT with an
assert.CollectT callback but calls require.Equal inside the retry closure (in
the block passed to require.EventuallyWithT) which will abort immediately;
replace the failing assertions inside that callback with assert.Equal so the
checks are retried: locate the closure passed to require.EventuallyWithT (where
res = xEnv.MakeGraphQLRequestOK(...)) and change require.Equal(t, "updated",
res.Response.Header.Get("X-Router-Config-Version")) and require.Equal(t, "HIT",
res.Response.Header.Get("x-wg-execution-plan-cache")) to assert.Equal(t,
"updated", ...) and assert.Equal(t, "HIT", ...).
In `@router/core/http_server.go`:
- Line 39: Remove the dead field from the httpServerOptions struct:
`switchoverConfig` is never set when creating `httpServerOptions` and is not
used by `newServer()`, since `SwitchoverConfig` is handled at the `Router` level
and passed into `newGraphServer`; delete the `switchoverConfig
*SwitchoverConfig` field from the `httpServerOptions` definition and update any
references (none expected) to rely on the Router/newGraphServer flow instead.
In `@router/core/restart_switchover_config.go`:
- Around line 24-38: UpdateSwitchoverConfig writes
s.inMemorySwitchOverCache.enabled without synchronization which races with reads
in getPlanCacheForFF, setPlanCacheForFF, and cleanupUnusedFeatureFlags; fix by
acquiring the SwitchoverConfig write lock (e.g., s.mu or the existing cache
lock) around the block that sets s.inMemorySwitchOverCache.enabled and mutates
queriesForFeatureFlag/testValue, and change the reader methods
getPlanCacheForFF, setPlanCacheForFF, and cleanupUnusedFeatureFlags to check
enabled while holding the same lock (or alternatively convert enabled to an
atomic.Bool and use atomic load/store for reads/writes) so readers and writers
are consistent.
In `@router/core/router.go`:
- Around line 1177-1181: Fix the typo in the inline comment near the switchover
config initialization: change "sueprvisor" to "supervisor" in the comment that
precedes the r.switchoverConfig = NewSwitchoverConfig() block so the comment
reads correctly while leaving the surrounding logic (r.switchoverConfig,
NewSwitchoverConfig(), UpdateSwitchoverConfig(&r.Config)) unchanged.
🧹 Nitpick comments (4)
router/pkg/config/config.go (1)
974-986: Consider makingCacheWarmupSource.InMemorySwitchovera pointer to preserve “unset vs disabled”
At Line 974-977,InMemorySwitchoveris a non-pointer struct withomitempty. Depending on the marshaler, this may still serialize as present (and it definitely can’t represent “not provided” distinctly from “provided but disabled”), which tends to leak into golden testdata and example configs (as seen inconfig_defaults.json).If you want
cache_warmup.sourceto be truly optional/empty unless explicitly configured, consider changing it to a pointer like*CacheWarmupInMemorySwitchover.Proposed change
type CacheWarmupSource struct { Filesystem *CacheWarmupFileSystemSource `yaml:"filesystem,omitempty"` - InMemorySwitchover CacheWarmupInMemorySwitchover `yaml:"in_memory_switchover,omitempty"` + InMemorySwitchover *CacheWarmupInMemorySwitchover `yaml:"in_memory_switchover,omitempty"` }router/pkg/config/config.schema.json (1)
2387-2423: Schema polish: adddefault: false+ tighten wording forin_memory_switchover.enabled
For the new block at Line 2404-2413:
enabledhas no default in the schema (even though the Go config usesenvDefault:"false"). Consider adding"default": falsefor consistency with other booleans.- Minor: description “Is in memory switchover enabled” could be normalized (and maybe mention it reuses prior process’ cached plans, if that’s the intent).
router/core/cache_warmup_plans.go (1)
16-30: Consider preserving additional operation metadata for complete cache warmup.The current implementation only extracts the
Querycontent from cached plans. For accurate cache warmup, operations may also need variables, operation name, and headers that were part of the original request. IfplanWithMetaDatastores additional context, consider extracting it here.Also, based on the past discussion about limiting cache warmer operations to ~100, this implementation iterates unboundedly. Consider adding a limit if the plan cache can grow large.
router/core/restart_switchover_config.go (1)
44-49: Consider documenting or removing thetestValuefield.The
testValuefield appears to be for testing purposes but is not documented. If it's only used in tests, consider adding a comment explaining its purpose, or moving test-specific logic elsewhere.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
router/go.sumis excluded by!**/*.sum
📒 Files selected for processing (16)
router-tests/cache_warmup_test.gorouter/cmd/main.gorouter/core/cache_warmup_plans.gorouter/core/graph_server.gorouter/core/http_server.gorouter/core/operation_planner.gorouter/core/restart_switchover_config.gorouter/core/router.gorouter/core/router_config.gorouter/core/supervisor.gorouter/core/supervisor_instance.gorouter/go.modrouter/pkg/config/config.gorouter/pkg/config/config.schema.jsonrouter/pkg/config/testdata/config_defaults.jsonrouter/pkg/config/testdata/config_full.json
🧰 Additional context used
🧠 Learnings (10)
📓 Common learnings
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2172
File: router/core/graph_server.go:0-0
Timestamp: 2025-09-17T20:55:39.456Z
Learning: The Initialize method in router/internal/retrytransport/manager.go has been updated to properly handle feature-flag-only subgraphs by collecting subgraphs from both routerConfig.GetSubgraphs() and routerConfig.FeatureFlagConfigs.ConfigByFeatureFlagName, ensuring all subgraphs receive retry configuration.
📚 Learning: 2025-09-17T20:55:39.456Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2172
File: router/core/graph_server.go:0-0
Timestamp: 2025-09-17T20:55:39.456Z
Learning: The Initialize method in router/internal/retrytransport/manager.go has been updated to properly handle feature-flag-only subgraphs by collecting subgraphs from both routerConfig.GetSubgraphs() and routerConfig.FeatureFlagConfigs.ConfigByFeatureFlagName, ensuring all subgraphs receive retry configuration.
Applied to files:
router/core/supervisor.gorouter/cmd/main.gorouter/core/router.gorouter-tests/cache_warmup_test.gorouter/core/restart_switchover_config.gorouter/core/supervisor_instance.gorouter/core/graph_server.go
📚 Learning: 2026-01-06T12:37:21.521Z
Learnt from: asoorm
Repo: wundergraph/cosmo PR: 2379
File: router/pkg/connectrpc/operation_registry_test.go:381-399
Timestamp: 2026-01-06T12:37:21.521Z
Learning: In Go code (Go 1.25+), prefer using sync.WaitGroup.Go(func()) to run a function in a new goroutine, letting the WaitGroup manage Add/Done automatically. Avoid manual wg.Add(1) followed by go func() { defer wg.Done(); ... }() patterns. Apply this guidance across all Go files in the wundergraph/cosmo repository where concurrency is used.
Applied to files:
router/core/supervisor.gorouter/core/http_server.gorouter/core/router_config.gorouter/core/cache_warmup_plans.gorouter/pkg/config/config.gorouter/cmd/main.gorouter/core/router.gorouter-tests/cache_warmup_test.gorouter/core/restart_switchover_config.gorouter/core/supervisor_instance.gorouter/core/operation_planner.gorouter/core/graph_server.go
📚 Learning: 2025-08-12T13:50:45.964Z
Learnt from: Noroth
Repo: wundergraph/cosmo PR: 2132
File: router-plugin/plugin.go:139-146
Timestamp: 2025-08-12T13:50:45.964Z
Learning: In the Cosmo router plugin system, the plugin framework creates its own logger independently. When creating a logger in NewRouterPlugin, it's only needed for the gRPC server setup (passed via setup.GrpcServerInitOpts.Logger) and doesn't need to be assigned back to serveConfig.Logger.
Applied to files:
router/cmd/main.go
📚 Learning: 2025-10-01T20:39:16.113Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.
Applied to files:
router-tests/cache_warmup_test.go
📚 Learning: 2025-08-20T22:13:25.222Z
Learnt from: StarpTech
Repo: wundergraph/cosmo PR: 2157
File: router-tests/go.mod:16-16
Timestamp: 2025-08-20T22:13:25.222Z
Learning: github.com/mark3labs/mcp-go v0.38.0 has regressions and should not be used in the wundergraph/cosmo project. v0.36.0 is the stable version that should be used across router-tests and other modules.
Applied to files:
router-tests/cache_warmup_test.gorouter/go.mod
📚 Learning: 2025-08-26T17:19:28.178Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2167
File: router/internal/expr/retry_context.go:3-10
Timestamp: 2025-08-26T17:19:28.178Z
Learning: In the Cosmo router codebase, prefer using netErr.Timeout() checks over explicit context.DeadlineExceeded checks for timeout detection in retry logic, as Go's networking stack typically wraps context deadline exceeded errors in proper net.Error implementations.
Applied to files:
router-tests/cache_warmup_test.go
📚 Learning: 2025-08-28T09:18:10.121Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2141
File: router-tests/http_subscriptions_test.go:100-108
Timestamp: 2025-08-28T09:18:10.121Z
Learning: In router-tests/http_subscriptions_test.go heartbeat tests, the message ordering should remain strict with data messages followed by heartbeat messages, as the timing is deterministic and known by design in the Cosmo router implementation.
Applied to files:
router-tests/cache_warmup_test.go
📚 Learning: 2025-08-20T10:08:17.857Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2155
File: router/core/router.go:1857-1866
Timestamp: 2025-08-20T10:08:17.857Z
Learning: router/pkg/config/config.schema.json forbids null values for traffic_shaping.subgraphs: additionalProperties references $defs.traffic_shaping_subgraph_request_rule with type "object". Therefore, in core.NewSubgraphTransportOptions, dereferencing each subgraph rule pointer is safe under schema-validated configs, and a nil-check is unnecessary.
Applied to files:
router/core/graph_server.go
📚 Learning: 2025-11-12T18:38:46.619Z
Learnt from: StarpTech
Repo: wundergraph/cosmo PR: 2327
File: graphqlmetrics/core/metrics_service.go:435-442
Timestamp: 2025-11-12T18:38:46.619Z
Learning: In graphqlmetrics/core/metrics_service.go, the opGuardCache TTL (30 days) is intentionally shorter than the gql_metrics_operations database retention period (90 days). This design reduces memory usage while relying on database constraints to prevent duplicate inserts after cache expiration.
Applied to files:
router/core/graph_server.go
🧬 Code graph analysis (6)
router/core/supervisor.go (2)
router/core/restart_switchover_config.go (1)
SwitchoverConfig(14-16)router/core/router.go (1)
Router(84-94)
router/core/http_server.go (1)
router/core/restart_switchover_config.go (1)
SwitchoverConfig(14-16)
router/cmd/main.go (1)
router/core/restart_switchover_config.go (2)
SwitchoverConfig(14-16)NewSwitchoverConfig(18-22)
router/core/router.go (1)
router/core/restart_switchover_config.go (2)
SwitchoverConfig(14-16)NewSwitchoverConfig(18-22)
router/core/supervisor_instance.go (3)
router/core/restart_switchover_config.go (1)
SwitchoverConfig(14-16)router/core/router_config.go (1)
Config(49-144)router/core/router.go (2)
Option(174-174)WithSwitchoverConfig(2057-2061)
router/core/graph_server.go (4)
router/core/restart_switchover_config.go (1)
SwitchoverConfig(14-16)router/core/router.go (1)
Router(84-94)router/core/operation_planner.go (1)
NewOperationPlanner(50-57)router/core/cache_warmup_plans.go (1)
NewPlanSource(16-30)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: build-router
- GitHub Check: integration_test (./telemetry)
- GitHub Check: build_push_image (nonroot)
- GitHub Check: build_push_image
- GitHub Check: integration_test (./events)
- GitHub Check: build_test
- GitHub Check: image_scan (nonroot)
- GitHub Check: image_scan
- GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
- GitHub Check: build_test
- GitHub Check: Analyze (go)
- GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (36)
router/cmd/main.go (1)
152-172: SwitchoverConfig wiring looks correct; ensure lifecycle matches intended cache persistence scope
Line 152-155 passes a singlecore.NewSwitchoverConfig()into the supervisor—so caches/state will live for the process lifetime across reloads (likely desired for “reprocess existing plans”). Just ensure this is the intended scope (vs “per reload”) and that the underlying cache is concurrency-safe.router/pkg/config/testdata/config_defaults.json (1)
311-322: config_defaults.json is a golden test output file, not a configuration subject to schema validation—no changes neededThe concern about schema mismatch is based on a misunderstanding of the file's purpose.
config_defaults.jsonis purely a golden test artifact inTestDefaults()used to verify the JSON serialization of the Config struct; it is never loaded from disk or validated againstconfig.schema.json.The schema validation in
LoadConfig()applies only to YAML input files. When the Config struct is marshaled to JSON for the golden test, Go's default JSON marshaling uses PascalCase field names (Filesystem,InMemorySwitchover) and includes null values where structs haveomitemptyYAML tags. This serialization format is independent of the schema, which constrains YAML input with snake_case keys. The presence of"Filesystem": nullin the output is expected and correct.router/pkg/config/testdata/config_full.json (1)
660-671: LGTM!The
InMemorySwitchoverconfiguration block is correctly added underCacheWarmup.Source, following the existing pattern withFilesystem. The defaultEnabled: falseis appropriate for this comprehensive config test file.router/core/supervisor.go (3)
32-36: LGTM!The
SwitchoverConfigfield is correctly added toRouterResources, enabling the switchover configuration to be passed through to downstream components like the Router.
38-44: LGTM!The
SwitchoverConfigfield inRouterSupervisorOptsprovides the input path for the switchover configuration.
52-55: LGTM!The initialization correctly propagates
SwitchoverConfigfrom options to resources, completing the wiring through the supervisor layer.router-tests/cache_warmup_test.go (5)
5-6: LGTM!The new imports are appropriate for the added test functionality:
assertforEventuallyWithTcallbacks andconfigpollerfor the mock configuration poller.
919-971: LGTM - well-structured test for in-memory switchover caching.The test correctly validates that:
- Initial request results in a cache MISS
- After config update (simulating switchover), the same query results in a cache HIT
This confirms the in-memory plan cache is preserved across config reloads.
973-1014: LGTM!Correctly tests that cache is not preserved when cache warmup is entirely disabled, resulting in MISS on both requests.
1016-1062: LGTM!Correctly tests that the default cache warmer (with
InMemorySwitchover.Enabled = false) does not preserve plans across config reloads.
1064-1108: BothwriteTestConfig(defined in router-tests/config_hot_reload_test.go:579) andConfigPollerMock(defined in router-tests/config_hot_reload_test.go:32) are properly defined in the test package and are accessible to this test file.router/core/router_config.go (1)
314-321: LGTM - source classification logic is correct and type-safe.The addition correctly classifies the cache warmup source. The control flow properly prioritizes
FilesystemoverInMemorySwitchover. SinceInMemorySwitchoveris a value type (not a pointer), accessing itsEnabledfield is safe—no nil check is needed.router/core/cache_warmup_plans.go (1)
32-34: LGTM!Simple and efficient - queries are extracted once during construction, allowing the source cache to be garbage collected.
router/core/supervisor_instance.go (3)
53-53: LGTM!Correctly threads
SwitchoverConfigthrough the router option construction.
184-184: LGTM!Function signature updated to accept
switchoverConfigparameter for propagation.
275-275: LGTM!
WithSwitchoverConfigoption correctly added to the options list.router/core/router.go (4)
93-93: LGTM!Field addition to hold switchover configuration state.
600-600: LGTM!Correctly passes
switchoverConfigtonewGraphServerfor plan cache management.
608-610: LGTM!Feature flag cleanup after server swap ensures unused caches are pruned.
2057-2061: LGTM!Standard option pattern for setting switchover configuration.
router/core/restart_switchover_config.go (2)
71-78: Clarify behavior whenFeatureFlagConfigsis nil.When
routerCfg.FeatureFlagConfigsis nil, the method returns early without cleaning up any cached feature flags. This means stale feature flag caches could persist. Is this intentional?If feature flags can be completely removed from a config, consider clearing all non-base caches when
FeatureFlagConfigsis nil.
83-96: LGTM!The cleanup logic correctly preserves the base configuration (empty string key) while removing caches for feature flags that no longer exist in the router config.
router/core/operation_planner.go (8)
24-24: LGTM!New
contentfield enables storing the original query for cache warmup scenarios.
32-37: LGTM!The
storeContentflag andoperationPlannerOptsstruct provide clean control over content storage behavior.
50-56: LGTM!Constructor correctly accepts and stores the
storeContentflag.
59-59: LGTM!Signature update to accept
operationPlannerOptsfor controlling plan preparation behavior.
93-95: LGTM!Content is conditionally stored based on the
storeContentoption, avoiding unnecessary memory usage when content storage is not needed.
122-122: LGTM!When skipping cache (for tracing or query plan inclusion), content storage is disabled since these plans are not cached.
150-150: LGTM!Normal cache path correctly uses the planner's
storeContentsetting.
44-46: TheItermethod is already implemented byristretto.Cache, which is the onlyExecutionPlanCacheimplementation used in the codebase. It's already being called inrouter/core/cache_warmup_plans.go:21without issues. AddingIterto the interface is not a breaking change.Likely an incorrect or invalid review comment.
router/core/graph_server.go (6)
108-116: LGTM!The addition of
SwitchoverConfigtoBuildGraphMuxOptionsfollows the existing pattern for optional configuration parameters.
122-123: LGTM!The signature extension to accept
switchoverConfigis correctly propagated through the initialization path.
434-458: LGTM!The
switchoverConfigis correctly threaded through to feature flag mux construction, ensuring consistent behavior across base and feature flag graphs.
526-532: LGTM!Field alignment changes are cosmetic and improve readability.
1303-1304: No nil pointer dereference risk. SwitchoverConfig and inMemorySwitchOverCache are guaranteed non-nil by initialization.The
opts.SwitchoverConfigparameter passed tonewGraphServeris guaranteed to be non-nil with a validinMemorySwitchOverCachebecause:
- Before
newServer()callsnewGraphServer()at line 600, the Router'sStart()method initializesr.switchoverConfigwith a nil check at lines 1177-1179.NewSwitchoverConfig()always returns a SwitchoverConfig with an initializedinMemorySwitchOverCachefield (never nil).Accessing
opts.SwitchoverConfig.inMemorySwitchOverCache.enabledis safe; no defensive nil checks are needed.Likely an incorrect or invalid review comment.
1358-1362: Thread safety is already properly implemented; remove this concern.The
getPlanCacheForFFandsetPlanCacheForFFmethods properly usesync.RWMutex(RLock/RUnlock for reads, Lock/Unlock for writes with defers) to synchronize access to the shared cache map. The nil safety concern aboutopts.SwitchoverConfigis not an issue—line 1304 in the same function already accessesopts.SwitchoverConfig.inMemorySwitchOverCache.enabledwithout guards, meaning the parameter is guaranteed to be non-nil by design.Likely an incorrect or invalid review comment.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@router-tests/go.mod`:
- Line 207: The go.mod contains a replace directive pointing to the non-public
fork "github.com/wundergraph/ristretto/v2" which breaks module resolution;
remove that replace and switch to the upstream module
"github.com/dgraph-io/ristretto/v2" (or make the fork public if you must keep
it). Concretely, delete the replace entry that maps
"github.com/wundergraph/ristretto/v2 => github.com/wundergraph/ristretto/v2
v2.0.0-..." and ensure the project requires "github.com/dgraph-io/ristretto/v2"
(update to v2.1.0 if needed), then run "go mod tidy" to refresh the dependency
graph and verify builds/CI pass.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
router-tests/go.sumis excluded by!**/*.sum
📒 Files selected for processing (5)
router-tests/cache_warmup_test.gorouter-tests/go.modrouter/core/restart_switchover_config.gorouter/core/restart_switchover_config_test.gorouter/core/router.go
🧰 Additional context used
🧠 Learnings (10)
📚 Learning: 2025-08-20T22:13:25.222Z
Learnt from: StarpTech
Repo: wundergraph/cosmo PR: 2157
File: router-tests/go.mod:16-16
Timestamp: 2025-08-20T22:13:25.222Z
Learning: github.com/mark3labs/mcp-go v0.38.0 has regressions and should not be used in the wundergraph/cosmo project. v0.36.0 is the stable version that should be used across router-tests and other modules.
Applied to files:
router-tests/go.modrouter-tests/cache_warmup_test.go
📚 Learning: 2025-09-24T12:54:00.765Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2222
File: router-tests/websocket_test.go:2238-2302
Timestamp: 2025-09-24T12:54:00.765Z
Learning: The wundergraph/cosmo project uses Go 1.25 (Go 1.23+ minimum), so fmt.Appendf and other newer Go standard library functions are available and can be used without compatibility concerns.
Applied to files:
router-tests/go.mod
📚 Learning: 2025-09-24T12:54:00.765Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2222
File: router-tests/websocket_test.go:2238-2302
Timestamp: 2025-09-24T12:54:00.765Z
Learning: The wundergraph/cosmo project uses Go 1.25 (Go 1.25 minimum), so fmt.Appendf and other newer Go standard library functions are available and can be used without compatibility concerns.
Applied to files:
router-tests/go.mod
📚 Learning: 2025-09-17T20:55:39.456Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2172
File: router/core/graph_server.go:0-0
Timestamp: 2025-09-17T20:55:39.456Z
Learning: The Initialize method in router/internal/retrytransport/manager.go has been updated to properly handle feature-flag-only subgraphs by collecting subgraphs from both routerConfig.GetSubgraphs() and routerConfig.FeatureFlagConfigs.ConfigByFeatureFlagName, ensuring all subgraphs receive retry configuration.
Applied to files:
router/core/restart_switchover_config_test.gorouter/core/restart_switchover_config.gorouter-tests/cache_warmup_test.gorouter/core/router.go
📚 Learning: 2025-10-01T20:39:16.113Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.
Applied to files:
router/core/restart_switchover_config_test.gorouter-tests/cache_warmup_test.go
📚 Learning: 2026-01-06T12:37:21.521Z
Learnt from: asoorm
Repo: wundergraph/cosmo PR: 2379
File: router/pkg/connectrpc/operation_registry_test.go:381-399
Timestamp: 2026-01-06T12:37:21.521Z
Learning: In Go code (Go 1.25+), prefer using sync.WaitGroup.Go(func()) to run a function in a new goroutine, letting the WaitGroup manage Add/Done automatically. Avoid manual wg.Add(1) followed by go func() { defer wg.Done(); ... }() patterns. Apply this guidance across all Go files in the wundergraph/cosmo repository where concurrency is used.
Applied to files:
router/core/restart_switchover_config_test.gorouter/core/restart_switchover_config.gorouter-tests/cache_warmup_test.gorouter/core/router.go
📚 Learning: 2025-07-29T08:19:55.720Z
Learnt from: Noroth
Repo: wundergraph/cosmo PR: 2088
File: demo/pkg/subgraphs/projects/src/main_test.go:0-0
Timestamp: 2025-07-29T08:19:55.720Z
Learning: In Go testing, t.Fatal, t.FailNow, t.Skip* and similar methods cannot be called from goroutines other than the main test goroutine, as they will cause a runtime panic. Use channels, t.Error*, or other synchronization mechanisms to communicate errors from goroutines back to the main test thread.
Applied to files:
router-tests/cache_warmup_test.go
📚 Learning: 2025-08-28T09:18:10.121Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2141
File: router-tests/http_subscriptions_test.go:100-108
Timestamp: 2025-08-28T09:18:10.121Z
Learning: In router-tests/http_subscriptions_test.go heartbeat tests, the message ordering should remain strict with data messages followed by heartbeat messages, as the timing is deterministic and known by design in the Cosmo router implementation.
Applied to files:
router-tests/cache_warmup_test.go
📚 Learning: 2025-08-26T17:19:28.178Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2167
File: router/internal/expr/retry_context.go:3-10
Timestamp: 2025-08-26T17:19:28.178Z
Learning: In the Cosmo router codebase, prefer using netErr.Timeout() checks over explicit context.DeadlineExceeded checks for timeout detection in retry logic, as Go's networking stack typically wraps context deadline exceeded errors in proper net.Error implementations.
Applied to files:
router-tests/cache_warmup_test.go
📚 Learning: 2025-09-29T10:28:07.122Z
Learnt from: dkorittki
Repo: wundergraph/cosmo PR: 2192
File: router/pkg/config/config.go:1028-1029
Timestamp: 2025-09-29T10:28:07.122Z
Learning: The deprecation strategy for IntrospectionEnabled field in router/pkg/config/config.go is to first mark it as deprecated, then migrate all call sites to use IntrospectionConfig.Enabled, and finally remove the deprecated field. The envDefault tag should be kept during the deprecation period for backward compatibility.
Applied to files:
router/core/router.go
🧬 Code graph analysis (3)
router/core/restart_switchover_config_test.go (2)
router/core/restart_switchover_config.go (2)
InMemorySwitchOverCache(32-36)NewSwitchoverConfig(18-22)router/pkg/config/config.go (2)
CacheWarmupConfiguration(989-995)CacheWarmupInMemorySwitchover(983-985)
router-tests/cache_warmup_test.go (5)
router-tests/testenv/testenv.go (4)
Run(105-122)Config(292-349)RouterConfig(246-249)Environment(1735-1771)router/pkg/config/config.go (5)
Config(1037-1112)CacheWarmupConfiguration(989-995)CacheWarmupSource(974-977)CacheWarmupInMemorySwitchover(983-985)ExecutionConfig(865-869)router/core/router.go (3)
Option(174-174)WithCacheWarmupConfig(2114-2118)ExecutionConfig(159-163)router/core/cache_warmup.go (1)
CacheWarmupSource(26-28)router/pkg/controlplane/configpoller/config_poller.go (1)
ConfigPoller(20-29)
router/core/router.go (1)
router/core/restart_switchover_config.go (2)
SwitchoverConfig(14-16)NewSwitchoverConfig(18-22)
🔇 Additional comments (9)
router/core/restart_switchover_config_test.go (1)
1-337: Test coverage looks comprehensive.The test file provides thorough coverage of the
SwitchoverConfigandInMemorySwitchOverCachefunctionality, including:
- Enable/disable state transitions
- Edge cases (nil config, already enabled/disabled)
- Feature flag cache operations (get/set)
- Cleanup of unused feature flags
- Delegation from
SwitchoverConfigtoInMemorySwitchOverCacherouter/core/restart_switchover_config.go (1)
11-22: LGTM for structure and initialization.The type definitions and constructor are well-structured. The
planCachetype alias improves readability, andNewSwitchoverConfigcorrectly initializes with a disabled cache state.router/core/router.go (4)
93-93: LGTM for field addition.The
switchoverConfigfield is properly added to the Router struct to maintain state across config changes.
599-611: Clean integration of switchover config in server creation.The switchover config is correctly passed to
newGraphServerandCleanupFeatureFlagsis appropriately called after the server swap to prune stale feature flag caches.
1177-1181: LGTM for initialization logic.The lazy initialization of
switchoverConfigfor test scenarios is appropriate, and the comment explaining this is helpful (and the typo from the previous review has been fixed).
2057-2061: LGTM for new option function.The
WithSwitchoverConfigoption follows the established pattern for router configuration options.router-tests/cache_warmup_test.go (3)
1100-1106: Correct usage ofassertinsideEventuallyWithTcallback.Using
assert.Equalinstead ofrequire.Equalinside theEventuallyWithTcallback is correct—this allows the retry mechanism to work properly by collecting failures rather than immediately aborting.
1064-1108: No action needed. ThewriteTestConfighelper function is defined inrouter-tests/config_hot_reload_test.goat line 579 and is accessible from the cache_warmup_test.go file since both are in the same package. The function is properly defined and available for use.
919-971: Code is correct as-is—no undefined types.
ConfigPollerMockandwriteTestConfigare defined inconfig_hot_reload_test.goin the same package (integration), making them accessible throughoutcache_warmup_test.go. The test implementation is valid.Likely an incorrect or invalid review comment.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@router/core/restart_switchover_config.go`:
- Around line 120-136: The extractQueriesAndOverridePlanCache function currently
only handles values of type planCache and drops entries already converted to
[]*nodev1.Operation; update the loop in extractQueriesAndOverridePlanCache to
also detect and preserve already-converted entries (e.g., if v is of type
[]*nodev1.Operation or [](*nodev1.Operation) or the concrete slice type used) by
copying them directly into switchoverMap, and only call
convertToNodeOperation(cache) when v asserts to planCache; keep the existing
mutex logic and ensure the map type remains map[string]any to store both
converted slices and newly converted results.
🧹 Nitpick comments (1)
router/core/restart_switchover_config.go (1)
85-107: Potential performance concern: Long-running operation while holding read lock.
getPlanCacheForFFcallsconvertToNodeOperation(cache)at line 96 while holding theRLock. This function callsdata.Wait()(blocking until buffered writes are flushed) anddata.IterValues(...)(iterating over potentially many cache entries).If the plan cache has many entries or pending writes, this could hold the read lock for an extended period, blocking writers like
updateStateFromConfigorextractQueriesAndOverridePlanCache.Consider extracting the
planCachereference under the lock and performing the conversion outside:Proposed improvement
func (c *InMemorySwitchoverCache) getPlanCacheForFF(featureFlagKey string) []*nodev1.Operation { c.mu.RLock() - defer c.mu.RUnlock() - if c.queriesForFeatureFlag == nil { + c.mu.RUnlock() return nil } - switch cache := c.queriesForFeatureFlag[featureFlagKey].(type) { + data := c.queriesForFeatureFlag[featureFlagKey] + c.mu.RUnlock() + + switch cache := data.(type) { case planCache: return convertToNodeOperation(cache) case []*nodev1.Operation: return cache case nil: return nil default: c.logger.Error("unexpected type") return nil } }
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@router/core/restart_switchover_config.go`:
- Around line 159-168: The convertToNodeOperation function must guard against a
nil planCache and ensure pending asynchronous writes are flushed before
iteration: add a nil check for the data parameter (return nil slice if data is
nil) and call data.Wait() immediately before calling data.IterValues(); then
perform the existing IterValues loop to build []*nodev1.Operation. Refer to
convertToNodeOperation, the parameter named data (planCache), and the methods
Wait() and IterValues() when making the change.
🧹 Nitpick comments (2)
router/core/restart_switchover_config.go (2)
86-107: HoldingRLockduring ristretto cache iteration ingetPlanCacheForFFmay block writers.When the stored value is of type
planCache,convertToNodeOperationiterates the entire ristretto cache while theRLockis held (line 96). For large caches this could block all write-lock callers (setPlanCacheForFF,extractQueriesAndOverridePlanCache,updateStateFromConfig) for a non-trivial duration.Consider whether
getPlanCacheForFFis called on the hot request path. If so, extracting theplanCachereference underRLockand then callingconvertToNodeOperationoutside the lock would reduce contention:♻️ Proposed refactor
func (c *InMemoryPlanCacheFallback) getPlanCacheForFF(featureFlagKey string) []*nodev1.Operation { c.mu.RLock() - defer c.mu.RUnlock() - if c.queriesForFeatureFlag == nil { + c.mu.RUnlock() return nil } - - switch cache := c.queriesForFeatureFlag[featureFlagKey].(type) { + val := c.queriesForFeatureFlag[featureFlagKey] + c.mu.RUnlock() + + switch cache := val.(type) { case planCache: return convertToNodeOperation(cache) case []*nodev1.Operation:
120-136: Entries already converted to[]*nodev1.Operationare silently dropped.Lines 130-133 only copy
planCache-typed entries intoswitchoverMap. If any entry has already been converted to[]*nodev1.Operation(e.g., by a prior call), it will be lost. The prior review discussion concluded the lifecycle currently prevents double calls, but this is fragile — a defensive type switch costs nothing and prevents silent data loss if the call order ever changes.🛡️ Proposed defensive fix
switchoverMap := make(map[string]any) for k, v := range c.queriesForFeatureFlag { - if cache, ok := v.(planCache); ok { + switch cache := v.(type) { + case planCache: switchoverMap[k] = convertToNodeOperation(cache) + case []*nodev1.Operation: + switchoverMap[k] = cache } }
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
router/core/supervisor.go (1)
47-55:⚠️ Potential issue | 🟠 MajorEnsure
ReloadPersistentStateis non‑nil before wiring it into resources.
opts.ReloadPersistentStateis optional but lacks validation. WhenDefaultRouterFactoryis used (no custom factory provided), the router construction inbuildMuxwill dereferenceReloadPersistentState.inMemoryPlanCacheFallback, causing a panic if nil. Add a defensive nil-check or return an error early.🐛 Proposed fix
func NewRouterSupervisor(opts *RouterSupervisorOpts) (*RouterSupervisor, error) { + if opts.ReloadPersistentState == nil { + opts.ReloadPersistentState = NewReloadPersistentState(opts.BaseLogger) + } rs := &RouterSupervisor{ shutdownChan: make(chan bool), logger: opts.BaseLogger.With(zap.String("component", "supervisor")),
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@router/core/reload_persistent_state.go`:
- Around line 138-157: cleanupUnusedFeatureFlags in InMemoryPlanCacheFallback
returns early when routerCfg.FeatureFlagConfigs is nil, causing stale non-empty
entries in queriesForFeatureFlag to never be removed; change the logic so that
when routerCfg.FeatureFlagConfigs is nil you iterate and delete all keys in
queriesForFeatureFlag except the base "" (i.e., treat missing FeatureFlagConfigs
as empty and remove all non-"" entries), otherwise keep the existing check that
verifies existence in routerCfg.FeatureFlagConfigs.ConfigByFeatureFlagName
before deleting; ensure you perform this inside the same mu.Lock()/defer
mu.Unlock() block and reference the function cleanupUnusedFeatureFlags, the
receiver InMemoryPlanCacheFallback, and the map queriesForFeatureFlag in your
changes.
🧹 Nitpick comments (1)
router/core/reload_persistent_state.go (1)
85-107:RLockheld during potentially expensiveconvertToNodeOperationcall.When
getPlanCacheForFFhits theplanCachebranch (line 95–96), it callsconvertToNodeOperation— which callsWait()and iterates the entire ristretto cache — while holding theRLock. This blocks all writers (setPlanCacheForFF,extractQueries…,cleanupUnused…) for the duration of the iteration.Since this path is only exercised during cache warming at startup of a new router instance, the practical impact is low. However, you could avoid holding the lock during iteration by extracting the cache pointer under
RLock, releasing it, then callingconvertToNodeOperationoutside the lock (ristretto is internally thread-safe). Worth considering if plan caches grow large.
…ement-cache-warmer-from-existing-caches
Summary by CodeRabbit
New Features
Configuration
Tests
Chores
Description
This PR introduces a new mechanism to warm the cache on restarts due to schema changes (or whenever the router execution config changes). Config to enable this
If in_memory_switchover is not present or false, we default to the current cdn cache warmer (with cosmo), this preserves backwards compatibility.
This is achieved by first adding a new
switchoverConfigwhich can keep references across restarts, we use this to store a reference to the planner cache. When the router restarts due to an execution config change, we then loop through the elements in the ristretto planner cache.Right now ristretto does not support iterating, so we have opened a PR here dgraph-io/ristretto#475. For this PR (which we need to discuss before merging) we have pointed ristretto to the fork from where the PR was opened.
We create a new provider for the cache warmer, for
in_memory_switcherwhich extracts the queries from the planner cache upon startup. Note that with this method there is no cache warming upon startup, only on subsequent restarts due to execution config changes.We also support persisting across restarts on hot config reloading. However we ran into a problem of shutdown occuring first before the new graph mux was created (which is the opposite for schema change updates), which means that the current planner cache gets emptied before we can extract the queries from it. As a solution we preemptively extract the queries from the planner cache before its closed whenever a hot config reloading occurs.
This PR also changes how cache warmer is enabled when used with the cosmo CDN, previously it requested the manifest from the CDN if the cache warmer was enabled in the router
config.yaml, this meant that ex-cache warmer customers could still access old manifest files fron the cdn (as long as it was present), however now its only enabled if the feature is enabled in cosmo (upon startup we get if cache warmer is enabled from cosmo).Checklist