Fix race condition in control loop that could cause pipeline starts to be ignored #892
Merged
cmackenzie1 approved these changes May 27, 2025
mwylde added a commit that referenced this pull request Jun 23, 2025
This PR fixes a race condition in the controller that could cause configuration updates to be missed.
The race worked like this: the run_to_completion loop read the config at the end of each iteration. If a JobConfig update arrived (via StateMachine::update) after that read but before the execute_state call in the next iteration, the JobContext for that iteration was created with the stale config. StateMachine::update would still send a JobMessage::ConfigUpdate, but by then the state machine might already have proceeded with the older config.
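To make the window concrete, here is a minimal single-threaded sketch of the ordering problem. It is not the controller code: `JobConfig` is reduced to a version number, the database is a local variable, and `run_with_early_read` is a hypothetical helper that replays two loop iterations with the update landing inside the race window.

```rust
// Hypothetical, simplified stand-in; not the real controller type.
#[derive(Clone, Debug, PartialEq)]
struct JobConfig {
    version: u32,
}

/// Replays two loop iterations where the config is read at the END of an
/// iteration. `update_between` models an update that lands after that read
/// but before the next iteration executes its state.
/// Returns, per iteration, (version the iteration ran with, latest version
/// in the "database" at that moment).
fn run_with_early_read(update_between: bool) -> Vec<(u32, u32)> {
    let mut db = JobConfig { version: 1 }; // stands in for the database row
    let mut observed = Vec::new();

    // Config read for the first iteration.
    let mut config = db.clone();
    for _ in 0..2 {
        // "execute_state": the context is built from the earlier read.
        observed.push((config.version, db.version));

        // End of iteration: read the config for the next iteration.
        config = db.clone();

        // Race window: the update arrives just after the read above.
        if update_between && db.version == 1 {
            db = JobConfig { version: 2 };
        }
    }
    observed
}
```

With `update_between` set, the second iteration reports `(1, 2)`: it ran with version 1 even though version 2 was already stored.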
This is addressed by attaching an Applied or NotApplied status to each configuration read from the database. The status tracks whether a controller loop iteration has been started with that configuration, which prevents a configuration update from being missed.
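A rough sketch of the status-tracking idea follows. Every name here (`ConfigStatus`, `TrackedConfig`, `Controller`, `begin_iteration`, `has_pending_config`) is hypothetical and chosen for illustration; the actual change in this PR may be structured differently.

```rust
// Hypothetical types illustrating Applied / NotApplied tracking.
#[derive(Clone, Copy, Debug, PartialEq)]
enum ConfigStatus {
    NotApplied,
    Applied,
}

#[derive(Clone, Debug, PartialEq)]
struct TrackedConfig {
    version: u32,
    status: ConfigStatus,
}

struct Controller {
    current: TrackedConfig,
}

impl Controller {
    /// A freshly loaded config has not been used by any iteration yet.
    fn new(version: u32) -> Self {
        Controller {
            current: TrackedConfig { version, status: ConfigStatus::NotApplied },
        }
    }

    /// Record a config read from the database. A new version resets the
    /// status so the loop knows it still has to run with it.
    fn update(&mut self, version: u32) {
        if version != self.current.version {
            self.current = TrackedConfig { version, status: ConfigStatus::NotApplied };
        }
    }

    /// Start a loop iteration: mark the current config as applied and
    /// return the version the iteration runs with.
    fn begin_iteration(&mut self) -> u32 {
        self.current.status = ConfigStatus::Applied;
        self.current.version
    }

    /// True while some config has not yet been used to start an iteration.
    fn has_pending_config(&self) -> bool {
        self.current.status == ConfigStatus::NotApplied
    }
}
```

In this sketch the loop would check `has_pending_config()` before going idle, so an update that arrives between iterations stays visible until an iteration actually starts with it.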