Legacy data result pruning and basic pruning #673
Open
hacheigriega wants to merge 8 commits into
hacheigriega
added a commit
that referenced
this pull request
Dec 9, 2025
Implementation of batching module pruning as described in #673.
Thomasvdam
reviewed
Jan 12, 2026
Member
Thomasvdam
left a comment
Overall I think this looks good. I'm not sure whether we double-checked the proposed strategies and numbers with Mario.
Prune batches and their associated data at every block based on two new module parameters, NumBatchesToKeep and MaxBatchPrunePerBlock. For pruning data results and their batch assignment data, we resort to a naive implementation because there is no mapping from batch number or data result ID to data result objects. This implementation goes through `MaxDataResultsToCheckForPrune` items in the store, starting from a random point, and deletes those whose associated batches have been pruned.
Implementation of batching module pruning as described in #673.
Simplify batch pruning logic by fixing numBatchesToKeep.
Before this commit, hasPruningCaughtUp was switched to true only when BatchPruneBatches had pruned all batches up to batchNumberAtUpgrade. A large numBatchesToKeep could block this condition for a long time. So we add an alternative condition under which hasCaughtUp is switched to true: all batches up to (currentBatchNum - numBatchesToKeep) have been pruned. BatchPruneBatches is updated accordingly. BasicPruneBatch now prunes a batch as long as it exists, whether it was created before or after the upgrade.
Motivation
Batching pruning addresses the following issues:
Issue 1 was implemented in #664. This PR addresses the other two issues.
Explanation of Changes
Pruning Parameters
Pruning logic is parameterized by the following module states, which will be added by the upgrade handler:
- `MaxBatchPrunePerBlock` is a module parameter that specifies the maximum number of batches to prune per block in the Batch Pruning strategy. (Defaults to 100)
- `MaxLegacyDataResultPrunePerBlock` is a module parameter that specifies the maximum number of legacy data results to prune per block in the Legacy Data Result Pruning strategy. (Defaults to 1000)
- `BatchNumberAtUpgrade` is the batch number of the latest batch at the time of the upgrade. The upgrade handler should set this value to record the number of the last batch containing legacy data results.
- `HasPruningCaughtUp` is set to true when either of the following conditions is met:
  - All batches up to `BatchNumberAtUpgrade` have been pruned.
  - All batches up to `currentBatchNum - NumBatchesToKeep` have been pruned.

  It is set to false at the time of the upgrade, initiating the Batch Pruning strategy. Once it is switched to true, the batching end blocker terminates the Batch Pruning strategy and initiates the Legacy Data Result Pruning strategy.
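As a rough illustration of the state listed above, the following Go sketch groups the four values in one struct and initializes them the way the upgrade handler is described as doing. The type `PruningState` and the constructor `NewPruningStateAtUpgrade` are hypothetical names chosen for this sketch and are not taken from the PR; only the field names and default values come from the description.

```go
package main

// PruningState is a hypothetical container for the module state described
// above. The real module may store these values differently.
type PruningState struct {
	MaxBatchPrunePerBlock            uint64 // cap for the Batch Pruning strategy
	MaxLegacyDataResultPrunePerBlock uint64 // cap for Legacy Data Result Pruning
	BatchNumberAtUpgrade             uint64 // last batch containing legacy data results
	HasPruningCaughtUp               bool   // false selects Batch Pruning, true selects legacy pruning
}

// NewPruningStateAtUpgrade is a hypothetical helper that returns the state an
// upgrade handler would write, given the latest batch number at upgrade time.
func NewPruningStateAtUpgrade(latestBatchNum uint64) PruningState {
	return PruningState{
		MaxBatchPrunePerBlock:            100,  // default quoted in the description
		MaxLegacyDataResultPrunePerBlock: 1000, // default quoted in the description
		BatchNumberAtUpgrade:             latestBatchNum,
		HasPruningCaughtUp:               false, // starts the Batch Pruning strategy
	}
}
```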
There is also a new global variable:
- `NumBatchesToKeep` specifies the number of batches to preserve in the state. Currently set to 10000.

Pruning Strategies
Basic Pruning (Implemented in this PR)
When a new batch is created, the Basic Pruning strategy prunes one batch at `NewBatchNumber - NumBatchesToKeep`, if this number is higher than `BatchNumberAtUpgrade`.

We also temporarily add Batch Pruning and Legacy Data Result Pruning strategies to prune the batches and their data that accumulated before the upgrade.
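The Basic Pruning rule can be sketched in Go as a small pure function. This is only an illustration of the arithmetic in the description, not the PR's code: the function name `BasicPruneTarget` is hypothetical, and the explicit underflow guard is an assumption about how unsigned batch numbers would be handled.

```go
package main

// NumBatchesToKeep mirrors the value quoted in the description.
const NumBatchesToKeep uint64 = 10000

// BasicPruneTarget is a hypothetical helper. Given the number of a newly
// created batch, it returns the single batch number that Basic Pruning would
// remove, and false when nothing should be pruned.
func BasicPruneTarget(newBatchNum, batchNumAtUpgrade uint64) (uint64, bool) {
	// Assumption: guard against unsigned underflow while fewer than
	// NumBatchesToKeep batches exist.
	if newBatchNum <= NumBatchesToKeep {
		return 0, false
	}
	target := newBatchNum - NumBatchesToKeep
	// Batches at or below BatchNumberAtUpgrade are left to the temporary
	// Batch Pruning strategy, per the description.
	if target <= batchNumAtUpgrade {
		return 0, false
	}
	return target, true
}
```

For example, with `BatchNumberAtUpgrade` equal to 400, creating batch 10500 yields a target of 500, while creating batch 10300 yields no target because 300 is not higher than 400.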
Batch Pruning (Implemented in #664)
Every block, this strategy prunes up to MaxBatchPrunePerBlock batches and their associated data, except for data results and their batch assignments. It stops at the batch numbered BatchNumberAtUpgrade or at currentBatchNum - NumBatchesToKeep, whichever is reached first. Once either termination condition is met, HasPruningCaughtUp is switched to true, terminating Batch Pruning and initiating Legacy Data Result Pruning.
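One pass of this strategy might look like the Go sketch below. It is a simplified model under stated assumptions rather than the code from #664: `BatchPruneStep` and its `prune` callback are hypothetical, both boundaries are treated as inclusive, and the chain is treated as caught up when fewer than `numBatchesToKeep` batches exist.

```go
package main

// BatchPruneStep is a hypothetical model of one end-block pass of Batch
// Pruning. next is the lowest batch number not yet pruned. It returns the
// updated next value and whether HasPruningCaughtUp should become true.
func BatchPruneStep(
	next, currentBatchNum, batchNumAtUpgrade, numBatchesToKeep, maxPerBlock uint64,
	prune func(batchNum uint64), // stand-in for deleting a batch and its associated data
) (uint64, bool) {
	// end is the exclusive upper bound of batches eligible for pruning.
	var end uint64
	if currentBatchNum >= numBatchesToKeep {
		end = currentBatchNum - numBatchesToKeep + 1
	}
	// Stop at whichever boundary comes first.
	if batchNumAtUpgrade+1 < end {
		end = batchNumAtUpgrade + 1
	}
	var pruned uint64
	for next < end && pruned < maxPerBlock {
		prune(next)
		next++
		pruned++
	}
	return next, next >= end
}
```

A caller would invoke this once per block while `HasPruningCaughtUp` is false and store the returned flag, so that later blocks switch to Legacy Data Result Pruning.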
Legacy Data Result Pruning (Implemented in this PR)
Once `HasPruningCaughtUp` is switched to true, this pruning strategy begins. It iterates through the legacy data result collection, pruning at most `MaxLegacyDataResultPrunePerBlock` data results and their batch assignments per block.

Related PRs and Issues
Closes #671