Legacy data result pruning and basic pruning #673

Open: hacheigriega wants to merge 8 commits into main from hy/future-prune

hacheigriega (Member) commented Nov 28, 2025

Motivation

Batching pruning addresses the following issues:

  1. We need to prune the accumulated batches and their associated data in batches.
  2. Since legacy data results cannot be retrieved based on a batch number, they have to be pruned separately.
  3. We need a permanent, efficient solution for batch pruning.

Issue 1 was implemented in #664. This PR addresses the other two issues.

Explanation of Changes

Pruning Parameters

Pruning logic is parameterized by the following module states, which will be added by the upgrade handler:

  • MaxBatchPrunePerBlock is a module parameter that specifies the maximum number of batches to prune per block in the Batch Pruning strategy. (Defaults to 100)

  • MaxLegacyDataResultPrunePerBlock is a module parameter that specifies the maximum number of legacy data results to prune per block in the Legacy Data Result Pruning strategy. (Defaults to 1000)

  • BatchNumberAtUpgrade is the batch number of the latest batch at the time of upgrade. The upgrade handler should set the value in order to record the batch number of the last batch containing legacy data results.

  • HasPruningCaughtUp is set to true when either of the following conditions is met:

    • (i) All batches up to BatchNumberAtUpgrade have been pruned.
    • (ii) All batches up to currentBatchNum - NumBatchesToKeep have been pruned.

    It is set to false at the time of upgrade, initiating the Batch Pruning strategy. Once it is switched to true, the batching end blocker terminates the Batch Pruning strategy and initiates the Legacy Data Result Pruning strategy.
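
As a rough illustration of the two conditions above, here is a minimal sketch of the check. The function name, the `nextToPrune` cursor (the lowest batch number not yet pruned), and the inclusive reading of "up to" are assumptions made for this example and are not taken from the actual keeper code. `NumBatchesToKeep` uses the fixed value from this PR.

```go
package main

// NumBatchesToKeep mirrors the fixed value given in this PR description.
const NumBatchesToKeep uint64 = 10000

// hasPruningCaughtUp is a hypothetical helper. nextToPrune is assumed to be
// the lowest batch number that has not been pruned yet, so "all batches up
// to N have been pruned" is expressed as nextToPrune > N.
func hasPruningCaughtUp(nextToPrune, batchNumberAtUpgrade, currentBatchNum uint64) bool {
	// Condition (i): every batch up to BatchNumberAtUpgrade is gone.
	if nextToPrune > batchNumberAtUpgrade {
		return true
	}
	// Condition (ii): every batch up to currentBatchNum - NumBatchesToKeep is
	// gone. Written as an addition so the unsigned subtraction cannot
	// underflow when fewer than NumBatchesToKeep batches exist.
	return nextToPrune+NumBatchesToKeep > currentBatchNum
}
```

With this formulation, a chain holding fewer than NumBatchesToKeep batches counts as caught up right away, since there is nothing to prune under condition (ii).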

There is also a new global variable:

  • NumBatchesToKeep specifies the number of batches to preserve in the state. Currently set to 10000.
    • This value is not kept as a module state because it is a fixed value that originates from outside the module logic. In a future upgrade we may turn this into a module parameter, in which case the pruning logic would need to be updated to accommodate its variability.

Pruning Strategies

Basic Pruning (Implemented in this PR)
When a new batch is created, the Basic Pruning strategy prunes the single batch numbered NewBatchNumber - NumBatchesToKeep, provided that this number is greater than BatchNumberAtUpgrade.
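
The target selection for Basic Pruning can be sketched as follows. This is only an illustration of the rule stated above: `basicPruneTarget` is a hypothetical name, and the explicit underflow guard is an assumption about how the unsigned subtraction would be handled.

```go
package main

// NumBatchesToKeep mirrors the fixed value given in this PR description.
const NumBatchesToKeep uint64 = 10000

// basicPruneTarget is a hypothetical helper. Given the number of the batch
// that was just created, it returns the batch number to prune and true, or
// (0, false) when there is nothing to prune.
func basicPruneTarget(newBatchNum, batchNumberAtUpgrade uint64) (uint64, bool) {
	// Fewer than NumBatchesToKeep batches exist: avoid unsigned underflow.
	if newBatchNum < NumBatchesToKeep {
		return 0, false
	}
	target := newBatchNum - NumBatchesToKeep
	// Batches at or below BatchNumberAtUpgrade are left to the temporary
	// Batch Pruning strategy.
	if target <= batchNumberAtUpgrade {
		return 0, false
	}
	return target, true
}
```

For example, with BatchNumberAtUpgrade = 50000, creating batch 60001 selects batch 50001, while creating batch 60000 selects nothing.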

We also temporarily add the Batch Pruning and Legacy Data Result Pruning strategies to prune the batches and associated data that accumulated before the upgrade.

Batch Pruning (Implemented in #664)
This strategy prunes up to MaxBatchPrunePerBlock batches per block, together with their associated data except for data results and their batch assignments. It proceeds until it reaches either the batch at BatchNumberAtUpgrade or the batch at currentBatchNum - NumBatchesToKeep. Once either termination condition is met, HasPruningCaughtUp is switched to true, which terminates Batch Pruning and initiates Legacy Data Result Pruning.
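
The per-block behaviour described above could look roughly like the sketch below. It uses an in-memory map instead of the module store, and `pruneState`, `NextToPrune`, `batchPruneStep`, and `caughtUp` are hypothetical names introduced only for this example. NumBatchesToKeep is a struct field here purely so that the sketch can be exercised with small numbers.

```go
package main

// pruneState is a hypothetical in-memory stand-in for the module state.
type pruneState struct {
	Batches               map[uint64]bool // batch number -> present
	NextToPrune           uint64          // lowest batch number not yet visited
	BatchNumberAtUpgrade  uint64
	MaxBatchPrunePerBlock uint64
	NumBatchesToKeep      uint64
	HasPruningCaughtUp    bool
}

// caughtUp reports whether either termination condition holds.
func (s *pruneState) caughtUp(currentBatchNum uint64) bool {
	return s.NextToPrune > s.BatchNumberAtUpgrade ||
		s.NextToPrune+s.NumBatchesToKeep > currentBatchNum
}

// batchPruneStep performs one block's worth of Batch Pruning and returns
// the number of batch numbers it visited in this block.
func (s *pruneState) batchPruneStep(currentBatchNum uint64) uint64 {
	if s.HasPruningCaughtUp {
		return 0
	}
	var visited uint64
	for visited < s.MaxBatchPrunePerBlock && !s.caughtUp(currentBatchNum) {
		// The real module would also delete the batch's associated data
		// here, except for data results and their batch assignments.
		delete(s.Batches, s.NextToPrune)
		s.NextToPrune++
		visited++
	}
	if s.caughtUp(currentBatchNum) {
		s.HasPruningCaughtUp = true
	}
	return visited
}
```

Calling batchPruneStep once per block bounds the work done in any single block by MaxBatchPrunePerBlock, and the flag flips in the same block in which the last eligible batch is removed.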

Legacy Data Result Pruning (Implemented in this PR)
Once HasPruningCaughtUp is switched to true, this pruning strategy begins. It iterates through the legacy data result collection, pruning at most MaxLegacyDataResultPrunePerBlock data results and their batch assignments per block.
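
A minimal sketch of one block of Legacy Data Result Pruning is shown below. The `legacyStore` type and its two maps are hypothetical stand-ins for the legacy data result collection and the batch assignment collection, and the keys are sorted only to imitate the ordered iteration of a key-value store.

```go
package main

import "sort"

// legacyStore is a hypothetical in-memory stand-in for the legacy
// collections; the key and value layouts are assumptions.
type legacyStore struct {
	DataResults      map[string][]byte // data result ID -> encoded result
	BatchAssignments map[string]uint64 // data result ID -> batch number
}

// pruneLegacyDataResults removes at most maxPerBlock legacy data results
// together with their batch assignments and returns how many were removed.
func (s *legacyStore) pruneLegacyDataResults(maxPerBlock int) int {
	if maxPerBlock <= 0 {
		return 0
	}
	ids := make([]string, 0, len(s.DataResults))
	for id := range s.DataResults {
		ids = append(ids, id)
	}
	sort.Strings(ids) // deterministic order, as an ordered store would give
	if len(ids) > maxPerBlock {
		ids = ids[:maxPerBlock]
	}
	for _, id := range ids {
		delete(s.DataResults, id)
		delete(s.BatchAssignments, id)
	}
	return len(ids)
}
```

Once a call returns 0, the legacy collection is empty and the strategy has nothing left to do.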

Related PRs and Issues

Closes #671

hacheigriega force-pushed the hy/data-result-prune branch 2 times, most recently from 03acea4 to 87b427e, November 28, 2025 18:41
hacheigriega marked this pull request as draft December 1, 2025 17:29
hacheigriega force-pushed the hy/future-prune branch 3 times, most recently from e0522c4 to 9404119, December 9, 2025 02:04
hacheigriega changed the base branch from hy/data-result-prune to main December 9, 2025 02:04
hacheigriega marked this pull request as ready for review December 9, 2025 02:04
hacheigriega changed the title from "Permanent x/batching pruning solution" to "Full implementation of x/batching pruning" Dec 9, 2025
hacheigriega force-pushed the hy/future-prune branch 3 times, most recently from e86aa9b to 01e0eb4, December 9, 2025 14:30
hacheigriega changed the title from "Full implementation of x/batching pruning" to "Legacy data result pruning and simple pruning" Dec 9, 2025
hacheigriega added a commit that referenced this pull request Dec 9, 2025
Implementation of batching module pruning as described in #673.
hacheigriega changed the title from "Legacy data result pruning and simple pruning" to "Legacy data result pruning and basic pruning" Jan 5, 2026
hacheigriega force-pushed the hy/future-prune branch 3 times, most recently from 94e62b3 to 618d599, January 8, 2026 13:01
hacheigriega requested a review from a team January 8, 2026 13:05
Thomasvdam (Member) left a comment

Overall I think this looks good. I'm not sure whether we double-checked the proposed strategies and numbers with Mario.

Comment thread x/batching/keeper/batch_assignments.go
Comment thread x/batching/keeper/pruning.go Outdated
Prune batches and their associated data at every block based on two new
module parameters NumBatchesToKeep and MaxBatchPrunePerBlock.
For pruning data results and their batch assignment data, we resort to a
naive implementation because there is no mapping to data result objects
from batch number or data result ID. In this implementation we go through
`MaxDataResultsToCheckForPrune` items in the store starting from a random
point and delete those whose associated batches have been pruned.
Implementation of batching module pruning as described in #673.
Simplify batch pruning logic by fixing numBatchesToKeep.
Before this commit, hasPruningCaughtUp was switched to true only when
BatchPruneBatches had pruned all batches up to batchNumberAtUpgrade.
This condition could be blocked for a long time by a large numBatchesToKeep.
So we add an alternative condition under which hasPruningCaughtUp is
switched to true: all batches up to (currentBatchNum - numBatchesToKeep)
have been pruned.
BatchPruneBatches is updated accordingly. BasicPruneBatch now prunes a
batch as long as it exists, whether it was created before or after the
upgrade.
Successfully merging this pull request may close these issues.

♻️ Improve x/batching pruning
