Skip to content

ci(core): benchmark for batch block import #2210

Closed
MarcosNicolau wants to merge 2 commits intomainfrom
feat/blocks-batching-ci
Closed

ci(core): benchmark for batch block import #2210
MarcosNicolau wants to merge 2 commits intomainfrom
feat/blocks-batching-ci

Conversation

@MarcosNicolau
Copy link
Copy Markdown
Contributor

@MarcosNicolau MarcosNicolau commented Mar 12, 2025

Motivation
Benchmark batch block execution.

Description
In #2174 we introduce batch block import, this pr:

Note: this ci will work when #2174 is merged into main.

Closes None

@MarcosNicolau MarcosNicolau requested a review from a team as a code owner March 12, 2025 21:41
@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 12, 2025

Lines of code report

Total lines added: 171
Total lines removed: 0
Total lines changed: 171

Detailed view
+---------------------------------------------+-------+------+
| File                                        | Lines | Diff |
+---------------------------------------------+-------+------+
| ethrex/crates/blockchain/blockchain.rs      | 524   | +29  |
+---------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/sync.rs        | 609   | +50  |
+---------------------------------------------+-------+------+
| ethrex/crates/storage/store.rs              | 1186  | +49  |
+---------------------------------------------+-------+------+
| ethrex/crates/storage/store_db/in_memory.rs | 534   | +13  |
+---------------------------------------------+-------+------+
| ethrex/crates/storage/store_db/redb.rs      | 976   | +19  |
+---------------------------------------------+-------+------+
| ethrex/crates/vm/backends/mod.rs            | 357   | +11  |
+---------------------------------------------+-------+------+

@MarcosNicolau MarcosNicolau force-pushed the feat/blocks-batching-ci branch from fc0a960 to e439e62 Compare March 12, 2025 21:42
@MarcosNicolau MarcosNicolau reopened this Mar 12, 2025
@MarcosNicolau MarcosNicolau force-pushed the feat/store-state-trie-n-blocks branch from 66f40d7 to e439e62 Compare March 12, 2025 21:50
@MarcosNicolau MarcosNicolau added the performance Block execution throughput and performance in general label Mar 12, 2025
@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 12, 2025

Benchmark Block Execution Results Comparison Against Main

Command Mean [s] Min [s] Max [s] Relative
base 230.359 ± 1.237 228.826 232.736 1.00
head 230.840 ± 1.876 228.872 234.496 1.00 ± 0.01

@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 13, 2025

Benchmark Block Batch Execution Results Comparison Against Main

Command Mean [s] Min [s] Max [s] Relative
base 223.367 ± 1.628 220.529 225.752 1.00
head 224.807 ± 1.745 222.348 228.203 1.01 ± 0.01

Copy link
Copy Markdown
Collaborator

@mpaulucci mpaulucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice! 🚀

hyperfine --setup "./bin/ethrex-base removedb" -w 5 -N -r 10 --show-output --export-markdown "bench_pr_comparison.md" \
-L bin "$BINS" -n "{bin}" \
"./bin/ethrex-{bin} --network test_data/genesis-l2-ci.json import ./test_data/l2-1k-erc20.rlp --removedb"
echo -e "## Benchmark Block Batch Execution Results Comparison Against Main\n\n$(cat bench_pr_comparison.md)" > bench_pr_comparison.md
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe we could move the "Benchmark Block Batch Execution Results Comparison Against Main" and "bench_pr_comparison.md" to variables/constants so that changing in one place doesn't break the workflow

@mpaulucci
Copy link
Copy Markdown
Collaborator

You should remove https://github.com/lambdaclass/ethrex/blob/main/.github/workflows/ci_bench_block_execution.yaml and add yours. Comparing against base is more accurate.

Copy link
Copy Markdown
Collaborator

@mpaulucci mpaulucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure we want to have a separate job just for this usecase but let's keep this open anyway.

@mpaulucci mpaulucci changed the base branch from feat/store-state-trie-n-blocks to feat/store-state-trie-n-blocks-old March 21, 2025 18:15
@mpaulucci mpaulucci marked this pull request as draft March 24, 2025 13:46
github-merge-queue bot pushed a commit that referenced this pull request Mar 25, 2025
**Motivation**
Accelerate syncing!

**Description**
This PR introduces block batching during full sync:
1. Instead of storing and computing the state root for each block
individually, we now maintain a single state tree for the entire batch,
committing it only at the end. This results in one state trie per `n`
blocks instead of one per block (we'll need less storage also).
2. The new full sync process:
    - Request 1024 headers
    - Request 1024 block bodies and collect them
- Once all blocks are received, process them in batches using a single
state trie, which is attached to the last block.
3. Blocks are now stored in a single transaction.
4. State root, receipts root, and request root validation are only
required for the last block in the batch.
5. The new add_blocks_in_batch function includes a flag,
`should_commit_intermediate_tries`. When set to true, it stores the
tries for each block. This functionality is added to make the hive test
pass. Currently, this is handled by verifying if the block is within the
`STATE_TRIES_TO_KEEP` range. In a real syncing scenario, my intuition is
that it would be better to wait until we are fully synced and then we
would start storing the state of the new blocks and pruning when we
reach `STATE_TRIES_TO_KEEP`.
6. Throughput when syncing is now measured per batches.
7. A new command was added to import blocks in batch

Considerations:
1. ~Optimize account updates: Instead of inserting updates into the
state trie after each block execution, batch them at the end, merging
repeated accounts to reduce insertions and improve performance (see
#2216)~ Closes #2216.
2. Improve transaction handling: Avoid committing storage tries to the
database separately. Instead, create a single transaction for storing
receipts, storage tries, and blocks. This would require additional
abstractions for transaction management (see #2217).
3. This isn't working for `levm` backend we need it to cache the
executions state and persist it between them, as we don't store anything
until the final of the batch (see #2218).
4. In #2210 a new ci is added to run a bench comparing main and `head`
branch using `import-in-batch`

Closes None

---------

Co-authored-by: Martin Paulucci <martin.c.paulucci@gmail.com>
Co-authored-by: fmoletta <99273364+fmoletta@users.noreply.github.com>
@MarcosNicolau MarcosNicolau changed the base branch from feat/store-state-trie-n-blocks-old to main March 26, 2025 19:08
@MarcosNicolau MarcosNicolau force-pushed the feat/blocks-batching-ci branch from 962c790 to a09a2f1 Compare March 26, 2025 19:13
@jrchatruc
Copy link
Copy Markdown
Collaborator

Closed since the import_in_batch feature was not merged to main in the end

@jrchatruc jrchatruc closed this Mar 28, 2025
@jrchatruc jrchatruc deleted the feat/blocks-batching-ci branch March 28, 2025 19:17
pedrobergamini pushed a commit to pedrobergamini/ethrex that referenced this pull request Aug 24, 2025
…aclass#2174)

**Motivation**
Accelerate syncing!

**Description**
This PR introduces block batching during full sync:
1. Instead of storing and computing the state root for each block
individually, we now maintain a single state tree for the entire batch,
committing it only at the end. This results in one state trie per `n`
blocks instead of one per block (we'll need less storage also).
2. The new full sync process:
    - Request 1024 headers
    - Request 1024 block bodies and collect them
- Once all blocks are received, process them in batches using a single
state trie, which is attached to the last block.
3. Blocks are now stored in a single transaction.
4. State root, receipts root, and request root validation are only
required for the last block in the batch.
5. The new add_blocks_in_batch function includes a flag,
`should_commit_intermediate_tries`. When set to true, it stores the
tries for each block. This functionality is added to make the hive test
pass. Currently, this is handled by verifying if the block is within the
`STATE_TRIES_TO_KEEP` range. In a real syncing scenario, my intuition is
that it would be better to wait until we are fully synced and then we
would start storing the state of the new blocks and pruning when we
reach `STATE_TRIES_TO_KEEP`.
6. Throughput when syncing is now measured per batches.
7. A new command was added to import blocks in batch

Considerations:
1. ~Optimize account updates: Instead of inserting updates into the
state trie after each block execution, batch them at the end, merging
repeated accounts to reduce insertions and improve performance (see
lambdaclass#2216)~ Closes lambdaclass#2216.
2. Improve transaction handling: Avoid committing storage tries to the
database separately. Instead, create a single transaction for storing
receipts, storage tries, and blocks. This would require additional
abstractions for transaction management (see lambdaclass#2217).
3. This isn't working for `levm` backend we need it to cache the
executions state and persist it between them, as we don't store anything
until the final of the batch (see lambdaclass#2218).
4. In lambdaclass#2210 a new ci is added to run a bench comparing main and `head`
branch using `import-in-batch`

Closes None

---------

Co-authored-by: Martin Paulucci <martin.c.paulucci@gmail.com>
Co-authored-by: fmoletta <99273364+fmoletta@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance Block execution throughput and performance in general

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants