feat(l1): process blocks in batches when syncing and importing by MarcosNicolau · Pull Request #2174 · lambdaclass/ethrex

MarcosNicolau · 2025-03-07T12:13:18Z

Motivation
Accelerate syncing!

Description
This PR introduces block batching during full sync:

1. Instead of computing and storing the state root for each block individually, we now maintain a single state trie for the entire batch and commit it only at the end. This results in one state trie per n blocks instead of one per block, which also reduces storage.
2. The new full sync process:
   - Request 1024 headers.
   - Request 1024 block bodies and collect them.
   - Once all blocks are received, process them in batches using a single state trie, which is attached to the last block.
3. Blocks are now stored in a single transaction.
4. State root, receipts root, and request root validation is only required for the last block in the batch.
5. The new add_blocks_in_batch function includes a flag, should_commit_intermediate_tries. When set to true, it stores the tries for each block. This was added to make the hive tests pass. Currently, the flag is set by checking whether the block is within the STATE_TRIES_TO_KEEP range. In a real syncing scenario, my intuition is that it would be better to wait until we are fully synced, and only then start storing the state of new blocks and pruning once we reach STATE_TRIES_TO_KEEP.
6. Sync throughput is now measured per batch.
7. A new command was added to import blocks in batches.
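
For illustration only, the batching idea described above can be sketched with toy types. This is not the ethrex implementation: `Block`, `State`, `toy_root`, and `BatchError` are hypothetical stand-ins, the "root" is a simple digest rather than a Merkle Patricia root, and only the function name and the `should_commit_intermediate_tries` flag are taken from this PR. The sketch executes every block against one working state, validates the root only for the last block, and records intermediate roots only when the flag is set.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for the state trie: account id -> balance (assumption).
pub type State = BTreeMap<u64, u64>;

/// Hypothetical simplified block: balance credits plus the state root
/// claimed by its header.
#[derive(Debug, Clone)]
pub struct Block {
    pub number: u64,
    pub credits: Vec<(u64, u64)>,
    pub state_root: u64,
}

#[derive(Debug, PartialEq)]
pub enum BatchError {
    EmptyBatch,
    StateRootMismatch { block: u64 },
}

/// Deterministic digest of the toy state. NOT a real Merkle root.
pub fn toy_root(state: &State) -> u64 {
    state.iter().fold(0xcbf29ce484222325u64, |acc, (k, v)| {
        (acc ^ k.wrapping_mul(31).wrapping_add(*v)).wrapping_mul(0x100000001b3)
    })
}

/// Executes all blocks against a single working state and returns the
/// (block number, root) pairs that would be committed.
pub fn add_blocks_in_batch(
    state: &mut State,
    blocks: &[Block],
    should_commit_intermediate_tries: bool,
) -> Result<Vec<(u64, u64)>, BatchError> {
    let last = blocks.last().ok_or(BatchError::EmptyBatch)?;
    // Work on a copy so that a failed batch leaves the caller's state untouched.
    let mut working = state.clone();
    let mut committed = Vec::new();

    for block in blocks {
        // "Execute" the block by applying its credits to the shared state.
        for (account, amount) in &block.credits {
            *working.entry(*account).or_insert(0) += amount;
        }
        // Optionally record a root for every block before the last one.
        if should_commit_intermediate_tries && block.number != last.number {
            committed.push((block.number, toy_root(&working)));
        }
    }

    // Only the last block's state root is validated.
    let root = toy_root(&working);
    if root != last.state_root {
        return Err(BatchError::StateRootMismatch { block: last.number });
    }
    committed.push((last.number, root));
    *state = working;
    Ok(committed)
}
```

With the flag set to false, a batch of n blocks yields a single committed root, attached to the last block; with it set to true, it yields n roots.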

Considerations:

1. Optimize account updates: Instead of inserting updates into the state trie after each block execution, batch them at the end, merging repeated accounts to reduce insertions and improve performance (see Optimize account updates in add_blocks_in_batch, #2216). Closes #2216.
2. Improve transaction handling: Avoid committing storage tries to the database separately. Instead, create a single transaction for storing receipts, storage tries, and blocks. This would require additional abstractions for transaction management (see Write batch of blocks in a single transaction in add_blocks_in_batch, #2217).
3. This does not work for the levm backend yet: it needs to cache the execution state and persist it between blocks, since nothing is stored until the end of the batch (see Make add_blocks_in_batch work for LEVM, #2218).
4. In #2210 (ci(core): benchmark for batch block import), a new CI job is added to run a benchmark comparing the main and head branches using import-in-batch.
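
The merging of repeated accounts mentioned in the first consideration could look roughly like the sketch below. It is an assumption-laden illustration, not the code from this PR: `Address`, `AccountUpdate`, and `merge_account_updates` are hypothetical simplified names, and the real ethrex update type carries more information (for example account removal and code changes). Later blocks override earlier values, so each account is written to the trie once per batch instead of once per block.

```rust
use std::collections::HashMap;

/// Hypothetical 20-byte address type used only for this sketch.
pub type Address = [u8; 20];

/// Hypothetical, simplified account update: `None` means "unchanged".
#[derive(Debug, Clone, Default, PartialEq)]
pub struct AccountUpdate {
    pub balance: Option<u64>,
    pub nonce: Option<u64>,
    pub storage: HashMap<u64, u64>,
}

impl AccountUpdate {
    /// Folds a newer update into this one; newer values win.
    fn merge(&mut self, newer: AccountUpdate) {
        if newer.balance.is_some() {
            self.balance = newer.balance;
        }
        if newer.nonce.is_some() {
            self.nonce = newer.nonce;
        }
        // `extend` overwrites slots that were already written earlier.
        self.storage.extend(newer.storage);
    }
}

/// Merges per-block updates (oldest block first) into one update per account.
pub fn merge_account_updates(
    updates_per_block: impl IntoIterator<Item = Vec<(Address, AccountUpdate)>>,
) -> HashMap<Address, AccountUpdate> {
    let mut merged: HashMap<Address, AccountUpdate> = HashMap::new();
    for block_updates in updates_per_block {
        for (address, update) in block_updates {
            merged.entry(address).or_default().merge(update);
        }
    }
    merged
}
```

The merged map can then be applied to the state trie in a single pass at the end of the batch.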

Closes None

github-actions · 2025-03-07T12:14:15Z

Lines of code report

Total lines added: 361
Total lines removed: 0
Total lines changed: 361

Detailed view

+---------------------------------------------+-------+------+
| File                                        | Lines | Diff |
+---------------------------------------------+-------+------+
| ethrex/crates/blockchain/blockchain.rs      | 495   | +109 |
+---------------------------------------------+-------+------+
| ethrex/crates/common/trie/trie.rs           | 812   | +4   |
+---------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/sync.rs        | 551   | +67  |
+---------------------------------------------+-------+------+
| ethrex/crates/storage/api.rs                | 197   | +6   |
+---------------------------------------------+-------+------+
| ethrex/crates/storage/store.rs              | 1137  | +6   |
+---------------------------------------------+-------+------+
| ethrex/crates/storage/store_db/in_memory.rs | 521   | +35  |
+---------------------------------------------+-------+------+
| ethrex/crates/storage/store_db/libmdbx.rs   | 1163  | +52  |
+---------------------------------------------+-------+------+
| ethrex/crates/storage/store_db/redb.rs      | 957   | +71  |
+---------------------------------------------+-------+------+
| ethrex/crates/vm/backends/mod.rs            | 321   | +11  |
+---------------------------------------------+-------+------+

mpaulucci · 2025-03-07T16:11:24Z

crates/blockchain/blockchain.rs

+            // todo only execute transactions
+            // batch account updates to merge the repeated accounts
+            self.storage
+                .apply_account_updates_to_trie(&account_updates, &mut state_trie)?;


Ahh, I understand now. I was hoping we could call get_state_transitions only once: https://github.com/lambdaclass/lambda_ethereum_rust/blob/0acc5e28b861f88c30cebb6cbfe0230970df25ed/crates/vm/backends/revm.rs#L96

We would need to add an execute_blocks function inside vm.

We can discuss this later.

I've tried this approach but it won't work without making larger modifications to the vm backend.


MarcosNicolau added 4 commits March 6, 2025 17:58

feat: add_blocks_in_batch

3653b85

feat: add_blocks_in_batch in full sync

6b8161d

feat: store batch of receipts in libmdbx and in-memory

1220097

feat: store batch of blocks in single tx for libmdbx and in-memory

f78db0d

MarcosNicolau requested a review from a team as a code owner March 7, 2025 12:13

mpaulucci reviewed Mar 7, 2025

crates/blockchain/blockchain.rs (resolved)

mpaulucci reviewed Mar 7, 2025

crates/blockchain/blockchain.rs (outdated, resolved)

MarcosNicolau added 2 commits March 7, 2025 10:37

Merge branch 'main' into feat/store-state-trie-n-blocks

565eb60

fix: merge with main

0acc5e2

mpaulucci reviewed Mar 7, 2025

crates/blockchain/blockchain.rs (outdated, resolved)

mpaulucci reviewed Mar 7, 2025

crates/blockchain/blockchain.rs (outdated, resolved)

mpaulucci reviewed Mar 7, 2025

mpaulucci and others added 6 commits March 7, 2025 20:43

Merge branch 'main' into feat/store-state-trie-n-blocks

7a4bcea

fix: parent hash check and block execution state

80384ad

fix: withdrawal test regression

3746d40

we were not setting the canonical block hash for the block number

Merge branch 'main' into feat/store-state-trie-n-blocks

88a8d04

feat: use blocks_in_batch in ethrex cmd import chain.rlp

5d5a97c

fix: import blocks in batch

56114b7

MarcosNicolau added the performance Block execution throughput and performance in general label Mar 10, 2025

MarcosNicolau and others added 3 commits March 10, 2025 17:13

refactor: add_block call add_blocks_in_batch, take ownership to avoid cloning blocks

e1a86ed

fix: rpc-compat hive tests

44c820a

fix: add max tries to store in db

23bdd74

fmoletta reviewed Mar 11, 2025

crates/blockchain/blockchain.rs (outdated, resolved)

fmoletta reviewed Mar 11, 2025

crates/blockchain/blockchain.rs (outdated, resolved)

MarcosNicolau marked this pull request as draft March 11, 2025 13:23

fmoletta reviewed Mar 11, 2025

crates/blockchain/blockchain.rs (outdated, resolved)

Merge branch 'main' into feat/store-state-trie-n-blocks

7fc9414

fmoletta reviewed Mar 11, 2025

crates/networking/p2p/sync.rs (outdated, resolved)

fix: merge

bc9ab4b

mpaulucci reviewed Mar 21, 2025

crates/storage/store_db/in_memory.rs (outdated, resolved)

mpaulucci added 5 commits March 21, 2025 19:07

Merge branch 'main' of github.com:lambdaclass/lambda_ethereum_rust into feat/store-state-trie-n-blocks

33929bc

Remove apply_account_updates_to_trie.

9e3f59b

Simplify add_block

f7a6018

Update changelog date.

2139269

Remove block importing code.

1b87f6c

mpaulucci approved these changes Mar 21, 2025

fmoletta reviewed Mar 21, 2025

crates/blockchain/blockchain.rs (outdated, resolved)

fmoletta reviewed Mar 21, 2025

crates/common/trie/trie.rs (resolved)

fmoletta reviewed Mar 21, 2025

crates/networking/p2p/sync.rs (outdated, resolved)

fmoletta reviewed Mar 21, 2025

crates/networking/p2p/sync.rs (outdated, resolved)

fmoletta reviewed Mar 21, 2025

crates/networking/p2p/sync.rs (outdated, resolved)

fmoletta reviewed Mar 21, 2025

crates/networking/p2p/sync.rs (outdated, resolved)

fmoletta reviewed Mar 21, 2025

crates/networking/p2p/sync.rs (outdated, resolved)

MarcosNicolau and others added 3 commits March 25, 2025 13:29

Merge branch 'main' into feat/store-state-trie-n-blocks

004ba50

refactor: full sync move shared variables to SyncManager struct and simplify code

b7b0646

Merge branch 'main' into feat/store-state-trie-n-blocks

432cc10

fmoletta reviewed Mar 25, 2025

crates/networking/p2p/sync.rs (outdated, resolved)

fmoletta reviewed Mar 25, 2025

crates/networking/p2p/peer_handler.rs (outdated, resolved)

MarcosNicolau and others added 3 commits March 25, 2025 16:10

revert: moving heads into SyncManager struct

89413d8

Update crates/networking/p2p/peer_handler.rs

cd705a0

Co-authored-by: fmoletta <99273364+fmoletta@users.noreply.github.com>

chore: derive debug for BatchBlockProcessingFailure

c986f4e

fmoletta approved these changes Mar 25, 2025


fix: lint

6c037c7

MarcosNicolau force-pushed the feat/store-state-trie-n-blocks branch from 19e3593 to 6c037c7 Compare March 25, 2025 20:15

MarcosNicolau added this pull request to the merge queue Mar 25, 2025

Merged via the queue into main with commit cdbfbe9 Mar 25, 2025
25 of 26 checks passed

MarcosNicolau deleted the feat/store-state-trie-n-blocks branch March 25, 2025 21:59
