perf(l1): replace synchronous disk I/O with async operations in snap sync #6113
pablodeymo wants to merge 9 commits into main from
Conversation
Kimi AI Code Review

Code Correctness and Potential Bugs
Security Vulnerabilities
Performance Implications
Rust Best Practices and Idiomatic Patterns
Memory Safety and Proper Error Handling
Code Readability and Maintainability
Ethereum-specific Considerations

Conclusion: The PR looks good overall. It correctly replaces synchronous disk I/O with async operations, improving the performance of the Ethereum client. The code is well-written, follows Rust best practices, and maintains a high level of security and correctness. One minor suggestion is to consider adding more comments or documentation.

Automated review by Kimi (Moonshot AI)
Greptile Overview

Greptile Summary: This PR converts synchronous disk I/O operations to async operations in snap sync, preventing the tokio runtime from blocking during file system operations. The changes introduce a new `async_fs` module.

Key Changes
Issues Found
Recommendations

Confidence Score: 3/5
| Filename | Overview |
|---|---|
| crates/networking/p2p/snap/async_fs.rs | New async filesystem wrapper module with comprehensive tests, but read_dir_paths silently ignores entry errors (regression from original code) |
| crates/networking/p2p/snap/client.rs | Cleanly replaced 4 instances of sync directory creation with async_fs::ensure_dir_exists, properly awaited |
| crates/networking/p2p/sync/snap_sync.rs | Systematically replaced all sync fs operations with async equivalents across both rocksdb and non-rocksdb code paths |
Sequence Diagram
```mermaid
sequenceDiagram
    participant Runtime as Tokio Runtime
    participant SnapSync as snap_sync.rs
    participant Client as snap/client.rs
    participant AsyncFS as async_fs.rs
    participant Disk as File System

    Note over SnapSync,Disk: Account Range Processing
    Client->>AsyncFS: ensure_dir_exists(account_state_dir)
    AsyncFS->>AsyncFS: tokio::fs::try_exists()
    AsyncFS->>Disk: Check if directory exists
    Disk-->>AsyncFS: Result
    alt Directory doesn't exist
        AsyncFS->>Disk: tokio::fs::create_dir_all()
        Disk-->>AsyncFS: Created
    end
    AsyncFS-->>Client: Ok(())

    Note over SnapSync,Disk: Account Insertion Phase
    SnapSync->>AsyncFS: read_dir_paths(account_state_dir)
    AsyncFS->>Runtime: spawn_blocking
    Runtime->>Disk: std::fs::read_dir()
    Disk-->>Runtime: Iterator<DirEntry>
    Runtime->>Runtime: Collect and sort paths
    Runtime-->>AsyncFS: Vec<PathBuf>
    AsyncFS-->>SnapSync: Sorted file paths

    loop For each snapshot file
        SnapSync->>AsyncFS: read_file(path)
        AsyncFS->>Disk: tokio::fs::read()
        Disk-->>AsyncFS: File contents
        AsyncFS-->>SnapSync: Vec<u8>
        SnapSync->>SnapSync: Process and insert into trie
    end

    Note over SnapSync,Disk: Cleanup Phase
    SnapSync->>AsyncFS: remove_dir_all(snapshots_dir)
    AsyncFS->>Disk: tokio::fs::remove_dir_all()
    Disk-->>AsyncFS: Removed
    AsyncFS-->>SnapSync: Ok(())

    Note over Runtime: Async I/O allows runtime to handle<br/>network operations concurrently
```
Kimi AI Code Review

This PR introduces asynchronous file system operations to improve the performance of the snap sync process by avoiding blocking the tokio runtime during disk I/O. Here's a detailed review focusing on the specified areas:

Code Correctness and Potential Bugs
Security Vulnerabilities
Performance Implications
Rust Best Practices and Idiomatic Patterns
Memory Safety and Proper Error Handling
Code Readability and Maintainability
Ethereum-Specific Considerations

Conclusion: The PR looks good overall. It introduces significant performance improvements by replacing synchronous disk I/O with asynchronous operations in the snap sync process. The code is well-structured, follows Rust best practices, and includes comprehensive tests. The changes are consistent with Ethereum-specific considerations. One minor suggestion for improvement:

Overall, the PR is well-implemented and ready to be merged. Great work on improving the performance of the snap sync process!

Automated review by Kimi (Moonshot AI)
address documentation comments
Kimi AI Code Review

Code Correctness and Potential Bugs: The PR introduces asynchronous file system operations to replace synchronous disk I/O in the snap sync process. This change is crucial for improving the performance of the Ethereum client by avoiding blocking operations on the Tokio runtime.

Security Vulnerabilities: The changes do not introduce any new security vulnerabilities. The use of asynchronous I/O does not expose any additional attack surfaces, and the error handling is consistent with the existing codebase.

Performance Implications: The replacement of synchronous I/O with asynchronous operations is expected to have a positive impact on performance. By not blocking the Tokio runtime, the client can continue processing network operations while waiting for disk I/O, leading to better utilization of available resources and reduced latency.

Rust Best Practices and Idiomatic Patterns: The PR follows Rust best practices and idiomatic patterns:

Memory Safety and Proper Error Handling: The code is memory safe, and error handling is properly implemented:

Code Readability and Maintainability: The code is well-structured and readable:

Ethereum-Specific Considerations: The changes do not affect Ethereum-specific logic such as EVM opcode correctness, gas accounting, consensus rules, or RLP encoding/decoding. The focus is on improving the performance of the snap sync process, which is orthogonal to these aspects.

Conclusion: The PR is well-written, follows best practices, and is expected to improve the performance of the Ethereum client by replacing synchronous disk I/O with asynchronous operations. The changes are correct, secure, and maintain the idiomatic Rust style. The tests included in the PR cover the new functionality, ensuring that the changes are reliable.

Suggested Changes:

Overall Impression: The PR looks good and is ready to be merged. The changes are well-justified and follow best practices for asynchronous programming in Rust. The tests included in the PR provide confidence in the correctness of the new functionality.

Automated review by Kimi (Moonshot AI)
Thanks for the thorough review! I've addressed the critical issue.

Fixed in commit e1a25e9: the `read_dir_paths` error-swallowing issue flagged above. The fix changes from:

```rust
.filter_map(|entry| entry.ok().map(|e| e.path()))
```

to:

```rust
.map(|entry| {
    entry
        .map(|e| e.path())
        .map_err(|e| SnapError::FileSystem {
            operation: "read directory entry",
            path: dir.clone(),
            kind: e.kind(),
        })
})
.collect::<Result<Vec<_>, _>>()?;
```

This ensures that any filesystem issues (permission denied, corruption, etc.) are properly surfaced rather than silently skipped.

Regarding the documentation suggestion: the module already includes extensive docstrings explaining the hybrid approach.
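The behavioral difference between those two shapes can be shown with a small std-only sketch: `Vec<Result<i32, String>>` stands in for the `Result<DirEntry, io::Error>` items that `read_dir` yields, and a plain `String` replaces `SnapError`. The function names here are illustrative, not from the codebase.

```rust
// `filter_map(|e| e.ok())` drops Err items: a failing entry produces a
// silently incomplete listing.
fn silently_skipping(entries: Vec<Result<i32, String>>) -> Vec<i32> {
    entries.into_iter().filter_map(|e| e.ok()).collect()
}

// Collecting an iterator of Results into Result<Vec<_>, _> stops at the
// first Err and returns it to the caller.
fn propagating(entries: Vec<Result<i32, String>>) -> Result<Vec<i32>, String> {
    entries.into_iter().collect()
}

fn main() {
    let entries = vec![Ok(1), Err("permission denied".to_string()), Ok(3)];
    // The skipping variant returns a partial result with no sign of trouble.
    assert_eq!(silently_skipping(entries.clone()), vec![1, 3]);
    // The propagating variant surfaces the error.
    assert_eq!(propagating(entries), Err("permission denied".to_string()));
}
```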
Kimi AI Code Review

This PR introduces asynchronous file system operations to replace synchronous disk I/O in the snap sync process. This is a significant improvement for performance, as it prevents blocking the Tokio runtime, which can lead to timeouts and reduced throughput. Here are my detailed comments:

Code Correctness and Potential Bugs
Security Vulnerabilities
Performance Implications
Rust Best Practices and Idiomatic Patterns
Memory Safety and Proper Error Handling
Code Readability and Maintainability
Ethereum-Specific Considerations

Conclusion: The PR is well-written, follows Rust best practices, and introduces significant performance improvements by replacing synchronous disk I/O with asynchronous operations. The code is correct, secure, and well-documented. It adheres to Ethereum-specific considerations and maintains consensus rules and EIP compliance. Overall, I approve this PR. It is a valuable contribution to improving the performance of the snap sync process in EthRex.

Automated review by Kimi (Moonshot AI)
what do you think @greptile-apps?
This is a well-architected solution to a real performance problem! You've identified the core issue - synchronous filesystem operations blocking the tokio runtime during snap sync - and addressed it systematically. Strong points:
Code quality observations:
Minor considerations:
Performance impact assessment: This should meaningfully improve snap sync performance by allowing network requests to proceed while disk I/O happens in parallel. The transformation from blocking patterns like:

```rust
// Before: blocks entire runtime
for entry in std::fs::read_dir(dir)? { ... }
```

to:

```rust
// After: yields to other tasks
let files = async_fs::read_dir_paths(dir).await?;
for file_path in files { ... }
```

...is exactly the right fix. The code simplification is also a nice bonus - reducing 42 lines to 4 lines while adding async behavior is impressive. Overall, this is a solid performance optimization with clean implementation. The approach is sound and the code quality is high.
Kimi AI Code Review

This PR introduces asynchronous file system operations to replace synchronous disk I/O in the snap sync process, which is a significant improvement for performance and responsiveness in a networked blockchain client. Here's a detailed review:

Code Correctness and Potential Bugs
Security Vulnerabilities
Performance Implications
Rust Best Practices and Idiomatic Patterns
Memory Safety and Proper Error Handling
Code Readability and Maintainability
Ethereum-Specific Considerations

Conclusion: The PR is well-implemented, follows best practices, and is likely to improve the performance of the Ethereum execution client. The changes are correctly scoped, and the new async file operations are implemented with proper error handling and testing. I approve this PR with no specific changes requested.

Automated review by Kimi (Moonshot AI)
Benchmark Block Execution Results Comparison Against Main
Thanks @ElFantasma! Fixed the documentation to clarify that `tokio::fs` uses `spawn_blocking` internally.
Kimi AI Code Review

This PR aims to replace synchronous disk I/O operations with asynchronous ones in the snap sync module to prevent blocking the Tokio runtime. Here's a review based on the provided criteria:

Code Correctness and Potential Bugs: The changes appear to correctly replace synchronous file operations with their asynchronous counterparts using `tokio::fs` and `spawn_blocking`. Suggestion:

Security Vulnerabilities: The changes do not introduce any new security vulnerabilities. They adhere to the principle of least privilege by not introducing any new file permissions or access patterns.

Performance Implications: The performance implications are positive. By offloading disk I/O to non-blocking operations, the Tokio runtime can continue to process other tasks, such as network operations, which can lead to better overall performance and responsiveness.

Rust Best Practices and Idiomatic Patterns

Memory Safety and Proper Error Handling: The code is memory-safe and correctly handles errors.

Code Readability and Maintainability: The code is well-structured and readable. The use of comments and clear function names makes it easy to understand the purpose of each section.

Ethereum-Specific Considerations: The changes do not affect EVM opcode correctness, gas accounting, consensus rules, EIP compliance, state trie operations, RLP encoding/decoding, or transaction and block validation logic. They are focused on improving the performance of the snap sync module.

Overall Assessment: The PR is well-written and addresses the issue of synchronous disk I/O blocking the Tokio runtime effectively. It follows Rust best practices and maintains a high level of code quality. The changes are likely to improve the performance of the snap sync module without introducing new bugs or security vulnerabilities.

Final Verdict: Approved with a suggestion for a minor optimization.

Automated review by Kimi (Moonshot AI)
I'm not sure what specific question you have - just seeing "?" Could you clarify what you'd like me to look at or comment on? If you're asking about the documentation update you just made (clarifying that tokio::fs uses spawn_blocking internally), that's a good clarification! The updated docs accurately explain that the benefit is the cleaner async API rather than a fundamentally different execution model.
Pull request overview
This PR aims to prevent Tokio runtime stalls during snap sync by replacing synchronous filesystem calls with async equivalents, centralizing common filesystem operations under a new snap::async_fs module.
Changes:

- Added `crates/networking/p2p/snap/async_fs.rs` with async helpers for directory and file operations (including deterministic `read_dir_paths`).
- Updated snap sync and snap client code paths to use async filesystem helpers instead of `std::fs::*`.
- Added a new snap protocol server integration test suite and updated dependencies/changelog.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| crates/networking/p2p/snap/async_fs.rs | Introduces async FS wrappers (tokio::fs + spawn_blocking for read_dir) and module-level tests. |
| crates/networking/p2p/snap/mod.rs | Exports the new async_fs module. |
| crates/networking/p2p/snap/client.rs | Replaces snapshot directory creation logic with async_fs::ensure_dir_exists. |
| crates/networking/p2p/sync/snap_sync.rs | Switches snapshot read/list/remove operations to async equivalents via async_fs. |
| crates/networking/p2p/Cargo.toml | Adds tempfile dev-dependency for async_fs tests. |
| crates/networking/p2p/tests/snap_server_tests.rs | Adds a large integration test suite for snap server AccountRange behavior. |
| CHANGELOG.md | Adds a perf changelog entry for async disk I/O in snap sync. |
```rust
use std::str::FromStr;

use ethrex_common::{BigEndianHash, H256, types::AccountStateSlimCodec};
```

`BigEndianHash` is imported but never used in this test module. This will at least generate an unused import warning; please remove it or use it if intended.

Suggested change:

```rust
use ethrex_common::{H256, types::AccountStateSlimCodec};
```
N/A — the test file was already merged to main via #5975 and is no longer part of this PR's diff.
```rust
let snapshot_contents = async_fs::read_file(&file_path).await?;
let code_hashes: Vec<H256> = RLPDecode::decode(&snapshot_contents)
    .map_err(|_| SyncError::CodeHashesSnapshotDecodeError(file_path.clone()))?;
```

In this loop, `file_path.clone()` is only used to build the decode error, but `file_path` itself isn't used afterwards. You can avoid the extra allocation by moving `file_path` into the error (and only borrowing it for the read).

Suggested change:

```rust
.map_err(|_| SyncError::CodeHashesSnapshotDecodeError(file_path))?;
```

Fixed in fe5f704 — `file_path` is moved into the error directly.
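The clone-vs-move distinction can be sketched standalone. This is a hypothetical illustration (function names invented; a plain `PathBuf` stands in for the `SyncError` variant): when building an error is a value's last use, the closure can take the value by move, so `map_err` needs no clone.

```rust
use std::path::PathBuf;

// Clones the path into the error even though it is the last use.
fn decode_with_clone(file_path: PathBuf) -> Result<(), PathBuf> {
    let decoded: Result<(), ()> = Err(()); // stand-in for a failed RLP decode
    decoded.map_err(|_| file_path.clone())?; // extra allocation
    Ok(())
}

// Moves the path into the error: same result, no allocation.
fn decode_with_move(file_path: PathBuf) -> Result<(), PathBuf> {
    let decoded: Result<(), ()> = Err(());
    decoded.map_err(|_| file_path)?; // last use, so a move is fine
    Ok(())
}

fn main() {
    let p = PathBuf::from("/snapshots/code_hashes_0.rlp");
    assert_eq!(decode_with_clone(p.clone()), Err(p.clone()));
    assert_eq!(decode_with_move(p.clone()), Err(p));
}
```

The move works because `map_err` takes an `FnOnce` closure, which is allowed to consume its captures.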
```rust
//! Snap protocol server tests
//!
//! Hive `AccountRange` tests based on go-ethereum's test suite:
//! https://github.com/ethereum/go-ethereum/blob/3e567b8b2901611f004b5a6070a9b6d286be128d/cmd/devp2p/internal/ethtest/snap.go#L69
```

This PR description focuses on async disk I/O changes, but it also adds a large new integration test suite (snap_server_tests.rs). Please update the PR description (or split into a separate PR) so the scope and motivation of these tests are clear.

N/A — the test file was already merged to main via #5975 and is no longer part of this PR's diff.
```rust
/// Removes a directory if it exists, ignoring NotFound errors.
pub async fn remove_dir_all_if_exists(path: &Path) -> Result<(), SnapError> {
```

```rust
/// Writes data to a file asynchronously.
///
/// Creates parent directories if they don't exist.
pub async fn write_file(path: &Path, contents: &[u8]) -> Result<(), SnapError> {
```
```rust
//!
//! `std::fs::read_dir` returns a `ReadDir` iterator that yields `DirEntry` items.
//! While `tokio::fs::read_dir` exists, it returns an async stream that requires
//! careful handling of ownership and lifetimes. Using explicit `spawn_blocking`
```

IMO the complexity is overstated. The async version would look almost identical, and doesn't have any ownership/lifetime-related annoyance. I agree with the choice, but the sections describing why should be trimmed.

Agreed. Trimmed the module docs from ~45 lines to 6 in fe5f704.
```rust
/// }
/// ```
pub async fn ensure_dir_exists(path: &Path) -> Result<(), SnapError> {
    match tokio::fs::try_exists(path).await {
```

It should be explained why `try_exists` is called when `create_dir_all` is idempotent.

Good catch — `create_dir_all` is idempotent and handles the exists case. Simplified to just `tokio::fs::create_dir_all` in fe5f704.
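The idempotence is easy to verify with std's synchronous `create_dir_all` (tokio's async version has the same semantics). A minimal sketch, with a hypothetical `ensure_dir_exists` mirroring the simplified helper:

```rust
use std::fs;
use std::path::Path;

// create_dir_all succeeds whether or not the directory already exists,
// so a preceding try_exists check only adds a syscall plus a
// check-then-act race window.
fn ensure_dir_exists(path: &Path) -> std::io::Result<()> {
    fs::create_dir_all(path)
}

fn main() -> std::io::Result<()> {
    let dir = std::env::temp_dir().join("ensure_dir_exists_demo");
    ensure_dir_exists(&dir)?; // creates the directory (and missing parents)
    ensure_dir_exists(&dir)?; // no-op on the second call, still Ok
    assert!(dir.is_dir());
    fs::remove_dir_all(&dir)?; // clean up
    Ok(())
}
```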
Force-pushed from 736f6b4 to 8b40f62.
All comments addressed in fe5f704: removed unused functions, trimmed docs, simplified ensure_dir_exists.
Motivation
During snap sync, multiple file system operations block the tokio runtime:

- `std::fs::create_dir_all`
- `std::fs::read`
- `std::fs::read_dir`
- `std::fs::remove_dir_all`

These synchronous operations cause the async runtime to stall, preventing network operations from proceeding while disk I/O completes. This is particularly problematic during:
This PR addresses roadmap item 1.6 (Async Disk I/O) from the snap sync improvement plan.
Description
Approach: Hybrid Async Strategy
We use a hybrid approach for async file operations:

| Operation | Implementation |
|---|---|
| `create_dir_all` | `tokio::fs` |
| `read` | `tokio::fs` |
| `remove_dir_all` | `tokio::fs` |
| `read_dir` | `spawn_blocking` |

New Module:
`snap/async_fs.rs`

Created a new module with async wrappers that provide consistent error handling:
All functions return `SnapError::FileSystem` with:

- the operation that failed
- the path it was operating on
- the error kind (`std::io::ErrorKind`)

Changes to `snap/client.rs`

Before (4 occurrences):
After:
Locations updated:
- `request_account_range()` - account state snapshot directory (loop)
- `request_account_range()` - account state snapshot directory (final flush)
- `request_storage_ranges()` - storage snapshot directory (loop)
- `request_storage_ranges()` - storage snapshot directory (final flush)

Changes to `sync/snap_sync.rs`

Code Hashes Processing
Before:
After:
Account Insertion (non-rocksdb)
Before:
After:
Storage Insertion (non-rocksdb)
Same pattern as account insertion - replaced `read_dir` + `read` + `remove_dir_all` with async equivalents.

Account Insertion (rocksdb)
Before:
After:
Storage Insertion (rocksdb)
Same pattern - replaced sync fs operations with async equivalents.
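Of the four operations, `read_dir` is the one kept on a blocking thread. The listing pattern — collect every entry, propagate per-entry errors, sort for determinism — can be sketched with std only; here `std::thread::spawn` stands in for tokio's `spawn_blocking` pool and the name `read_dir_paths_blocking` is hypothetical:

```rust
use std::fs;
use std::path::{Path, PathBuf};
use std::thread;

// Deterministic directory listing done off-thread, mirroring the shape
// of the spawn_blocking version: the blocking read_dir never runs on
// the caller's (executor) thread.
fn read_dir_paths_blocking(dir: &Path) -> std::io::Result<Vec<PathBuf>> {
    let dir = dir.to_path_buf();
    thread::spawn(move || {
        let mut paths = fs::read_dir(&dir)?
            // propagate per-entry errors instead of skipping them
            .map(|entry| entry.map(|e| e.path()))
            .collect::<std::io::Result<Vec<_>>>()?;
        paths.sort(); // stable, platform-independent order
        Ok(paths)
    })
    .join()
    .expect("worker thread panicked")
}

fn main() -> std::io::Result<()> {
    let dir = std::env::temp_dir().join("read_dir_paths_demo");
    let _ = fs::remove_dir_all(&dir); // start from a clean slate
    fs::create_dir_all(&dir)?;
    fs::write(dir.join("b.rlp"), b"2")?;
    fs::write(dir.join("a.rlp"), b"1")?;
    let names: Vec<String> = read_dir_paths_blocking(&dir)?
        .into_iter()
        .filter_map(|p| p.file_name().map(|n| n.to_string_lossy().into_owned()))
        .collect();
    assert_eq!(names, ["a.rlp", "b.rlp"]); // sorted regardless of creation order
    fs::remove_dir_all(&dir)?;
    Ok(())
}
```

The sort matters because `read_dir` gives no ordering guarantee, so without it snapshot files could be replayed in a different order on different filesystems.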
Summary of Changes
- `snap/async_fs.rs`
- `snap/mod.rs`
- `snap/client.rs`
- `sync/snap_sync.rs`
- `Cargo.toml`
- `Cargo.lock`

Net change: +223 lines, -93 lines = +130 lines (including 189 lines of new module with tests)
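To make the error surface described above concrete, here is a hypothetical sketch of the `SnapError::FileSystem` shape. The field names (`operation`, `path`, `kind`) come from the fix shown earlier in this thread; the real `SnapError` enum in ethrex has more variants, so this is an illustration, not the actual definition:

```rust
use std::io::ErrorKind;
use std::path::PathBuf;

// Sketch of the structured filesystem error variant (illustrative only).
#[derive(Debug, PartialEq)]
enum SnapError {
    FileSystem {
        operation: &'static str, // which fs operation failed
        path: PathBuf,           // path the operation was applied to
        kind: ErrorKind,         // underlying std::io::ErrorKind
    },
}

// Convert an io::Error into the structured variant, tagging the
// operation and path so callers see full context.
fn fs_error(operation: &'static str, path: &PathBuf, err: std::io::Error) -> SnapError {
    SnapError::FileSystem { operation, path: path.clone(), kind: err.kind() }
}

fn main() {
    let path = PathBuf::from("/tmp/snapshots");
    let io_err = std::io::Error::from(ErrorKind::PermissionDenied);
    let err = fs_error("read directory entry", &path, io_err);
    assert_eq!(
        err,
        SnapError::FileSystem {
            operation: "read directory entry",
            path,
            kind: ErrorKind::PermissionDenied,
        }
    );
}
```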
How to Test
Unit Tests
Run the async_fs module tests:
```shell
cargo test -p ethrex-p2p async_fs
```

Expected output:
Compilation Tests
Verify both feature configurations compile:
Integration Testing
Run a snap sync on testnet to verify:
```shell
# Example: Run snap sync on Holesky
make start-holesky-metrics-docker
```

Performance Verification
To verify async I/O is working:
- `tokio::time::sleep` polling loops respond promptly
- `crates/networking/p2p/snap/async_fs.rs`
- `crates/networking/p2p/snap/mod.rs`
- `crates/networking/p2p/snap/client.rs`
- `crates/networking/p2p/sync/snap_sync.rs`
- `crates/networking/p2p/Cargo.toml`
- `Cargo.lock`

Checklist

- `std::fs::` calls in snap sync code paths

Related

refactor/snapsync-healing-unification