⚡️ Speed up function `find_last_node` by 17,985% #231
Closed
codeflash-ai[bot] wants to merge 1 commit into main
The optimized code achieves a **180x speedup** by eliminating redundant work through preprocessing.

**Key optimization:** The original code uses a nested loop structure where, for each node, it checks all edges to verify the node isn't a source. This results in O(N × E) complexity. The optimized version precomputes a set of source node IDs in O(E) time, then checks each node against this set in O(1) time, reducing overall complexity to O(N + E).

**What changed:**
- Built a `sources` set containing all source node IDs from edges upfront
- Replaced the nested `all(e["source"] != n["id"] for e in edges)` check with a simple `n["id"] not in sources` set membership test

**Why this is faster:**
1. **Set lookup is O(1)** vs O(E) linear scan through all edges for each node
2. **Single pass through edges** instead of scanning edges repeatedly for every node
3. **Hash-based membership testing** (set) vs repeated equality comparisons

**Performance characteristics from tests:**
- **Small graphs** (2-10 nodes): 30-80% faster, with modest gains due to setup overhead
- **Medium graphs** (100-1000 nodes): 86-1465% faster, a significant speedup as edge scanning cost dominates
- **Large linear chains** (1000 nodes): **32,000%+ faster**, since the original's quadratic behavior becomes catastrophic
- **Dense graphs** (fully connected): **8,600% faster**, the maximum benefit where edge count is highest
- **Empty/minimal cases**: Slightly slower (13-30%) due to set creation overhead, but negligible in absolute terms (nanoseconds)

The optimization excels when the graph has many edges or nodes, which is typical in real-world graph processing scenarios. The preprocessing cost is amortized extremely well across all but the most trivial graphs.
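The change described above can be sketched roughly as follows. This is a reconstruction from the description, not the actual diff: the two function names, the `(nodes, edges)` signature, the use of `next(...)` with a `None` default, and the `"target"` key in the sample data are assumptions. Only the `all(...)` check, the `sources` set, and the `n["id"] not in sources` test come from the text.

```python
def find_last_node_original(nodes, edges):
    """Nested scan (assumed shape of the original): O(N * E)."""
    # For each node, walk the edge list to confirm it is never a source.
    return next(
        (n for n in nodes if all(e["source"] != n["id"] for e in edges)),
        None,
    )


def find_last_node_optimized(nodes, edges):
    """Precomputed set (assumed shape of the optimized version): O(N + E)."""
    # One pass over the edges collects every source ID.
    sources = {e["source"] for e in edges}
    # Each node is then checked with a hash-based membership test.
    return next((n for n in nodes if n["id"] not in sources), None)


# Hypothetical sample graph: a -> b -> c
nodes = [{"id": "a"}, {"id": "b"}, {"id": "c"}]
edges = [{"source": "a", "target": "b"}, {"source": "b", "target": "c"}]

print(find_last_node_original(nodes, edges))   # {'id': 'c'}
print(find_last_node_optimized(nodes, edges))  # {'id': 'c'}
```

One behavioral difference worth keeping in mind with this sketch: building a set requires node IDs to be hashable, whereas the equality-based scan does not.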
📄 17,985% (179.85x) speedup for `find_last_node` in `src/algorithms/graph.py`

⏱️ Runtime: 91.0 milliseconds → 503 microseconds (best of 250 runs)
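To make the O(N × E) versus O(N + E) gap concrete, here is a small operation count for a 1000-node linear chain. It counts comparisons and lookups rather than measuring time, so it does not reproduce the reported runtimes. The helper names and the chain layout are hypothetical, and the counting assumes the search stops at the first node that is never a source.

```python
def make_chain(n):
    """Build a linear chain 0 -> 1 -> ... -> n-1 as node and edge dicts."""
    nodes = [{"id": i} for i in range(n)]
    edges = [{"source": i, "target": i + 1} for i in range(n - 1)]
    return nodes, edges


def count_nested_scan(nodes, edges):
    """Count equality comparisons made by the per-node edge scan."""
    comparisons = 0
    for node in nodes:
        is_source = False
        for edge in edges:
            comparisons += 1
            if edge["source"] == node["id"]:
                is_source = True
                break  # all() short-circuits at the first match
        if not is_source:
            break  # first node that is never a source has been found
    return comparisons


def count_set_lookup(nodes, edges):
    """Count edge visits plus membership tests for the set-based version."""
    sources = set()
    steps = 0
    for edge in edges:
        sources.add(edge["source"])
        steps += 1
    for node in nodes:
        steps += 1
        if node["id"] not in sources:
            break
    return steps


nodes, edges = make_chain(1000)
print(count_nested_scan(nodes, edges))  # 500499
print(count_set_lookup(nodes, edges))   # 1999
```

For a chain of N nodes, the nested scan performs (N − 1)(N + 2) / 2 comparisons, while the set-based version performs 2N − 1 steps, which is why the gap widens as the graph grows.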
✅ Correctness verification report:
To edit these changes, run `git checkout codeflash/optimize-find_last_node-mjngkx4p` and push.