⚡️ Speed up function `find_last_node` by 13,482% by codeflash-ai[bot] · Pull Request #233 · codeflash-ai/optimize-me

codeflash-ai · 2025-12-26T23:12:11Z

📄 13,482% (134.82x) speedup for `find_last_node` in `src/algorithms/graph.py`

⏱️ Runtime : 50.4 milliseconds → 371 microseconds (best of 250 runs)

📝 Explanation and details

The optimized code achieves a 135x speedup by eliminating a nested loop that was causing O(N×E) time complexity.

What changed:
The original code evaluated `all(e["source"] != n["id"] for e in edges)` for every node, so each of the N nodes could trigger a scan of up to all E edges. The optimized version preprocesses the edges once into a set, `edge_sources = {e["source"] for e in edges}`, and then performs a constant-time lookup, `n["id"] not in edge_sources`, for each node.
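The change described above can be sketched as follows. This is a reconstruction from the expressions quoted in this description, not the actual contents of `src/algorithms/graph.py`; the function names `find_last_node_scan` and `find_last_node_set` and the use of `next(..., None)` are assumptions made for illustration.

```python
def find_last_node_scan(nodes, edges):
    """Original approach (assumed shape): rescan the edge list for every node."""
    return next(
        (n for n in nodes if all(e["source"] != n["id"] for e in edges)),
        None,
    )


def find_last_node_set(nodes, edges):
    """Optimized approach (assumed shape): collect the edge sources once."""
    edge_sources = {e["source"] for e in edges}  # single pass over the edges
    return next((n for n in nodes if n["id"] not in edge_sources), None)


nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
edges = [{"source": 1, "target": 2}, {"source": 2, "target": 3}]
print(find_last_node_scan(nodes, edges))  # {'id': 3}
print(find_last_node_set(nodes, edges))   # {'id': 3}
```

One behavioral difference worth keeping in mind: the set-based version requires every `source` value to be hashable, whereas the scan only needs equality comparison.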

Why this is faster:

- Set lookups are O(1) on average, whereas scanning the edge list is O(E)
- The algorithm's complexity drops from O(N×E) to O(N+E)
- In Python, the `all()` generator nested inside the per-node iteration is particularly expensive because it repeats the same edge traversal for every node
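To make the complexity claim concrete, the sketch below counts how many `source` comparisons a per-node scan performs versus how many set operations the preprocessed version needs. The helper names are hypothetical, and the counting loop only mirrors the short-circuiting behavior of `all()`; it is not the project's code.

```python
def scan_comparisons(nodes, edges):
    """Count e["source"] == n["id"] checks made by a per-node edge scan."""
    count = 0
    for n in nodes:
        has_outgoing = False
        for e in edges:
            count += 1
            if e["source"] == n["id"]:  # all() would stop at this edge too
                has_outgoing = True
                break
        if not has_outgoing:  # first node without outgoing edges ends the search
            break
    return count


def set_operations(nodes, edges):
    """Count set insertions plus membership tests for the set-based version."""
    edge_sources = {e["source"] for e in edges}
    count = len(edges)  # one insertion attempt per edge
    for n in nodes:
        count += 1  # one membership test per visited node
        if n["id"] not in edge_sources:
            break
    return count


# 1000-node linear chain, the same shape as test_large_linear_chain
nodes = [{"id": i} for i in range(1, 1001)]
edges = [{"source": i, "target": i + 1} for i in range(1, 1000)]
print(scan_comparisons(nodes, edges))  # 500499
print(set_operations(nodes, edges))    # 1999
```

Because `all()` short-circuits, the scan only approaches the full N×E cost when the matching edge sits late in the edge list or does not exist. On the chain it still performs roughly N²/2 comparisons.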

Performance characteristics from tests:

- Small graphs (1-5 nodes): roughly 20-104% faster, since building the set costs little at this size
- Large graphs: dramatic speedups where the quadratic cost dominates:
  - 1000-node linear chain: 319x faster (18.5ms → 57.6μs)
  - 1000-node disconnected graph: 293x faster (15.0ms → 50.8μs)
  - 100-node fully connected graph: 85x faster (16.8ms → 194μs)
- Empty node lists: slightly slower (10-22%), because the set is still built even though there is nothing to look up
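The figures above come from Codeflash's own benchmarking. A rough way to reproduce the trend locally is sketched below with the standard-library `timeit` module. Both function bodies are reconstructions from this description, the helper names are hypothetical, and absolute timings will differ by machine.

```python
import timeit


def scan_version(nodes, edges):
    # Assumed shape of the original: rescan edges for each node.
    return next(
        (n for n in nodes if all(e["source"] != n["id"] for e in edges)), None
    )


def set_version(nodes, edges):
    # Assumed shape of the optimized code: build the source set once.
    edge_sources = {e["source"] for e in edges}
    return next((n for n in nodes if n["id"] not in edge_sources), None)


def make_chain(size):
    """Linear chain 1 -> 2 -> ... -> size."""
    nodes = [{"id": i} for i in range(1, size + 1)]
    edges = [{"source": i, "target": i + 1} for i in range(1, size)]
    return nodes, edges


def best_time(func, nodes, edges, repeat=5, number=3):
    """Best per-call time in seconds over several repeats."""
    timer = timeit.Timer(lambda: func(nodes, edges))
    return min(timer.repeat(repeat=repeat, number=number)) / number


for size in (10, 100, 1000):
    nodes, edges = make_chain(size)
    slow = best_time(scan_version, nodes, edges)
    fast = best_time(set_version, nodes, edges)
    print(f"{size:>5} nodes: scan {slow * 1e6:9.1f}us, set {fast * 1e6:9.1f}us")
```

Taking the minimum over repeats follows the same "best of N runs" idea as the runtime reported at the top of this description.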

When this matters:
This optimization matters most for graphs with many edges or when the function is called frequently in tight loops. The quadratic behavior of the original becomes costly at non-trivial graph sizes (100+ nodes/edges, for example 18.5ms for the 1000-node chain), while the optimized version scales linearly.

✅ Correctness verification report:

| Test | Status |
|------|--------|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 30 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests

from __future__ import annotations

# imports
import pytest
from src.algorithms.graph import find_last_node

# unit tests

# 1. Basic Test Cases


def test_single_node_no_edges():
    # One node, no edges: should return the node itself
    nodes = [{"id": 1}]
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.21μs -> 958ns (26.1% faster)


def test_two_nodes_one_edge():
    # Two nodes, one edge from 1 -> 2: last node is 2
    nodes = [{"id": 1}, {"id": 2}]
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.79μs -> 1.12μs (59.3% faster)


def test_three_nodes_linear_chain():
    # Three nodes, 1->2->3: last node is 3
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
    edges = [{"source": 1, "target": 2}, {"source": 2, "target": 3}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 2.17μs -> 1.17μs (85.6% faster)


def test_multiple_end_nodes():
    # 1->2, 1->3: both 2 and 3 are valid last nodes, function should return first one found (2)
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
    edges = [{"source": 1, "target": 2}, {"source": 1, "target": 3}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.83μs -> 1.17μs (57.2% faster)


def test_disconnected_nodes():
    # 1->2, 3 is disconnected: 2 and 3 are both last nodes, first in nodes list is 2
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.75μs -> 1.08μs (61.6% faster)


# 2. Edge Test Cases


def test_empty_nodes_and_edges():
    # No nodes, no edges: should return None
    nodes = []
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 750ns -> 833ns (9.96% slower)


def test_edges_without_nodes():
    # Edges reference nodes not in nodes list: should return None
    nodes = []
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 750ns -> 958ns (21.7% slower)


def test_cycle_graph():
    # 1->2, 2->3, 3->1: all nodes have outgoing edges, so should return None
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
    edges = [
        {"source": 1, "target": 2},
        {"source": 2, "target": 3},
        {"source": 3, "target": 1},
    ]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 2.21μs -> 1.25μs (76.6% faster)


def test_multiple_edges_from_one_node():
    # 1->2, 1->3, 2->4, 3->5: last nodes are 4 and 5, first in nodes list is 4
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}, {"id": 4}, {"id": 5}]
    edges = [
        {"source": 1, "target": 2},
        {"source": 1, "target": 3},
        {"source": 2, "target": 4},
        {"source": 3, "target": 5},
    ]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 2.71μs -> 1.33μs (103% faster)


def test_node_with_self_loop():
    # Node with a self-loop should not be considered last node
    nodes = [{"id": 1}, {"id": 2}]
    edges = [{"source": 1, "target": 1}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.75μs -> 1.08μs (61.6% faster)


def test_duplicate_edges():
    # Duplicate edges should not affect the result
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
    edges = [
        {"source": 1, "target": 2},
        {"source": 1, "target": 2},  # duplicate
        {"source": 2, "target": 3},
    ]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 2.25μs -> 1.25μs (80.0% faster)


def test_node_with_no_incoming_edges():
    # Node with no incoming edges but with outgoing edge is not last node
    nodes = [{"id": 1}, {"id": 2}]
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.79μs -> 1.12μs (59.2% faster)


def test_node_with_no_edges_and_others_with_edges():
    # 1->2, 3 has no edges: 2 and 3 are last nodes, first in nodes list is 2
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.83μs -> 1.12μs (62.9% faster)


def test_nodes_with_non_integer_ids():
    # Node ids as strings
    nodes = [{"id": "a"}, {"id": "b"}, {"id": "c"}]
    edges = [{"source": "a", "target": "b"}, {"source": "b", "target": "c"}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 2.38μs -> 1.17μs (104% faster)


def test_nodes_with_mixed_type_ids():
    # Node ids as int and string, edges refer to correct types
    nodes = [{"id": 1}, {"id": "2"}, {"id": 3}]
    edges = [{"source": 1, "target": "2"}, {"source": "2", "target": 3}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 2.50μs -> 1.25μs (100% faster)


def test_edges_with_extra_fields():
    # Edges may have extra fields, should be ignored
    nodes = [{"id": 1}, {"id": 2}]
    edges = [{"source": 1, "target": 2, "weight": 5}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.79μs -> 1.08μs (65.5% faster)


def test_nodes_with_extra_fields():
    # Nodes may have extra fields, should return the full node dict
    nodes = [{"id": 1, "label": "A"}, {"id": 2, "label": "B"}]
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.71μs -> 1.08μs (57.7% faster)


def test_large_linear_chain():
    # 1000 nodes in a chain: 1->2->3->...->1000, last node is 1000
    nodes = [{"id": i} for i in range(1, 1001)]
    edges = [{"source": i, "target": i + 1} for i in range(1, 1000)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 18.5ms -> 57.6μs (31984% faster)


def test_large_star_graph():
    # 1 connects to 2..1000, so all 2..1000 are last nodes, first in nodes list is 2
    nodes = [{"id": i} for i in range(1, 1001)]
    edges = [{"source": 1, "target": i} for i in range(2, 1001)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 38.8μs -> 20.3μs (91.0% faster)


def test_large_disconnected_nodes():
    # 900 nodes connected in chain, 100 disconnected nodes; first last node is 901
    nodes = [{"id": i} for i in range(1, 1001)]
    edges = [{"source": i, "target": i + 1} for i in range(1, 900)]
    # Nodes 901..1000 are disconnected, so first is 901
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 15.0ms -> 50.8μs (29357% faster)


def test_large_fully_connected_graph():
    # Every node connects to every other node (excluding self-loops)
    nodes = [{"id": i} for i in range(1, 101)]
    edges = [
        {"source": i, "target": j}
        for i in range(1, 101)
        for j in range(1, 101)
        if i != j
    ]
    # All nodes have outgoing edges, so should return None
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 16.8ms -> 194μs (8552% faster)


def test_large_graph_with_multiple_last_nodes():
    # 1->2, 1->3, 1->4, ..., 1->1000: all 2..1000 are last nodes, first is 2
    nodes = [{"id": i} for i in range(1, 1001)]
    edges = [{"source": 1, "target": i} for i in range(2, 1001)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 38.6μs -> 20.2μs (90.9% faster)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

from __future__ import annotations

# imports
import pytest  # used for our unit tests
from src.algorithms.graph import find_last_node

# unit tests

# -------------------------
# Basic Test Cases
# -------------------------


def test_single_node_no_edges():
    # One node, no edges: should return the node itself
    nodes = [{"id": 1}]
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.25μs -> 958ns (30.5% faster)


def test_two_nodes_one_edge():
    # Two nodes, one edge from 1 -> 2: last node is 2
    nodes = [{"id": 1}, {"id": 2}]
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.79μs -> 1.08μs (65.4% faster)


def test_three_nodes_chain():
    # 1 -> 2 -> 3: last node is 3
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
    edges = [{"source": 1, "target": 2}, {"source": 2, "target": 3}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 2.17μs -> 1.21μs (79.4% faster)


def test_multiple_last_nodes_returns_first():
    # Two end nodes (not connected as sources), should return the first one in nodes
    nodes = [{"id": "A"}, {"id": "B"}, {"id": "C"}]
    edges = [{"source": "A", "target": "B"}]  # C is not a source
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.88μs -> 1.17μs (60.7% faster)


def test_all_nodes_are_sources():
    # All nodes are sources, so function should return None
    nodes = [{"id": 1}, {"id": 2}]
    edges = [{"source": 1, "target": 2}, {"source": 2, "target": 1}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.83μs -> 1.17μs (57.1% faster)


# -------------------------
# Edge Test Cases
# -------------------------


def test_empty_nodes_and_edges():
    # No nodes, no edges: should return None
    nodes = []
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 750ns -> 875ns (14.3% slower)


def test_nodes_but_no_edges():
    # Multiple nodes, no edges: should return the first node
    nodes = [{"id": "x"}, {"id": "y"}]
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.21μs -> 1.00μs (20.9% faster)


def test_edges_with_missing_nodes():
    # Edges refer to node IDs not in nodes: should still return the first node
    nodes = [{"id": 10}, {"id": 20}]
    edges = [{"source": 30, "target": 10}, {"source": 40, "target": 20}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.50μs -> 1.12μs (33.3% faster)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
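The comment above refers to comparing return values between the original and the optimized code. A minimal sketch of that kind of check is shown below, using randomly generated graphs and reconstructed versions of both implementations. All names here are hypothetical, and this is not Codeflash's actual harness.

```python
import random


def scan_version(nodes, edges):
    # Assumed shape of the original implementation.
    return next(
        (n for n in nodes if all(e["source"] != n["id"] for e in edges)), None
    )


def set_version(nodes, edges):
    # Assumed shape of the optimized implementation.
    edge_sources = {e["source"] for e in edges}
    return next((n for n in nodes if n["id"] not in edge_sources), None)


def random_graph(rng, max_nodes=8, max_edges=12):
    """Small random graph; may be empty and may contain cycles or self-loops."""
    ids = list(range(rng.randint(0, max_nodes)))
    nodes = [{"id": i} for i in ids]
    # Include one id that is never a node, as some tests above do.
    candidates = ids + [max_nodes + 1]
    edges = [
        {"source": rng.choice(candidates), "target": rng.choice(candidates)}
        for _ in range(rng.randint(0, max_edges))
    ]
    return nodes, edges


rng = random.Random(0)  # fixed seed so the run is repeatable
for _ in range(1000):
    nodes, edges = random_graph(rng)
    assert scan_version(nodes, edges) == set_version(nodes, edges)
print("outputs match on 1000 random graphs")
```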

To edit these changes, run `git checkout codeflash/optimize-find_last_node-mjnhkx3o` and push.

codeflash-ai bot requested a review from KRRT7 December 26, 2025 23:12

codeflash-ai bot added the "⚡️ codeflash" (Optimization PR opened by Codeflash AI) and "🎯 Quality: High" (Optimization Quality according to Codeflash) labels Dec 26, 2025

KRRT7 closed this Jan 25, 2026

KRRT7 deleted the codeflash/optimize-find_last_node-mjnhkx3o branch January 25, 2026 09:03

codeflash-ai[bot] wants to merge 1 commit into main from codeflash/optimize-find_last_node-mjnhkx3o
