
refactor(mempool): revert to separate DAGs#1820

Open
Mirko-von-Leipzig wants to merge 40 commits into `next` from `mirko/mempool-v4`

Conversation

Collaborator

@Mirko-von-Leipzig Mirko-von-Leipzig commented Mar 23, 2026

This PR effectively rewrites the mempool implementation (again). The result is a combination of the first implementation and the current implementation, hopefully capturing the best of both.

Background

The former was simple -- separate graphs for txs and batches, with nodes moving from one to the next as they are selected. It also had a singular inflight state which was checked when txs were added, but never again after that. This meant it had complex internal transitions which were never checked as nodes moved between graphs. However, within a single graph things were simple, and the graph impl could be shared.

The current implementation has a single graph, with transaction nodes being folded into batch nodes, which fold into block nodes. This graph impl is much more complex, in particular the fold and unfold functions are difficult to grok and test, in part due to pass-through nodes causing DAG properties to no longer hold. The positive is that state cannot really be corrupted since there aren't multiple graphs to make mistakes between.

Why

The driving motivation was to do something about fold and unfold. I couldn't convince myself that the implementations were ever 100% correct, nor that they would survive changes over time.

Eventually I realised that folding and unfolding could be achieved by first reverting all the node subgraphs, and then re-inserting with the new replacement nodes. Reverting and inserting are simple (relatively speaking), and we can lean on the existing insertion and reverting checks as additional assertions during this process.

I then realised there wasn't much need for a single graph anymore, since the original motivation was to support folding and unfolding. However we can use the above to add additional safety to the underlying graph by enforcing that nodes can only ever be reverted if they're leaves, and pruned if they're roots aka either no parents or no children remain in the graph.
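Under these constraints, folding becomes mechanical: revert the nodes to be replaced, then insert the replacement. A minimal sketch (all names hypothetical, not the actual miden-node types) of a DAG that only permits leaf reverts and root prunes:

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical sketch of the invariants described above: nodes may only be
// reverted while they are leaves and pruned while they are roots.
#[derive(Default)]
struct Dag {
    parents: HashMap<u32, HashSet<u32>>,
    children: HashMap<u32, HashSet<u32>>,
}

impl Dag {
    fn insert(&mut self, node: u32, node_parents: &[u32]) {
        self.parents.insert(node, node_parents.iter().copied().collect());
        self.children.entry(node).or_default();
        for &p in node_parents {
            self.children.entry(p).or_default().insert(node);
        }
    }

    /// Revert is only legal on a leaf (no children remain in the graph).
    fn revert(&mut self, node: u32) -> Result<(), &'static str> {
        if !self.children.get(&node).is_some_and(|c| c.is_empty()) {
            return Err("revert: node is not a leaf");
        }
        self.children.remove(&node);
        for p in self.parents.remove(&node).unwrap_or_default() {
            if let Some(c) = self.children.get_mut(&p) {
                c.remove(&node);
            }
        }
        Ok(())
    }

    /// Prune is only legal on a root (no parents remain in the graph).
    fn prune(&mut self, node: u32) -> Result<(), &'static str> {
        if !self.parents.get(&node).is_some_and(|p| p.is_empty()) {
            return Err("prune: node is not a root");
        }
        self.parents.remove(&node);
        for c in self.children.remove(&node).unwrap_or_default() {
            if let Some(p) = self.parents.get_mut(&c) {
                p.remove(&node);
            }
        }
        Ok(())
    }
}
```

Folding tx leaves into a batch would then be a revert of each tx followed by an insert of the batch node, with the existing revert/insert checks acting as assertions along the way.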

This PR

A single graph type is created, which holds its state, and edges based on that state. Nodes can only be appended, reverted (removed from the top/leaves), or pruned (removed from the bottom/roots). Nodes can be marked as selected (to mark a tx/batch as included in batch/block).

Transaction and batch graphs wrap this underlying graph type. This is very similar to the original implementation, with some improvements and additional state checks due to the lessons learnt from the current implementation.

The main Mempool functions should be much easier to read (imo). This implementation is a pure refactor -- user batches and tx-reverting improvement strategies will come in follow-up PRs so they can be reviewed without this huge diff.

How to review

Almost none of the current graph code carries over. The new graphs also live in separate files (which is a positive imo).

I suggest:

  • Review mempool first, since this business-level logic shouldn't change
    • It should be correct wrt the claims made in the batch and transaction top-level graphs.
  • Review the state and edge files, then merge that logically with the assumptions made in the graph impl.

Closes #1439, closes #1253, closes #537

@Mirko-von-Leipzig Mirko-von-Leipzig requested review from bobbinth, igamigo and sergerad and removed request for igamigo March 23, 2026 15:38
@Mirko-von-Leipzig Mirko-von-Leipzig added the no changelog This PR does not require an entry in the `CHANGELOG.md` file label Mar 23, 2026
@Mirko-von-Leipzig Mirko-von-Leipzig marked this pull request as ready for review March 23, 2026 15:39
```rust
//
// This is done to prevent a system bug from causing repeated failures if we keep retrying
// the same transactions. Since we can't trivially identify the cause of the block
// failure, we take the safe route and nuke all associated state.
```
Collaborator

@sergerad sergerad Mar 24, 2026
Just wondering if this is actually a trade-off we want to make. There could be situations where a transient error gets us here and then users need to resubmit transactions, right? That would be confusing / unexpected UX.

I also don't follow why rollback_batch() requeues transactions while this doesn't. Why does that make sense and what is the intention?

Collaborator Author

It's an arbitrary decision, and we could do either. At the time we decided that block failure was much worse and should be avoided at all costs, whereas batches would be more likely to shuffle the same transactions in different orders.

We do have #594 to address this, and I'm opening a PR hopefully today that basically gives each transaction some number of allowed failures before it is nuked.

Collaborator Author

See #1832 which reverts transactions only after they've received four infractions.

```rust
while let Some((id, tx)) = self.inner.selection_candidates().pop_first() {
    if budget.check_then_subtract(tx) == BudgetStatus::Exceeded {
        break;
    }
```
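For context, a plausible shape for the budget type used in the snippet above (the real `BudgetStatus`/`check_then_subtract` in miden-node may differ; the cost unit here is an assumption for illustration):

```rust
// Hypothetical sketch of a selection budget; field names and units are
// assumptions, not the actual miden-node types.
#[derive(PartialEq, Debug)]
enum BudgetStatus {
    Ok,
    Exceeded,
}

struct Budget {
    remaining: usize,
}

impl Budget {
    /// Subtract the candidate's cost if it fits; otherwise report Exceeded
    /// and leave the budget untouched so the selection loop can stop cleanly.
    fn check_then_subtract(&mut self, tx_cost: usize) -> BudgetStatus {
        if tx_cost > self.remaining {
            BudgetStatus::Exceeded
        } else {
            self.remaining -= tx_cost;
            BudgetStatus::Ok
        }
    }
}
```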
Collaborator

Would it be possible to order tx candidates by size so that a single fat tx doesn't block a series of smaller ones unnecessarily? Unsure if this is a real problem.

Collaborator Author

This is a concern; it will be addressed once fees are more realistic. In that case we would prioritise txs based on fees vs effort (or some other strategy).

#1242 is also related to this.
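As a sketch of what such a strategy could look like (the `Candidate` type and its fee/effort fields are hypothetical, not miden-node's actual types), candidates could be ordered by fee-per-effort, using cross-multiplication to avoid floating-point division:

```rust
// Hypothetical selection candidate; fields are assumptions for illustration.
struct Candidate {
    id: u32,
    fee: u64,
    effort: u64,
}

/// Return candidate ids ordered so the best fee-per-effort ratio comes first.
/// Comparing fee_b * effort_a against fee_a * effort_b sorts by the ratio
/// fee/effort in descending order without dividing.
fn select_order(mut candidates: Vec<Candidate>) -> Vec<u32> {
    candidates.sort_by(|a, b| (b.fee * a.effort).cmp(&(a.fee * b.effort)));
    candidates.into_iter().map(|c| c.id).collect()
}
```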

Collaborator Author

Consider `pop_first` a placeholder for a more complex selection strategy.

```rust
.iter()
.flat_map(|block| block.iter())
.map(|batch| batch.transactions().as_slice().len())
.sum::<usize>();
```
Collaborator

I don't think it's a problem, since O(blocks × batches) is fine, but just noting that this inject_telemetry function appears to be called frequently.

Collaborator Author

I'm trying to avoid as much additional bookkeeping as possible, to ensure we don't desync somewhere. This could be replaced by a counter which is incremented and decremented appropriately, though.

Might be worth doing at some point.
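A sketch of that counter alternative (illustrative names only, not miden-node's actual state types):

```rust
// Hypothetical counter replacing the recomputed sum: incremented when a
// batch's transactions enter the inflight set, decremented when they leave.
#[derive(Default)]
struct InflightTxCounter {
    count: usize,
}

impl InflightTxCounter {
    fn on_batch_added(&mut self, tx_count: usize) {
        self.count += tx_count;
    }

    fn on_batch_removed(&mut self, tx_count: usize) {
        // Saturating so a desync shows up as a stuck-at-zero counter
        // rather than a panic in telemetry code.
        self.count = self.count.saturating_sub(tx_count);
    }
}
```

The trade-off is exactly the desync risk mentioned above: the counter must be updated on every path that adds or removes batches, whereas the recomputed sum is always consistent by construction.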



Development

Successfully merging this pull request may close these issues:

  • Improve mempool graph replacement API
  • Consider erasing data in proposed mempool nodes
  • [block-producer]: improve InflightState error handling
