refactor(mempool): revert to separate DAGs #1820
Mirko-von-Leipzig wants to merge 40 commits into next
Conversation
Force-pushed from 44f240d to 8bba7a8
//
// This is done to prevent a system bug from causing repeated failures if we keep retrying
// the same transactions. Since we can't trivially identify the cause of the block
// failure, we take the safe route and nuke all associated state.
Just wondering if this is actually a trade-off we want to make. There could be situations where a transient error gets us here and then users need to resubmit transactions, right? That would be a confusing / unexpected UX.
I also don't follow why rollback_batch() requeues transactions while this doesn't. Why does that make sense, and what is the intention?
It's an arbitrary decision, and we could do either. At the time we decided that block failure was much worse and should be avoided at all costs, whereas batches would be more likely to shuffle the same transactions in different orders.
We do have #594 to address this, and I'm opening a PR hopefully today that basically gives each transaction some number of allowed failures before it is nuked.
See #1832 which reverts transactions only after they've received four infractions.
while let Some((id, tx)) = self.inner.selection_candidates().pop_first() {
    if budget.check_then_subtract(tx) == BudgetStatus::Exceeded {
        break;
    }
Would it be possible to order tx candidates by size so that a single fat tx doesn't block a series of smaller ones unnecessarily? Unsure if this is a real problem.
This is a concern; will be addressed once fees are more realistic. In that case we would prioritise txs based on fees vs effort (or some other strategy).
#1242 is also related to this.
Consider `pop_first` a placeholder for a more complex selection strategy.
.iter()
.flat_map(|block| block.iter())
.map(|batch| batch.transactions().as_slice().len())
.sum::<usize>();
Don't think it's a problem because O(blocks * batches) is fine, but just noting that this inject_telemetry function appears to be called frequently.
I'm trying to avoid as much additional bookkeeping as possible, to ensure we don't desync somewhere. This could be replaced by a counter which is incremented and decremented appropriately though.
Might be worth doing at some point.
Force-pushed from e99aecd to fa897ae
This PR effectively rewrites the mempool implementation (again). The result is a combination of the first implementation and the current implementation, hopefully resulting in a best of both.
Background
The former was simple -- separate graphs for txs and batches, with nodes moving from one to the next as they are selected. It also had a singular inflight state which was checked when txs were added, but never again after. This meant it had complex internal transitions which were never checked as nodes moved between graphs. However, within a single graph things were simple, and the graph impl could be shared.
The current implementation has a single graph, with transaction nodes being folded into batch nodes, which fold into block nodes. This graph impl is much more complex, in particular the fold and unfold functions are difficult to grok and test, in part due to pass-through nodes causing DAG properties to no longer hold. The positive is that state cannot really be corrupted since there aren't multiple graphs to make mistakes between.
Why
The driving motivation was to do something about `fold` and `unfold`. I couldn't convince myself that the implementations were ever 100% correct, nor that they would survive changes over time. Eventually I realised that folding and unfolding could be achieved by first reverting all the node subgraphs, and then re-inserting with the new replacement nodes. Reverting and inserting are simple (relatively speaking), and we can lean on the existing insertion and reverting checks as additional assertions during this process.
I then realised there wasn't much need for a single graph anymore, since the original motivation was to support folding and unfolding. However we can use the above to add additional safety to the underlying graph by enforcing that nodes can only ever be reverted if they're leaves, and pruned if they're roots, i.e. removed only once no children or no parents remain in the graph, respectively.
This PR
A single graph type is created, which holds its state, and edges based on that state. Nodes can only be appended, reverted (removed from the top/leaves), or pruned (removed from the bottom/roots). Nodes can be marked as selected (to mark a tx/batch as included in batch/block).
Transaction and batch graphs wrap this underlying graph type. This is very similar to the original implementation, with some improvements and additional state checks due to the lessons learned from the current implementation.
The main `Mempool` functions should be much easier to read (imo). This implementation is a pure refactor - user batches and tx reverting improvement strategies will be in follow-up PRs so they can be reviewed without this huge diff.
How to review
There is nearly no current code that applies to the graphs. They're also in separate files (which is a positive imo).
I suggest:
- mempool; since this business level logic shouldn't change

Closes #1439, closes #1253, closes #537