feat: block producer redesign#502
Conversation
bobbinth
left a comment
Looks good! Thank you! I didn't review all the logic, but I did go through most of the high-level structure and left some small comments inline.
The main comment is that I'd probably prefer to have dedicated structs which encapsulate specific pieces of functionality, and then use these structs in the TransactionPool.
```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct BatchId(u64);
```
For batch ID we can use a hash of all transaction IDs that go into the batch (using some stable order). This way, two batches with the same set of transactions would end up having the same ID.
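A minimal sketch of this scheme, using the standard library's `DefaultHasher` as a stand-in for BLAKE3 and a placeholder `TransactionId` type (both are assumptions, not the PR's actual types):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Placeholder: the real TransactionId is a digest type.
type TransactionId = [u8; 32];

/// Derive a batch ID from the set of transaction IDs it contains.
/// Sorting first gives the "stable order", so two batches with the
/// same transaction set produce the same ID.
fn batch_id(mut tx_ids: Vec<TransactionId>) -> u64 {
    tx_ids.sort();
    let mut hasher = DefaultHasher::new();
    for id in &tx_ids {
        id.hash(&mut hasher);
    }
    hasher.finish()
}
```

With a real cryptographic hash the output would be a full digest rather than a `u64`, but the sort-then-hash shape stays the same.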
This would be ideal, but I have concerns over how expensive that hash would be. It has to be computed inline, which extends the transaction pool lock.
I think this depends on which hash function we use:
- If we go with something like BLAKE3, computing a hash of 16 transaction IDs should take less than 1 microsecond.
- If we use our arithmetization-friendly hash (RPO) it would be around 50 microseconds (or around 25 microseconds if/when we migrate to RPX hash function).
We could probably start with BLAKE3 and switch to RPO/RPX only if needed, but I think either way it may be OK, as my expectation is that batch selection will probably take on the order of 1 millisecond anyway.
I've left this as-is for now. I think we may want to separate the concept of a batch ID from a batch job ID for the situation where we drop a batch and re-create the exact same one; this would result in two "inflight" batches with the same ID.
Maybe this is fine; I'll have to think about it some more. For now, both the counter-based and hash-based variants are still around.
I think we may want to separate the concept of batch ID and batch job ID for the situation where we drop a batch and re-create the exact same one. This would result in two "inflight" batches with the same ID.
As far as I can tell, this shouldn't happen because we can't put the same transaction into two different batches at the same time (or at least we shouldn't). So, we could have two batches with the same ID - but it shouldn't be at the same time.
My preference is still to use a hash-based ID as I think it is conceptually a bit simpler and could help us catch some errors (i.e., if we do detect 2 batches with the same ID in flight, something probably went wrong).
An example:
1. Two inflight batches N and N+1, where N+1 depends on N.
2. Batch N fails to prove and we inform the mempool of this. The mempool requeues all transactions from N and N+1.
3. We select two new batches and get N and N+1 again. We now have three batches inflight, with a duplicate N+1.

Note that the batch producer has no idea of any dependencies, so it cannot automatically drop the original N+1.
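One way to reconcile this thread (purely illustrative, not what the PR settles on) is to keep both notions as separate types: a content-derived `BatchId` that may legitimately repeat across proving attempts, and a counter-based `BatchJobId` that stays unique per attempt. The allocator below is a hypothetical helper:

```rust
/// Content-derived identity: equal transaction sets yield equal IDs,
/// so a re-created batch repeats its predecessor's ID.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct BatchId(u64);

/// Proving-attempt identity: a fresh value per spawned batch job,
/// unique even when the underlying batch content repeats.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct BatchJobId(u64);

/// Hypothetical allocator handing out monotonically increasing job IDs.
#[derive(Default)]
struct JobIdAllocator(u64);

impl JobIdAllocator {
    fn next_id(&mut self) -> BatchJobId {
        let id = BatchJobId(self.0);
        self.0 += 1;
        id
    }
}
```

Under this split, two inflight N+1 batches would share a `BatchId` but carry distinct `BatchJobId`s, so the duplicate is still addressable.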
I've refactored quite a bit, and I've added implementations for the RPC. There is still quite a bit missing, notably tests, checking all inputs, and the block producer implementation.
Force-pushed 620463e to 05e4e75
bobbinth
left a comment
Looks good! Thank you! This is not a full review, but I did leave some comments inline - mostly minor though. I think the biggest remaining thing is to incorporate note/nullifier tracking as this may affect the structure a bit.
In anticipation of creating tx, batch and block pools.
Separate error types from error representation on the wire by splitting the code into an implementation and a wrapper which performs the RPC conversions. This separation gives us a single place to map certain errors to internal errors etc.
Essentially just a clone of the existing block builder with the mempool addition.
This includes extracting account state and a half-baked transaction purging.
In favor of implementing the details in other PRs.
Force-pushed 05e4e75 to 45627f3
This is just an example; more scrutiny should be applied when we fill in the details.
Just an example.
```rust
use std::fmt::Display;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct BatchJobId(u64);

impl Display for BatchJobId {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        self.0.fmt(f)
    }
}

impl BatchJobId {
    /// Advances this ID in place; taking `&mut self` (rather than `mut self`
    /// by value) ensures the caller's value is actually updated.
    pub fn increment(&mut self) {
        self.0 += 1;
    }
}
```
I'm still unsure about using the hash as ID (this comment). We can punt that discussion to when we implement the batch building though.
```rust
/// The latest committed state.
///
/// Only valid if the committed count is greater than zero.
committed_state: Digest,

/// The number of committed transactions.
///
/// If this is zero then the committed state is meaningless.
committed_count: usize,
```
Alternatively, we can just move this back into a VecDeque.
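A rough sketch of that alternative, with a placeholder `Digest` type; the queue's emptiness check replaces the "count greater than zero" validity rule:

```rust
use std::collections::VecDeque;

// Placeholder for the real digest type.
type Digest = [u8; 32];

/// Committed states kept in a queue rather than as a single latest
/// digest plus a count, so there is no "meaningless when zero" state.
struct CommittedStates(VecDeque<Digest>);

impl CommittedStates {
    fn push(&mut self, state: Digest) {
        self.0.push_back(state);
    }

    /// `None` naturally encodes "nothing committed yet".
    fn latest(&self) -> Option<&Digest> {
        self.0.back()
    }

    fn committed_count(&self) -> usize {
        self.0.len()
    }
}
```

The trade-off is the extra allocation and retention of intermediate states, which the digest-plus-count representation avoids.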
```rust
/// The number of transactions that affected each account.
account_transactions: BTreeMap<AccountId, usize>,
```
We could do additional checks here if we stored the actual state transitions.
```rust
/// Transactions are returned in a valid execution ordering.
///
/// Returns `None` if no transactions are available.
pub fn select_batch(&mut self) -> Option<(BatchJobId, Vec<TransactionId>)> {
```
This intentionally returns IDs for now.
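The root-tracking idea behind `select_batch` can be sketched roughly like this; the field names, the `u64` stand-ins for transaction and job IDs, and the batch-size cap are all assumptions, not the PR's actual code:

```rust
use std::collections::BTreeSet;

struct TransactionPool {
    /// Transactions whose parents are all already in some inflight batch.
    roots: BTreeSet<u64>,
    next_batch_job: u64,
}

impl TransactionPool {
    const BATCH_SIZE: usize = 16;

    /// Pop up to BATCH_SIZE roots as the next batch; `None` when no
    /// transaction is currently selectable.
    fn select_batch(&mut self) -> Option<(u64, Vec<u64>)> {
        if self.roots.is_empty() {
            return None;
        }
        let batch: Vec<u64> = self.roots.iter().copied().take(Self::BATCH_SIZE).collect();
        for tx in &batch {
            self.roots.remove(tx);
        }
        let job_id = self.next_batch_job;
        self.next_batch_job += 1;
        Some((job_id, batch))
    }
}
```

A real implementation would also promote child transactions to roots once all of their parents are batched, which is omitted here.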
```rust
/// # Panics
///
/// Panics if there is already a block in flight.
pub fn select_block(&mut self) -> (BlockNumber, BTreeSet<BatchJobId>) {
```
Intentionally returns IDs (for now).
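The single-block-in-flight invariant that the panic documents could look like the following sketch (hypothetical field names, simplified return type):

```rust
struct BlockPool {
    next_block: u32,
    block_in_flight: bool,
}

impl BlockPool {
    /// Panics if there is already a block in flight.
    fn select_block(&mut self) -> u32 {
        assert!(!self.block_in_flight, "a block is already in flight");
        self.block_in_flight = true;
        self.next_block
    }

    /// Called once the inflight block is committed to the store.
    fn block_committed(&mut self) {
        self.block_in_flight = false;
        self.next_block += 1;
    }
}
```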
As discussed offline, this work will be done on a feature branch. I'll summarize what this PR contains, and what it purposefully punts to future PRs. Some of this future work results in less-than-stellar current naming/structuring, as it's going to change soon. I'll attempt to point these sore spots out in a self-review so we don't focus on them too hard.

Notable changes from last review
Included in this PR
Excluded
What's next

Adding stronger typing to transactions, notes and transaction inputs will inform the structure of things, and I believe note tracking will fall out of this naturally. Following that, implementing the types for batches and blocks makes sense, I think. I also want to raise
```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct BlockNumber(u32);
```
This probably belongs in base? Something else I've been itching to do is to new-type all of the different fields in the block header, for example. Having all of them be Digest/u32 isn't great.
Yes, I think moving this to miden-base would be a good idea. Would you like to make a PR there?
For the other fields - also agreed, but maybe we do these incrementally over time?
Will do. And incrementally is good; no rush :) One can also overdo it.
bobbinth
left a comment
Looks good! Thank you! Not a very thorough review from me - but I did leave a few comments inline (mostly for the future).
```rust
async fn build_and_commit_block(&self, batches: BTreeSet<BatchJobId>) -> Result<(), ()> {
    todo!("Aggregate, prove and commit block");
```
This will contain building the block and saving it to the store, right?
For block building, since we don't actually prove anything yet, I am thinking we could put in an artificial delay of 3 - 5 seconds to get the conditions close to the real thing.
Correct. I was also considering injecting failures randomly as mentioned here.
```rust
fn spawn(&mut self, id: BatchJobId, transactions: Vec<TransactionId>) {
    self.0.spawn(async move {
        todo!("Do actual work like aggregating transaction data");
```
Similar to the above comment: batch building could also take 3 - 5 seconds. Let's artificially simulate this for now.
I was wondering if we should also simulate failures. So something like sleep a random period, and then randomly fail a batch every now and then.
If we do want this, the failure rate should probably be a configurable parameter. Similar for the entire block.
Though this might annoy users if it causes a user tx to expire. And it could look bad in terms of throughput; though I don't think that's an issue right now (?).
If this is configurable and not too difficult to add, I don't mind doing it.
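A simulated prover along these lines might look roughly like the following. The 3-5 second window and the configurable `failure_rate` come from this thread; the tiny linear congruential generator is a stand-in for a real RNG (e.g. the `rand` crate), and all names are hypothetical:

```rust
use std::time::Duration;

/// Stand-in for batch/block proving: sleep a pseudo-random 3-5 seconds,
/// then fail with a configurable probability.
struct FailureSimulator {
    state: u64,
    /// Configurable, e.g. 0.05 to fail roughly 5% of batches.
    failure_rate: f64,
}

impl FailureSimulator {
    fn new(seed: u64, failure_rate: f64) -> Self {
        Self { state: seed, failure_rate }
    }

    /// Minimal linear congruential generator returning a value in [0, 1).
    fn next_unit(&mut self) -> f64 {
        self.state = self
            .state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (self.state >> 11) as f64 / (1u64 << 53) as f64
    }

    /// Pick a proving delay uniformly in [3s, 5s).
    fn proving_delay(&mut self) -> Duration {
        Duration::from_millis(3_000 + (self.next_unit() * 2_000.0) as u64)
    }

    /// Decide whether this proof attempt should be simulated as failed.
    fn should_fail(&mut self) -> bool {
        self.next_unit() < self.failure_rate
    }
}
```

The batch worker would `sleep(sim.proving_delay()).await` and then report failure when `sim.should_fail()` returns true; a rate of 0.0 turns the feature off entirely.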
Initial outline of the block-producer redesign. Includes a new mempool which can track transaction and batch dependencies as a graph. This enables more complex mempool strategies.
This is still a WIP.
Notably, this only demonstrates the transaction pool implementation and even there skips some parts like handling notes and nullifiers.
Does not include batch producer nor block producer; as I think they should be rather straight-forward but depend on getting the pool interface correct.
We should also discuss how we want to integrate this work once it's complete. The work itself is fairly non-trivial, so it may be best to keep the existing design in place until this new design has sufficient confidence. In other words, I'm not going to replace any existing code until we are very ready - probably split across several PRs?
This work is also missing revoking/dropping transactions e.g. due to recency conditions. This is possible to add, but since we don't actually have this feature yet I've left it out.
Design overview & intent
The pool tracks inflight transactions, batches and blocks. The definition of inflight is extended slightly to include all data that we consider not yet stale. This includes some number of blocks which we know to already be complete and in the store. This is done to minimize race conditions for new transaction inputs from the store (by providing several blocks of overlap/grace).
Transactions, batches and blocks are all handled similarly. They're kept in a pool and are only removed once they're considered stale. Dependency graphs are bi-directional, i.e. they store both children and parents (except blocks, which are always sequential, so there is no need). Once items are removed due to staleness, all references to them are removed as well. This means all items/graph edges should always be in the pool, which we enforce using `.expect(..)`, as a violation would indicate a rather serious bug.

Transaction and batch selection works by tracking what I call "roots". A node is considered a root if all of its ancestors have already been processed, e.g. a transaction root's parents are all already part of some inflight batch, and a batch root's parents are all already part of some inflight block.
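The root-promotion rule described above could be sketched as a small bi-directional graph; node IDs are bare `u64`s and all names are illustrative, not the PR's actual types:

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Minimal sketch of a bi-directional dependency graph with root tracking.
#[derive(Default)]
struct DependencyGraph {
    parents: BTreeMap<u64, BTreeSet<u64>>,
    children: BTreeMap<u64, BTreeSet<u64>>,
    processed: BTreeSet<u64>,
    roots: BTreeSet<u64>,
}

impl DependencyGraph {
    fn insert(&mut self, node: u64, parents: BTreeSet<u64>) {
        for &p in &parents {
            self.children.entry(p).or_default().insert(node);
        }
        // A node is a root if all of its parents are already processed.
        if parents.iter().all(|p| self.processed.contains(p)) {
            self.roots.insert(node);
        }
        self.parents.insert(node, parents);
    }

    /// Mark a root as processed (e.g. included in a batch), promoting any
    /// children whose parents are now all processed.
    fn process(&mut self, node: u64) {
        assert!(self.roots.remove(&node), "only roots may be processed");
        self.processed.insert(node);
        for child in self.children.get(&node).cloned().unwrap_or_default() {
            if self.parents[&child].iter().all(|p| self.processed.contains(p)) {
                self.roots.insert(child);
            }
        }
    }
}
```

Removal on staleness (dropping a node and all edges referencing it) is omitted here; per the design, that path would `.expect(..)` on every lookup since a dangling edge indicates a serious bug.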
Outstanding work
I anticipate the need for a lot of tests unless we can find some way of abstracting parts of this. Though I think graph bookkeeping is in general just painful, so we need to be thorough.
The rest of the state needs to be added, and the block and batch producers need at least a basic implementation so we can modify the pool api accordingly (missing outputs).
We need to pay particular attention to code that modifies the graphs, and should consider more expensive assertions, or runtime audits until we have performance concerns when running live?
Improve naming and comments which I'll get to.
We need to decide what to do in case of failure. Right now we don't really have any options except abort all transactions forming part of the failure?
Given that this is quite a lot of fairly intertwined code, I'm thinking maybe we should split this PR up more once tests are written? E.g. PR 1 includes just `add_transaction` and its tests, and then we build from there.