feat: Optimistic Execution#16581
Conversation
|
I think we need to be especially careful when OE is aborted, because
Maybe to keep it simple we can execute OE only if height > initialHeight. That way we can just clear up the state 100% of the times |
| // processed the first block, as we want to avoid overwriting the finalizeState | ||
| // after state changes during InitChain. | ||
| if req.Height > app.initialHeight { | ||
| // abort any running OE | ||
| if app.oeEnabled && app.oeInfo != nil && app.oeInfo.Running() { | ||
| app.oeInfo.Abort() | ||
| _, _ = app.oeInfo.WaitResult() // ignore the result |
There was a problem hiding this comment.
Change potentially affects state.
Call sequence:
(*github.com/cosmos/cosmos-sdk/baseapp.BaseApp).ProcessProposal (baseapp/abci.go:467)
| oe.logger.Error("OE aborted due to hash mismatch", "oe_hash", hex.EncodeToString(oe.request.Hash), "req_hash", hex.EncodeToString(reqHash), "oe_height", oe.request.Height, "req_height", oe.request.Height) | ||
| oe.cancelFunc() | ||
| return true | ||
| } else if oe.abortRate > 0 && rand.Intn(100) < oe.abortRate { |
Check failure
Code scanning / gosec
Use of weak random number generator (math/rand instead of crypto/rand)
|
I tried cleaning up the concurrency model with little success, now I'm passing a context to internalFinalizeBlock which makes it unaware of the whole OE thing, it uses this context just to know if it needs to return early. Using channels is a cleaner way to do this, but the problem is that the function we are passing as our FinalizeBlockFn has access to baseapp and to its properties, so we have partial control over what's going on with the state. This ended up in getting wrong app hashes from time to time. A great improvement would be to have FinalizeBlock to use only whatever is passed to it and with little to no access to baseapp. This way we could run the entire block completely concurrently by passing only cached copies of the contexts and values; and by the time we abort we just discard all of these. |
| go func() { | ||
| start := time.Now() | ||
| resp, err := oe.finalizeBlockFunc(ctx, oe.request) | ||
| oe.mtx.Lock() | ||
| executionTime := time.Since(start) | ||
| oe.logger.Debug("OE finished", "duration", executionTime.String(), "height", req.Height, "hash", hex.EncodeToString(req.Hash)) | ||
| oe.response, oe.err = resp, err | ||
| close(oe.stopCh) | ||
| oe.mtx.Unlock() | ||
| }() |
Check notice
Code scanning / CodeQL
Spawning a Go routine
| oe.initialized = true | ||
|
|
||
| go func() { | ||
| start := time.Now() |
Check warning
Code scanning / CodeQL
Calling the system time
Co-authored-by: Aleksandr Bezobchuk <alexanderbez@users.noreply.github.com>
|
I've added a check so we don't start optimistic execution when ProcessProposal returns a "reject" |
| voteExtHandler.SetHandlers(bApp) | ||
| } | ||
| baseAppOptions = append(baseAppOptions, voteExtOp) | ||
| baseAppOptions = append(baseAppOptions, voteExtOp, baseapp.SetOptimisticExecution()) |
There was a problem hiding this comment.
Is it purposely done so that simapp only uses optimistic execution?
Or should we add it here:
Line 529 in e2637b8
There was a problem hiding this comment.
So I wouldn't like to add a flag for this, because that could give node operators the chance of not running optimistic execution if the developers chose to do so (they can still modify the code but that takes some extra effort).
Could we add a flag on simapp only? wdyt?
There was a problem hiding this comment.
Yeah, a flag wouldn't make sense.
I was taking about adding it in our default baseapp options.
But I think I remember conversation about having it disabled by default first 🤔, so it makes sense to add it only in simapp.
|
@Mergifyio backport release/v0.50.x |
✅ Backports have been createdDetails
|
Co-authored-by: Aleksandr Bezobchuk <alexanderbez@users.noreply.github.com> (cherry picked from commit 8df065b) # Conflicts: # CHANGELOG.md # baseapp/abci_test.go
This comment was marked as spam.
This comment was marked as spam.
|
Great work @facundomedica 👏 |
Description
RFC: #16499
Closes: #XXXX
Author Checklist
All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.
I have...
!to the type prefix if API or client breaking changeCHANGELOG.mdReviewers Checklist
All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.
I have...
!in the type prefix if API or client breaking change