Skip to content

[BFT] Extend CombinedVoteProcessorV3 #7918

Merged
durkmurder merged 51 commits intomasterfrom
yurii/6127-vote-double-counting
Dec 3, 2025
Merged

[BFT] Extend CombinedVoteProcessorV3 #7918
durkmurder merged 51 commits intomasterfrom
yurii/6127-vote-double-counting

Conversation

@durkmurder
Copy link
Member

https://github.com/onflow/flow-go-internal/issues/6127

Context

This PR introduces a vote cache for CombinedVoteProcessorV3. Previously CombinedVoteProcessorV3 couldn't guarantee correctness of the QC it provides under some scenarios when leader is equivocating. By using a separate vote cache for the processor we can guarantee that each vote will be included at most once per signer which inherently guarantees that leader will be able to construct a valid QC once he collects enough votes to form a super-majority.

The runtime overhead of introducing votesCache is marginal: For votesCache to add 500 votes (concurrently with 20 threads) takes about 0.25ms. This runtime overhead is negligible and a good trade-off for the gain in maintainability and code clarity.

@codecov-commenter
Copy link

codecov-commenter commented Sep 18, 2025

Codecov Report

❌ Patch coverage is 82.85714% with 12 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
consensus/hotstuff/votecollector/statemachine.go 72.72% 7 Missing and 5 partials ⚠️

📢 Thoughts on this report? Let us know!

err := s.collector.ProcessBlock(proposal)
require.NoError(s.T(), err)

time.Sleep(100 * time.Millisecond)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we stop the worker pool here instead of sleeping?

s.workerPool.StopWait()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +1101 to +1103
onQCCreated := func(qc *flow.QuorumCertificate) {
require.Fail(t, "qc is not expected to be created in this test scenario")
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding 2f-1 valid votes to the vote processor, then adding the 2 duplicate votes, would make this a stronger check. Currently we're adding 2 votes (only one of which is valid), but the threshold for QC creation is 3, so we wouldn't expect a QC even if the duplicate vote was accepted.

If we added 1 valid vote from some other node, then 2 duplicate votes from the leader, then it is more likely that a faulty QC creation process would incorrectly create an invalid QC.

Copy link
Member

@jordanschalm jordanschalm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great

// It gets more interesting when any of the operations fail after we add the vote to the cache. The way this logic is structured
// it results in the following rule: _only the very first vote that was added to the votesCache from the same signer will be processed by the component_.
// It means that if the vote was invalid by any reason from the point of view of the aggregator the second vote won't be processed at all
// even if it might be correct from the pointer of view of the aggregator.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// even if it might be correct from the pointer of view of the aggregator.
// even if it might be correct from the point of view of the aggregator.

Copy link
Member

@AlexHentschel AlexHentschel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice. This is so much of an improvement in clarity -- really think that is worth the time investment in the long run. Only a small amount of non-critical suggestions.

Copy link
Member

@AlexHentschel AlexHentschel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for removing the mutex. I think CAS must be used in caching2Verifying for correctness (unless we are relying on guarantees outside of that method, which I did not check). For consistency and clarity, I would suggest to also implement the CAS Loop pattern in terminateVoteProcessing.

Suggested documented, drop-in code code replacements in my comments.

@durkmurder durkmurder added this pull request to the merge queue Dec 3, 2025
Merged via the queue into master with commit 7e0964b Dec 3, 2025
61 checks passed
@durkmurder durkmurder deleted the yurii/6127-vote-double-counting branch December 3, 2025 13:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants