fix: introduce mutex for state and lastCommitInfo to avoid race conditions #22692

Closed
beer-1 wants to merge 19 commits into cosmos:main from initia-labs:fix/race

Conversation

@beer-1
Contributor

@beer-1 beer-1 commented Nov 29, 2024

Description

Closes: #22650


Author Checklist

All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.

I have...

  • included the correct type prefix in the PR title, you can find examples of the prefixes below:
  • confirmed ! in the type prefix if API or client breaking change
  • targeted the correct branch (see PR Targeting)
  • provided a link to the relevant issue or specification
  • reviewed "Files changed" and left comments if necessary
  • included the necessary unit and integration tests
  • added a changelog entry to CHANGELOG.md
  • updated the relevant documentation or specification, including comments for documenting Go code
  • confirmed all CI checks have passed

Reviewers Checklist

All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.

Please see Pull Request Reviewer section in the contributing guide for more information on how to review a pull request.

I have...

  • confirmed the correct type prefix in the PR title
  • confirmed all author checklist items have been addressed
  • reviewed state machine logic, API design and naming, documentation is accurate, tests and test coverage

Summary by CodeRabbit

  • New Features

    • Integrated support for app version 2.
    • Enabled simulation of nested messages.
    • Introduced Linux-specific secure key management and hex key import via standard input.
    • Added custom public key verification for improved transaction validation.
  • Improvements

    • Enhanced concurrency and state management for smoother performance.
    • Optimized handling of edge cases and upgraded supporting libraries for better reliability.
  • Bug Fixes

    • Resolved race conditions affecting transaction processing for increased stability.

@coderabbitai
Contributor

coderabbitai Bot commented Nov 29, 2024

📝 Walkthrough

Walkthrough

This update integrates several new features and improvements across core components. New integrations include app v2 support, simulation of nested messages, a Linux-only crypto backend, hex key import via standard input, and custom public key verification. The BaseApp code has been refactored to centralize state management through a new getState method with added mutex locks and concurrency enhancements. Additionally, testing has been improved to cover concurrent state access scenarios, and the store module now manages commit information atomically. A module replacement directive and dependency downgrades are also introduced.

Changes

| File(s) | Change Summary |
| --- | --- |
| CHANGELOG.md | Documents new features (app v2 integration, simulating nested messages, Linux crypto backend, hex key import, custom key verification), improvements (edge case and RocksDB upgrades, integration test refinements), and bug fixes (mutex locks and data race resolutions). |
| baseapp/abci.go, baseapp/abci_test.go, baseapp/baseapp.go, baseapp/test_helpers.go | Refactored BaseApp to centralize state management via a new getState method with added mutex locks, introduced a new test function for concurrency (race conditions between commit and query context), and updated method implementations to safely handle state transitions. |
| go.mod | Added a replace directive to redirect cosmossdk.io/store to a local path (./store) within the temporary replaces section. |
| store/rootmulti/store.go, store/go.mod | Updated the Store struct by changing lastCommitInfo to an atomic.Pointer for thread-safe commit info management and downgraded dependency versions for github.com/cosmos/iavl and github.com/hashicorp/go-plugin. |

Sequence Diagram(s)

sequenceDiagram
    participant C as Client
    participant BA as BaseApp
    participant GS as getState(execMode)
    participant S as State Object

    C->>BA: Send request (e.g., FinalizeBlock, CheckTx)
    BA->>GS: Retrieve state based on execution mode
    GS-->>BA: Return appropriate state
    BA->>S: Process request with thread-safe state access
    S-->>BA: Acknowledge state operation
    BA->>C: Return response

Possibly related PRs

Suggested labels

C:server/v2, C:server/v2 cometbft, backport/v0.52.x

Suggested reviewers

  • sontrinh16
  • kocubinski
  • julienrbrt
  • tac0turtle

@beer-1 beer-1 changed the title from "fix: introduce mutex for state and lastCommitInfo to avoid race" to "fix: introduce mutex for state and lastCommitInfo to avoid race conditions" on Nov 29, 2024
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 7

🧹 Outside diff range and nitpick comments (13)
go.mod (1)

218-219: Consider documenting the temporary nature of this replace directive.

Since this replace directive is part of the race condition fix, consider moving it to the temporary replace section with a comment explaining its purpose.

 // Here are the short-lived replace from the Cosmos SDK
 // Replace here are pending PRs, or version to be tagged
-// replace (
-// 	<temporary replace>
-// )
+replace (
+    // Temporary: Using local store module for race condition fix (#22650)
+    cosmossdk.io/store => ./store
+)
CHANGELOG.md (4)

Line range hint 1-1: Add version table at the top of changelog

Consider adding a version table at the top that summarizes all v0.46.x releases with their dates and key changes for easier reference.


Line range hint 11-13: Fix inconsistent bullet point formatting

The bullet points use a mix of * and -. Standardize on one format throughout the document for consistency.


59-61: Add impact details for breaking changes

For breaking changes like the Dragonberry security fix, consider adding more details about potential impact on users and required actions.


Line range hint 1-1000: Fix typos and grammatical errors

Several typos and grammatical errors found throughout the document:

  • "typographical" misspelled as "typograhical"
  • Missing periods at end of some bullet points
  • Inconsistent capitalization in section headers
store/rootmulti/store.go (1)

63-63: Naming convention for mutex variables.

Consider renaming lastCommitInfoMut to lastCommitInfoMu to align with Go conventions, where mutex variables are typically suffixed with Mu.

baseapp/baseapp.go (2)

127-127: Consider renaming stateMut to stateMutex for clarity.

According to the Uber Go Style Guide, abbreviations in variable names should be avoided to enhance readability. Renaming stateMut to stateMutex makes the purpose of the mutex clearer.

Apply this diff to improve clarity:

-	stateMut             sync.RWMutex
+	stateMutex           sync.RWMutex

Line range hint 583-586: Potential deadlock due to inconsistent lock ordering between app.mu and app.stateMut.

In getContextForTx, app.mu is locked before calling app.getState(mode), which in turn locks app.stateMut. If elsewhere app.stateMut is locked before attempting to acquire app.mu, this could lead to a deadlock due to inconsistent lock ordering.

Recommend reviewing the locking order to ensure that app.mu and app.stateMut are always acquired in a consistent order throughout the codebase to prevent potential deadlocks.
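
The lock-ordering concern above can be illustrated with a minimal sketch. The app type here is a simplified stand-in, not the real BaseApp; only the field names mu and stateMut come from the review comment, and the height field and method names are hypothetical. Both code paths take mu first and stateMut second, so no goroutine can hold one lock while waiting for the other in the opposite order.

```go
package main

import (
	"fmt"
	"sync"
)

// app is a hypothetical, simplified stand-in for BaseApp.
type app struct {
	mu       sync.Mutex   // outer lock: always acquired first
	stateMut sync.RWMutex // inner lock: always acquired second
	height   int64
}

// readHeight acquires mu, then stateMut (read side).
func (a *app) readHeight() int64 {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.stateMut.RLock()
	defer a.stateMut.RUnlock()
	return a.height
}

// bumpHeight acquires the locks in the same order as readHeight.
func (a *app) bumpHeight() {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.stateMut.Lock()
	defer a.stateMut.Unlock()
	a.height++
}

// runConcurrently starts n writers and n readers and returns the final height.
func runConcurrently(a *app, n int) int64 {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(2)
		go func() { defer wg.Done(); a.bumpHeight() }()
		go func() { defer wg.Done(); _ = a.readHeight() }()
	}
	wg.Wait()
	return a.readHeight()
}

func main() {
	fmt.Println(runConcurrently(&app{}, 100))
}
```

If any path took stateMut first and then mu, two goroutines could each hold one lock and block on the other; keeping a single documented order rules that out.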

baseapp/abci.go (3)

609-609: Handle possible errors from CacheContext

In the assignments:

ctx, _ = app.getState(execModeFinalize).Context().CacheContext()

The second return value of CacheContext() is the write function for the branched (cached) store, not an error. Discarding it with _ means that changes made through ctx cannot be flushed to the parent state from this call site, which is only correct if that is the intent.

Consider capturing both return values explicitly for clarity:

ctx, writeCache := app.getState(execModeFinalize).Context().CacheContext()
// call writeCache() only if the cached changes should be written to the parent

Also applies to: 689-689


747-747: Clarify the comment for better understanding

The comment at line 747 is unclear:

// only used to handle early cancellation, for anything related to state app.getState(execModeFinalize).Context()

Consider rephrasing for clarity:

// Use ctx only for early cancellation handling. For state-related operations, use app.getState(execModeFinalize).Context()

1194-1194: Review the use of CacheContext

In:

ctx, _ = app.getState(execModeFinalize).Context().CacheContext()

Ensure that caching the context here is intended and that any modifications do not affect the main context unintentionally.
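
As a rough illustration of the isolation property being asked about, here is a toy model of a branched store. It assumes only that CacheContext() returns a branched context together with a write function; the store type and cacheStore helper are hypothetical and are not SDK code. Deletions are not modelled.

```go
package main

import "fmt"

// store is a toy key-value store standing in for a multistore.
type store map[string]string

// cacheStore returns a branched copy of parent plus a write function that
// flushes the branch back into parent (hypothetical helper, not the SDK API).
func cacheStore(parent store) (store, func()) {
	branch := make(store, len(parent))
	for k, v := range parent {
		branch[k] = v
	}
	write := func() {
		for k, v := range branch {
			parent[k] = v
		}
	}
	return branch, write
}

func main() {
	parent := store{"height": "10"}
	branch, write := cacheStore(parent)

	branch["height"] = "11"
	fmt.Println(parent["height"]) // parent is untouched until write is called

	write()
	fmt.Println(parent["height"]) // parent now reflects the branch
}
```

In this model, dropping the write function (as ctx, _ = ... does) guarantees the parent is never modified through the branch.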

baseapp/abci_test.go (2)

2783-2830: Add a function comment to describe the purpose of the test

To improve code readability and maintainability, consider adding a comment before the TestABCI_Race_Commit_Query function to explain its purpose and what it is testing.


2796-2797: Simplify atomic counter initialization

Since the zero value of atomic.Uint64 is zero, the explicit call to counter.Store(0) is unnecessary. You can simplify the code by removing this call.

Apply the following diff:

-counter := atomic.Uint64{}
-counter.Store(0)
+var counter atomic.Uint64
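
A small standalone sketch of the point above: the zero value of atomic.Uint64 is ready to use, so no explicit Store(0) is required before the first Add. The countConcurrently helper is hypothetical and exists only for this example.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countConcurrently increments a zero-value atomic counter from n goroutines
// and returns the final count.
func countConcurrently(n int) uint64 {
	var counter atomic.Uint64 // zero value is 0; no Store(0) needed
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			counter.Add(1)
		}()
	}
	wg.Wait()
	return counter.Load()
}

func main() {
	fmt.Println(countConcurrently(100))
}
```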
📜 Review details

Configuration used: .coderabbit.yml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 6cfe2dc and 298d02e.

📒 Files selected for processing (7)
  • CHANGELOG.md (1 hunks)
  • baseapp/abci.go (21 hunks)
  • baseapp/abci_test.go (2 hunks)
  • baseapp/baseapp.go (7 hunks)
  • baseapp/test_helpers.go (1 hunks)
  • go.mod (1 hunks)
  • store/rootmulti/store.go (5 hunks)
🧰 Additional context used
📓 Path-based instructions (6)
CHANGELOG.md (1)

Pattern **/*.md: "Assess the documentation for misspellings, grammatical errors, missing documentation and correctness"

baseapp/abci.go (1)

Pattern **/*.go: Review the Golang code for conformity with the Uber Golang style guide, highlighting any deviations.

baseapp/abci_test.go (2)

Pattern **/*.go: Review the Golang code for conformity with the Uber Golang style guide, highlighting any deviations.


Pattern **/*_test.go: "Assess the unit test code assessing sufficient code coverage for the changes associated in the pull request"

baseapp/baseapp.go (1)

Pattern **/*.go: Review the Golang code for conformity with the Uber Golang style guide, highlighting any deviations.

baseapp/test_helpers.go (1)

Pattern **/*.go: Review the Golang code for conformity with the Uber Golang style guide, highlighting any deviations.

store/rootmulti/store.go (1)

Pattern **/*.go: Review the Golang code for conformity with the Uber Golang style guide, highlighting any deviations.

🔇 Additional comments (20)
baseapp/test_helpers.go (2)

47-47: LGTM: Thread-safe state access implemented

The change from direct state access to using getState aligns with the PR's objective of preventing race conditions. This modification ensures thread-safe access to the state during simulations.

Let's verify the thread safety implementation:

✅ Verification successful

Thread-safe state access confirmed via RLock/RUnlock

The verification confirms that getState method properly implements thread-safety using RLock/RUnlock from stateMut. The implementation correctly protects state access across different execution modes, including the execModeFinalize used in the test helper.
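
For readers without the diff at hand, the pattern described above looks roughly like the following. This is a sketch under assumptions, not the PR's code: the app and state types are simplified stand-ins, only two execution modes are modelled, and setFinalizeState is a hypothetical helper. The names getState, stateMut, checkState, finalizeState and execModeFinalize are taken from the review text.

```go
package main

import (
	"fmt"
	"sync"
)

// execMode and state are simplified stand-ins for the BaseApp types.
type execMode uint8

const (
	execModeCheck execMode = iota
	execModeFinalize
)

type state struct{ height int64 }

type app struct {
	stateMut      sync.RWMutex
	checkState    *state
	finalizeState *state
}

// getState returns the state pointer for the given mode under a read lock,
// so concurrent readers do not race with a writer swapping the pointer.
func (a *app) getState(mode execMode) *state {
	a.stateMut.RLock()
	defer a.stateMut.RUnlock()
	if mode == execModeFinalize {
		return a.finalizeState
	}
	return a.checkState
}

// setFinalizeState swaps the pointer under the write lock (hypothetical helper).
func (a *app) setFinalizeState(s *state) {
	a.stateMut.Lock()
	defer a.stateMut.Unlock()
	a.finalizeState = s
}

func main() {
	a := &app{checkState: &state{height: 1}}
	a.setFinalizeState(&state{height: 2})
	fmt.Println(a.getState(execModeCheck).height, a.getState(execModeFinalize).height)
}
```

Note that the lock only protects the pointer itself; whatever the returned state points to needs its own synchronization if it is mutated concurrently.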

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that getState method uses proper synchronization
# Expected: Find mutex usage in getState implementation

ast-grep --pattern 'func (app *BaseApp) getState($_) $_ {
  $$$
}'

Length of output: 831


53-58: Thread-safe state access implemented, verify performance impact

The changes correctly implement thread-safe state access using getState for both check and finalize modes. This aligns with fixing the race conditions identified in issue #22650.

Let's verify the usage patterns and potential performance impact:

✅ Verification successful

Based on the search results, I can now provide a final response since I have enough information about the usage patterns of NewContextLegacy:

Thread safety changes are safe with minimal performance impact

The changes to NewContextLegacy using getState are safe because:

  • The method is primarily used in test code and initialization paths, not in hot execution paths
  • Main usage patterns are in:
    • Test helpers and test suites
    • App initialization and genesis setup
    • Export functionality
    • Simulation framework

The added synchronization through getState won't cause performance issues since this method isn't called in performance-critical paths or high-frequency operations.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for high-frequency calls to NewContextLegacy that might be affected by the added synchronization
# Expected: Identify critical paths that might need optimization

# Find direct calls to NewContextLegacy
rg "NewContextLegacy\(" -A 2

# Find potential high-frequency callers through NewContext
rg "NewContext\(" -A 2

Length of output: 40662

go.mod (1)

219-219: LGTM! Verify version consistency with other local modules.

The addition of the store module replace directive aligns with the existing pattern and is necessary for implementing the race condition fix. However, we should ensure version consistency.

✅ Verification successful

Let me verify the version consistency further by checking other local module replacements.


Version consistency verified across local module replacements

The replace directive for the store module follows the same pattern as other local module replacements (api, bank, staking, tx). All local modules use Go 1.23, and the version specified in the main go.mod (v1.1.1-0.20240418092142-896cdf1971bc) is consistent with the development nature of these changes.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check version consistency between go.mod and local store module
# Verify the store module's go.mod version matches the version specified in the main go.mod

# Get the version from main go.mod
echo "Main module store version:"
grep "cosmossdk.io/store" go.mod | head -n 1

# Get the version from local store module
echo "Local store module version:"
if [ -f "store/go.mod" ]; then
    grep "module cosmossdk.io/store" store/go.mod
fi

Length of output: 329


Script:

#!/bin/bash
# Check all local module replacements and their versions
echo "Main go.mod versions:"
grep "cosmossdk.io/" go.mod | grep -v "replace" | sort

echo -e "\nReplace directives:"
grep "replace cosmossdk.io/" go.mod | sort

echo -e "\nLocal module versions:"
for dir in api bank staking tx store; do
    if [ -f "$dir/go.mod" ]; then
        echo -e "\n$dir/go.mod:"
        grep "^module" "$dir/go.mod"
        grep "^go 1." "$dir/go.mod"
    fi
done

Length of output: 1573

store/rootmulti/store.go (7)

63-63: Introduction of lastCommitInfoMut for concurrency control.

Adding lastCommitInfoMut sync.RWMutex to the Store struct appropriately introduces a mutex to prevent race conditions when accessing lastCommitInfo.


292-294: Proper locking around lastCommitInfo assignment in loadVersion().

Locking lastCommitInfoMut before updating rs.lastCommitInfo ensures thread-safe write operations.


440-446: Consistent use of LastCommitInfo() in LatestVersion().

Updating LatestVersion() to use rs.LastCommitInfo() ensures safe concurrent access to lastCommitInfo.


448-451: Thread-safe access to lastCommitInfo via LastCommitInfo() method.

The new LastCommitInfo() method correctly implements read locking to allow safe concurrent reads of lastCommitInfo.


456-474: Safe access to lastCommitInfo in LastCommitID().

Using rs.LastCommitInfo() in LastCommitID() prevents data races during read operations of lastCommitInfo.


518-521: Ensure minimal locking when updating lastCommitInfo.Timestamp.

Holding the lock only during the assignment to lastCommitInfo.Timestamp is acceptable. Confirm that no other operations within the lock could cause delays or potential deadlocks.


800-802: Thread-safe retrieval of lastCommitInfo in Query().

Using rs.LastCommitInfo() in the Query() method ensures safe concurrent access and prevents race conditions.

baseapp/abci.go (9)

72-72: Good use of encapsulation with app.getState

The introduction of app.getState(execModeFinalize) enhances state management by encapsulating state retrieval, improving code maintainability and readability.


78-78: Ensure req.ConsensusParams is validated

While storing consensus parameters, make sure that req.ConsensusParams is not nil and contains valid data to prevent potential runtime errors.


90-97: Consistent Context Update for States

Updating both checkState and finalizeState contexts ensures consistency across different execution modes. This is a good practice for maintaining state integrity.


110-110: Initialization of Infinite Gas Meter

Setting an infinite gas meter during InitChain is appropriate to allow unrestricted operations during genesis. Ensure that this does not inadvertently persist into states where gas limits should be enforced.


936-939: Ensure proper mutex usage when modifying shared state

Locking app.stateMut before modifying app.finalizeBlockState is correct. Ensure that all other accesses to shared state variables are similarly protected to prevent race conditions.


1024-1026: Mutex locking and unlocking

Proper use of app.stateMut.Lock() and app.stateMut.Unlock() ensures thread-safe operations when modifying shared states.


1047-1047: Ensure ms.Write() is safe and necessary

Calling ms.Write() writes pending changes to the parent store. Confirm that this call is necessary and that any potential errors are handled appropriately.


1409-1409: Verify consensus parameters retrieval

Retrieving consensus parameters using app.getState(execModeFinalize).Context() should be thread-safe and consistent with the application's state at this point.


978-979: ⚠️ Potential issue

Accessing finalizeState without synchronization

Accessing finalizeState here may lead to race conditions if not properly synchronized. Verify that app.stateMut is held during this access.

Run the following script to identify unsynchronized accesses to app.getState:

Also applies to: 983-983

baseapp/abci_test.go (1)

2796-2797: Verify Go version compatibility for atomic.Uint64

The atomic.Uint64 type is available from Go 1.19 onwards. Ensure that the project specifies Go 1.19 or newer in its configuration to avoid compatibility issues.

Run the following script to check the Go version specified in the go.mod file:

✅ Verification successful

Go version compatibility verified for atomic.Uint64

The project uses Go 1.23.2, which fully supports atomic.Uint64 (introduced in Go 1.19). No compatibility issues exist.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check that the project requires Go 1.19 or newer.

# Look for the 'go' directive in the go.mod file
grep "^go " go.mod

Length of output: 30

Comment thread store/rootmulti/store.go Outdated
Comment thread baseapp/baseapp.go
Comment thread baseapp/baseapp.go
Comment thread baseapp/abci.go
Comment thread baseapp/abci.go
Comment thread baseapp/abci.go
Comment thread baseapp/abci_test.go
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
store/rootmulti/store.go (1)

514-522: Consider combining the mutex locks.

The code currently acquires and releases the mutex twice in quick succession. Consider combining these operations to reduce lock overhead:

-       rs.lastCommitInfoMut.Lock()
-       rs.lastCommitInfo = cInfo
-       rs.lastCommitInfoMut.Unlock()
-
-       rs.lastCommitInfoMut.Lock()
-       rs.lastCommitInfo.Timestamp = rs.commitHeader.Time
-       rs.lastCommitInfoMut.Unlock()
+       rs.lastCommitInfoMut.Lock()
+       rs.lastCommitInfo = cInfo
+       rs.lastCommitInfo.Timestamp = rs.commitHeader.Time
+       rs.lastCommitInfoMut.Unlock()
📜 Review details

Configuration used: .coderabbit.yml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 298d02e and aa5960a.

📒 Files selected for processing (1)
  • store/rootmulti/store.go (5 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
store/rootmulti/store.go (1)

Pattern **/*.go: Review the Golang code for conformity with the Uber Golang style guide, highlighting any deviations.

🔇 Additional comments (4)
store/rootmulti/store.go (4)

63-63: LGTM: Good choice of RWMutex for lastCommitInfo synchronization.

Using sync.RWMutex is appropriate here as it allows concurrent reads while ensuring exclusive writes, which aligns with the access patterns of lastCommitInfo.


292-294: LGTM: Appropriate lock scope in LoadVersion.

The mutex lock is correctly scoped to only protect the critical section (lastCommitInfo assignment), minimizing lock contention.


448-451: LGTM: Well-implemented thread-safe accessor.

The method correctly uses RLock/RUnlock for concurrent read access to lastCommitInfo. The defer statement ensures proper unlock even in case of panics.
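
A minimal sketch of the accessor pattern, combined with the single-critical-section write suggested earlier in this review. The store and commitInfo types are simplified stand-ins (the timestamp is a plain int64 of Unix seconds to keep the example small), and commit is a hypothetical helper; only lastCommitInfoMut, lastCommitInfo, LastCommitInfo and LatestVersion are names from the review.

```go
package main

import (
	"fmt"
	"sync"
)

// commitInfo is a simplified stand-in for the store's commit metadata.
type commitInfo struct {
	Version   int64
	Timestamp int64 // Unix seconds, simplified for this sketch
}

type store struct {
	lastCommitInfoMut sync.RWMutex
	lastCommitInfo    *commitInfo
}

// LastCommitInfo returns the current pointer under a read lock.
// The deferred RUnlock runs even if the body panics.
func (s *store) LastCommitInfo() *commitInfo {
	s.lastCommitInfoMut.RLock()
	defer s.lastCommitInfoMut.RUnlock()
	return s.lastCommitInfo
}

// commit builds the complete value first and publishes it in one critical
// section, instead of locking once per field.
func (s *store) commit(version, timestamp int64) {
	ci := &commitInfo{Version: version, Timestamp: timestamp}
	s.lastCommitInfoMut.Lock()
	s.lastCommitInfo = ci
	s.lastCommitInfoMut.Unlock()
}

// LatestVersion goes through the accessor rather than reading the field.
func (s *store) LatestVersion() int64 {
	if ci := s.LastCommitInfo(); ci != nil {
		return ci.Version
	}
	return 0
}

func main() {
	s := &store{}
	fmt.Println(s.LatestVersion())
	s.commit(3, 1700000000)
	fmt.Println(s.LatestVersion())
}
```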


440-446: LGTM: Consistent usage of thread-safe accessor.

All methods consistently use the thread-safe LastCommitInfo accessor, properly addressing the race condition between Commit and CreateQueryContext operations.

Also applies to: 456-474, 802-804

Comment thread store/rootmulti/store.go Outdated
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (4)
store/rootmulti/store.go (4)

63-63: Excellent use of atomic.Pointer for thread safety.

Using atomic.Pointer instead of mutex locks is a more efficient solution for this use case since the CommitInfo content itself is not modified concurrently, only the pointer is updated atomically.

This approach provides better performance than mutex locks as it:

  1. Avoids lock contention
  2. Allows for lock-free reads
  3. Provides atomic pointer updates
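
The points above hold only as long as a stored value is treated as immutable once published. A minimal sketch of that discipline follows; the types are simplified stand-ins, the timestamp is a plain int64, and commit and latestVersion are hypothetical helper names. The writer builds a complete commitInfo and swaps the pointer, and readers load once and nil-check.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// commitInfo is a simplified stand-in for the store's commit metadata.
type commitInfo struct {
	Version   int64
	Timestamp int64 // Unix seconds, simplified for this sketch
}

type store struct {
	lastCommitInfo atomic.Pointer[commitInfo]
}

// commit publishes a fully built value. It never mutates a value that has
// already been stored, so concurrent readers always see a consistent struct.
func (s *store) commit(version, timestamp int64) {
	s.lastCommitInfo.Store(&commitInfo{Version: version, Timestamp: timestamp})
}

// latestVersion loads the pointer once; nil means nothing was committed yet.
func (s *store) latestVersion() int64 {
	if ci := s.lastCommitInfo.Load(); ci != nil {
		return ci.Version
	}
	return 0
}

func main() {
	var s store
	fmt.Println(s.latestVersion())
	s.commit(7, 1700000000)
	fmt.Println(s.latestVersion())
}
```

Writing to a field of the loaded value (for example setting Timestamp after Load) would reintroduce a data race with concurrent readers, because the atomic operation covers only the pointer, not the struct it points to.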

438-443: Consider caching the atomic load result.

Multiple accesses to lastCommitInfo.Load() could be optimized by storing the result in a local variable.

-lastCommitInfo := rs.lastCommitInfo.Load()
-if lastCommitInfo == nil {
-  return GetLatestVersion(rs.db)
-}
-
-return lastCommitInfo.Version
+if info := rs.lastCommitInfo.Load(); info == nil {
+  return GetLatestVersion(rs.db)
+} else {
+  return info.Version
+}

510-512: Optimize multiple atomic loads.

Multiple consecutive calls to lastCommitInfo.Load() should be consolidated to a single load.

-cInfo := rs.lastCommitInfo.Load()
-cInfo.Timestamp = rs.commitHeader.Time
-
+if cInfo := rs.lastCommitInfo.Load(); cInfo != nil {
+  cInfo.Timestamp = rs.commitHeader.Time
+}

791-793: Consolidate atomic load operation.

The atomic load result should be cached to avoid multiple loads and null checks.

-lastCommitInfo := rs.lastCommitInfo.Load()
-if res.Height == lastCommitInfo.Version {
-  commitInfo = lastCommitInfo
+if info := rs.lastCommitInfo.Load(); info != nil && res.Height == info.Version {
+  commitInfo = info
📜 Review details

Configuration used: .coderabbit.yml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between f3b3d6e and dba4393.

📒 Files selected for processing (1)
  • store/rootmulti/store.go (9 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
store/rootmulti/store.go (1)

Pattern **/*.go: Review the Golang code for conformity with the Uber Golang style guide, highlighting any deviations.

🔇 Additional comments (1)
store/rootmulti/store.go (1)

12-12: LGTM: Required import for atomic operations.

The addition of "sync/atomic" import is necessary for the atomic pointer implementation.

Comment thread baseapp/baseapp.go
@beer-1 beer-1 requested review from aljo242 and yihuang December 27, 2024 05:28
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
baseapp/abci_test.go (3)

2812-2814: Consider adding verification for race-free operation.

The atomic counter is used to track the number of contexts created but isn't checked in the test assertions.

To strengthen the test, add an assertion to verify that the counter's value is non-zero, confirming that the query context creation actually happened concurrently with the commits. This ensures the test is properly exercising the race condition scenario.

	cancel()
+	wg.Wait()

+	counterValue := counter.Load()
+	require.Greater(t, counterValue, uint64(0), "Expected query contexts to be created")
	require.Equal(t, int64(1001), app.GetContextForCheckTx(nil).BlockHeight())
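
The shape of the suggested check (cancel, wait for the workers, then read the counter) can be sketched as a standalone program. This is not the PR's test: runRace is a hypothetical helper, an atomic height stands in for the application state, and readers must be at least 1 or the wait loop never ends.

```go
package main

import (
	"context"
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// runRace starts reader goroutines that poll a shared height while the caller
// goroutine performs the given number of "commits". It returns how many reads
// happened and the final height. readers must be >= 1.
func runRace(readers, commits int) (uint64, int64) {
	var height atomic.Int64
	var reads atomic.Uint64

	ctx, cancel := context.WithCancel(context.Background())
	var wg sync.WaitGroup
	for i := 0; i < readers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				select {
				case <-ctx.Done():
					return
				default:
					_ = height.Load()
					reads.Add(1)
				}
			}
		}()
	}

	for i := 0; i < commits; i++ {
		height.Add(1)
	}

	// Make sure at least one read happened, so the assertion on the counter
	// is deterministic rather than dependent on scheduling.
	for reads.Load() == 0 {
		runtime.Gosched()
	}

	cancel()
	wg.Wait() // only read the counter after all workers have exited
	return reads.Load(), height.Load()
}

func main() {
	reads, height := runRace(4, 1000)
	fmt.Println(reads > 0, height)
}
```

In a real test, failures inside worker goroutines are usually reported with t.Error or an error channel, since the testing package requires FailNow (which require-style helpers call) to run on the test goroutine.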

2815-2828: Use more realistic query simulation.

The CreateQueryContextWithCheckHeader call is made repeatedly without any delay, which may not represent real-world usage patterns.

Consider adding a small sleep between queries to make the test more representative of real-world conditions and reduce CPU usage:

	queryCreator := func() {
		defer wg.Done()
		for {
			select {
			case <-ctx.Done():
				return
			default:
				_, err := app.CreateQueryContextWithCheckHeader(0, false, false)
				require.NoError(t, err)

				counter.Add(1)
+				time.Sleep(time.Millisecond) // Add small delay between queries
			}
		}
	}

2834-2840: Consider using goroutines for block commits as well.

The test currently runs commits serially, which doesn't fully exercise concurrent access scenarios.

For a more thorough test of the race condition fix, consider having some goroutines run query creation and others run block commits, both concurrently. This would better simulate real-world conditions:

	for i := 0; i < 100; i++ {
		wg.Add(1)
		go queryCreator()
	}

-	for i := 0; i < 1000; i++ {
-		_, err = app.FinalizeBlock(&abci.FinalizeBlockRequest{Height: app.LastBlockHeight() + 1})
-		require.NoError(t, err)
-
-		_, err = app.Commit()
-		require.NoError(t, err)
-	}
+	// Create a mutex to protect LastBlockHeight() access
+	var heightMu sync.Mutex
+	
+	// Run 10 concurrent block committers
+	for i := 0; i < 10; i++ {
+		wg.Add(1)
+		go func() {
+			defer wg.Done()
+			for j := 0; j < 100; j++ {
+				heightMu.Lock()
+				height := app.LastBlockHeight() + 1
+				heightMu.Unlock()
+				
+				_, err := app.FinalizeBlock(&abci.FinalizeBlockRequest{Height: height})
+				require.NoError(t, err)
+				
+				_, err = app.Commit()
+				require.NoError(t, err)
+				
+				// Small delay to avoid CPU contention
+				time.Sleep(time.Millisecond)
+			}
+		}()
+	}

Note: This more aggressive testing approach would require additional synchronization mechanisms to ensure block height increments correctly. The method shown in the example uses a mutex to protect access to LastBlockHeight().

📜 Review details

Configuration used: .coderabbit.yml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f76bcfe and 4d5edef.

⛔ Files ignored due to path filters (2)
  • go.sum is excluded by !**/*.sum
  • store/go.sum is excluded by !**/*.sum
📒 Files selected for processing (5)
  • CHANGELOG.md (2 hunks)
  • baseapp/abci.go (26 hunks)
  • baseapp/abci_test.go (2 hunks)
  • go.mod (1 hunks)
  • store/go.mod (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • go.mod
🧰 Additional context used
📓 Path-based instructions (3)

**/*.go: Review the Golang code for conformity with the Uber Golang style guide, highlighting any deviations.

  • baseapp/abci_test.go
  • baseapp/abci.go

**/*_test.go: "Assess the unit test code assessing sufficient code coverage for the changes associated in the pull request"

  • baseapp/abci_test.go

**/*.md: "Assess the documentation for misspellings, grammatical errors, missing documentation and correctness"

  • CHANGELOG.md
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: test-system-v2
  • GitHub Check: Analyze
🔇 Additional comments (11)
store/go.mod (2)

15-15: Dependency Version Update for github.com/cosmos/iavl

The dependency version has been downgraded to v1.3.4 to align with the project's compatibility requirements. Please verify that this downgrade integrates well with related modules and does not trigger any regression.


19-19: Dependency Version Update for github.com/hashicorp/go-plugin
The version for github.com/hashicorp/go-plugin is now v1.6.2. Confirm that this downgrade is consistent with the dependency alignment approach used across the project and that there are no breaking changes or security advisories associated with the new version.

baseapp/abci.go (7)

782-785: Ensure thread-safe access to finalizeState

The check for nil finalizeState and subsequent initialization should be protected with a mutex to ensure thread-safety during concurrent access.
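The concern above can be illustrated with a minimal, self-contained sketch. The `app` and `state` types and the `ensureFinalizeState` helper are hypothetical stand-ins, not the actual baseapp code; the point is only that the nil check and the initialization happen under the same lock.

```go
package main

import (
	"fmt"
	"sync"
)

// state is a hypothetical stand-in for baseapp's per-mode state.
type state struct {
	height int64
}

// app is a hypothetical container; the field names are illustrative only.
type app struct {
	mu            sync.Mutex
	finalizeState *state
}

// ensureFinalizeState performs the nil check and the initialization under
// one lock, so two goroutines cannot both observe nil and both initialize.
func (a *app) ensureFinalizeState(height int64) *state {
	a.mu.Lock()
	defer a.mu.Unlock()
	if a.finalizeState == nil {
		a.finalizeState = &state{height: height}
	}
	return a.finalizeState
}

func main() {
	a := &app{}
	var wg sync.WaitGroup
	results := make([]*state, 8)
	for i := range results {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// Each goroutine writes only its own slot, so the slice needs no lock.
			results[i] = a.ensureFinalizeState(int64(i + 1))
		}(i)
	}
	wg.Wait()

	// Every caller receives the same instance, whichever goroutine got there first.
	same := true
	for _, s := range results {
		if s != results[0] {
			same = false
		}
	}
	fmt.Println(same)
}
```

Without the lock, two goroutines could both see a nil pointer and each install a different state, which is the check-then-act race this comment warns about.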


872-874: Handle potential type assertion failure

The type assertion on line 873 may panic if SetTracingContext(nil) doesn't return a storetypes.CacheMultiStore. Consider adding a type check:

-finalizeState.ms = finalizeState.ms.SetTracingContext(nil).(storetypes.CacheMultiStore)
+cms, ok := finalizeState.ms.SetTracingContext(nil).(storetypes.CacheMultiStore)
+if !ok {
+    return nil, fmt.Errorf("expected CacheMultiStore, got %T", finalizeState.ms.SetTracingContext(nil))
+}
+finalizeState.ms = cms

1301-1303: Check for nil states in state slice

When adding states to the slice, ensure they're not nil to prevent panics during iteration. A nil check is implicitly performed later with if state != nil, but the slice itself could contain nil values.
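One way to keep nil entries out of the slice in the first place is to filter at append time. The sketch below is illustrative only: `state` and `appendNonNil` are hypothetical names, not part of baseapp.

```go
package main

import "fmt"

// state is a hypothetical stand-in for a baseapp state object.
type state struct {
	name string
}

// appendNonNil appends only the non-nil candidates to dst, so later
// iteration over the slice never has to guard against nil entries.
func appendNonNil(dst []*state, candidates ...*state) []*state {
	for _, s := range candidates {
		if s != nil {
			dst = append(dst, s)
		}
	}
	return dst
}

func main() {
	var states []*state
	states = appendNonNil(states, &state{name: "finalize"}, nil, &state{name: "check"})
	fmt.Println(len(states))
}
```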


72-73: LGTM: Properly getting finalize state via getState

Accessing the state through the new thread-safe getState method instead of directly accessing the member variable helps prevent race conditions.


442-462: Good use of state encapsulation

Replacing direct state access with getState in PrepareProposal is a good practice for better thread safety. The consistent use of a local variable prevents potential race conditions that could occur with direct field access.
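As a rough illustration of the accessor pattern described in these comments, the sketch below guards a state map with a `sync.RWMutex` and reads the state once into a local variable. The method names `getState` and `clearState` come from the review text; `setState`, the `execMode` constants, and the map layout are assumptions made for this example and may not match the PR.

```go
package main

import (
	"fmt"
	"sync"
)

// execMode and its constants are illustrative; the real type may differ.
type execMode int

const (
	execModeCheck execMode = iota
	execModeFinalize
)

type state struct {
	height int64
}

// app holds the per-mode states behind a read-write mutex (assumed layout).
type app struct {
	mu     sync.RWMutex
	states map[execMode]*state
}

func newApp() *app {
	return &app{states: make(map[execMode]*state)}
}

// getState returns the current state for a mode under a read lock.
func (a *app) getState(m execMode) *state {
	a.mu.RLock()
	defer a.mu.RUnlock()
	return a.states[m]
}

// setState is a hypothetical setter used here to populate the example.
func (a *app) setState(m execMode, s *state) {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.states[m] = s
}

// clearState removes the state for a mode under the write lock.
func (a *app) clearState(m execMode) {
	a.mu.Lock()
	defer a.mu.Unlock()
	delete(a.states, m)
}

func main() {
	a := newApp()
	a.setState(execModeFinalize, &state{height: 10})

	// Read once into a local: a later clearState does not change what this
	// caller already holds, so it never sees the field flip to nil mid-use.
	st := a.getState(execModeFinalize)
	a.clearState(execModeFinalize)

	fmt.Println(st != nil, a.getState(execModeFinalize) == nil)
}
```

The local variable is the important part: the lock protects the lookup, while holding the returned pointer protects the caller from a concurrent clear between two separate field reads.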


943-944: Proper state cleanup on optimistic execution abort

Adding explicit state clearing when aborting optimistic execution ensures the system returns to a clean state before executing the block normally.


1032-1036: Consistent state management in Commit

The use of clearState and getState methods provides a more consistent and thread-safe approach to state management during the commit phase, which helps prevent race conditions between Commit and CreateQueryContext.

baseapp/abci_test.go (1)

2799-2845: Need to add synchronization to wait for goroutines to complete.

The test successfully simulates concurrent access to create race conditions between Commit and CreateQueryContext, but it doesn't properly wait for all goroutines to finish after cancellation.

After calling cancel(), the test does not wait for the queryCreator goroutines to exit. To ensure proper synchronization and prevent potential issues, consider using a sync.WaitGroup to wait for all goroutines to complete before the test finishes.

Apply the following diff to use sync.WaitGroup:

	counter.Store(0)

	ctx, cancel := context.WithCancel(context.Background())
+	var wg sync.WaitGroup
	queryCreator := func() {
+		defer wg.Done()
		for {
			select {
			case <-ctx.Done():
				return
			default:
				_, err := app.CreateQueryContextWithCheckHeader(0, false, false)
				require.NoError(t, err)

				counter.Add(1)
			}
		}
	}

	for i := 0; i < 100; i++ {
+		wg.Add(1)
		go queryCreator()
	}

	for i := 0; i < 1000; i++ {
		_, err = app.FinalizeBlock(&abci.FinalizeBlockRequest{Height: app.LastBlockHeight() + 1})
		require.NoError(t, err)

		_, err = app.Commit()
		require.NoError(t, err)
	}

	cancel()
+	wg.Wait()
CHANGELOG.md (1)

53-54: Well-structured improvement entry with clear purpose.

The entry correctly follows the formatting conventions of the changelog and provides a concise description of the fix. It properly identifies the affected component (baseapp), references the PR, and explains both the technical change (adding mutex locks and making lastCommitInfo atomic) and its purpose (preventing race conditions). This is important for threading safety and will help prevent subtle concurrency bugs.
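The changelog entry mentions making lastCommitInfo atomic. A minimal sketch of what that can look like in Go (1.19 or later) uses `atomic.Pointer`. The `commitInfo` type, the `commit` method, and the `lastBlockHeight` helper are hypothetical simplifications, not the PR's actual code.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// commitInfo is a hypothetical, simplified stand-in for the store's commit info.
type commitInfo struct {
	Version int64
}

// app keeps the last commit info behind an atomic pointer so readers and
// the committing goroutine never race on the field itself.
type app struct {
	lastCommitInfo atomic.Pointer[commitInfo]
}

// commit publishes a new, fully built commitInfo in a single atomic store.
func (a *app) commit(version int64) {
	a.lastCommitInfo.Store(&commitInfo{Version: version})
}

// lastBlockHeight loads the pointer once and tolerates the initial nil value.
func (a *app) lastBlockHeight() int64 {
	ci := a.lastCommitInfo.Load()
	if ci == nil {
		return 0
	}
	return ci.Version
}

func main() {
	a := &app{}

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		// Concurrent reader: each load sees either the old or the new pointer.
		for i := 0; i < 1000; i++ {
			_ = a.lastBlockHeight()
		}
	}()

	for v := int64(1); v <= 100; v++ {
		a.commit(v)
	}
	wg.Wait()

	fmt.Println(a.lastBlockHeight())
}
```

This relies on each published value being treated as immutable: a new struct is stored on every commit rather than mutating the one readers may still hold.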

Comment thread store/go.mod
github.com/cosmos/cosmos-proto v1.0.0-beta.5
github.com/cosmos/gogoproto v1.7.0
-github.com/cosmos/iavl v1.3.5
+github.com/cosmos/iavl v1.3.4
Contributor

what is the reasoning for this?

Contributor Author

Without this, other deps would be forced to bump iavl to v1.3.5.

@aljo242
Contributor

aljo242 commented Mar 20, 2025

@beer-1 could you target this PR to release/v0.53.x?

@beer-1
Contributor Author

beer-1 commented Mar 21, 2025

@beer-1 could you target this PR to release/v0.53.x?

done #24392

@technicallyty
Member

closing as we can just utilize the PR against main and add a backport label to it.

Development

Successfully merging this pull request may close these issues.

[Bug]: race condition between baseapp.Commit and baseapp.CreateQueryContext

8 participants