feat: add background P2P init retries to SyncService #3002
base: main
Conversation
Summary of Changes

Hello @julienrbrt, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request enhances the robustness of the SyncService's P2P initialization.
The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).
Claude finished @julienrbrt's task in 2m 17s

Code Review Complete

I've performed a comprehensive review of the P2P initialization retry changes. Overall, the implementation is solid and addresses the timeout issue effectively. However, I've identified several important concerns across code quality, concurrency safety, testing, and error handling.

Critical Issues

1. Race Condition in Background Goroutine Lifecycle 🔴

Location: The background goroutine launched in initFromP2PWithRetry is not tracked, so Stop() can return while the retry loop is still running against a partially torn-down service.
Recommendation: Add a sync.WaitGroup so Stop() can wait for the background goroutine:

```go
type SyncService[H header.Header[H]] struct {
	// ... existing fields ...
	bgCtx    context.Context
	bgCancel context.CancelFunc
	bgWg     sync.WaitGroup // Add this
}

// In initFromP2PWithRetry:
case <-timeoutTimer.C:
	syncService.logger.Warn().Err(err).Msg("timeout reached while trying to initialize the store, scheduling background retry")
	syncService.bgWg.Add(1)
	go func() {
		defer syncService.bgWg.Done()
		syncService.retryInitInBackground()
	}()
	return nil

// In Stop():
func (syncService *SyncService[H]) Stop(ctx context.Context) error {
	// cancel background operations
	syncService.bgCancel()
	// Wait for background goroutine to finish
	syncService.bgWg.Wait()
	// ... rest of stop logic ...
}
```

2. Reduced Timeout Without Justification 🟡

Location: The initial timeout was reduced from 10 minutes to 2 minutes (in the latest fix commit). While shorter timeouts are generally good, this is a significant reduction (80%) that could cause issues in slow networks or during network partitions.

Questions:
Recommendation: Either:
High Priority Issues

3. Insufficient Test Coverage (8.16%) 🟡

Location: Overall change. Codecov reports only 8.16% coverage for this change (45 lines untested). The new functionality includes:
Missing test scenarios:
Recommendation: Add integration tests covering:

```go
func TestSyncService_BackgroundRetry_Success(t *testing.T) {
	// Test that background retry eventually succeeds
}

func TestSyncService_BackgroundRetry_Cancellation(t *testing.T) {
	// Test that Stop() properly cancels background retry
}

func TestSyncService_BackgroundRetry_ExponentialBackoff(t *testing.T) {
	// Test that backoff increases correctly
}
```

4. Potential Double Initialization Race 🟡

Location: The store initialization path:

```go
if syncService.storeInitialized.CompareAndSwap(false, true) {
	if _, err := syncService.initStore(ctx, trusted); err != nil {
		syncService.storeInitialized.Store(false) // Reset on error
		return false, fmt.Errorf("failed to initialize the store: %w", err)
	}
}
```

If initStore fails, the flag is reset to false, so another goroutine can attempt initialization again. Potential issue: Two goroutines could race between lines 341-346, both thinking they won the race, both trying to append to the store.

Recommendation: Add a mutex to protect the entire initialization sequence, or ensure only one goroutine can run it at a time.
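A minimal sketch of the mutex approach (the field and method names below are illustrative assumptions, not the PR's actual code; it relies on sync and sync/atomic):

```go
type SyncService[H header.Header[H]] struct {
	// ... existing fields ...
	initMu           sync.Mutex // assumed field: serializes store initialization
	storeInitialized atomic.Bool
}

func (syncService *SyncService[H]) initStoreOnce(ctx context.Context, trusted H) error {
	syncService.initMu.Lock()
	defer syncService.initMu.Unlock()

	// Re-check under the lock so a caller that lost the race returns early
	// instead of appending to an already-initialized store.
	if syncService.storeInitialized.Load() {
		return nil
	}
	if _, err := syncService.initStore(ctx, trusted); err != nil {
		return fmt.Errorf("failed to initialize the store: %w", err)
	}
	syncService.storeInitialized.Store(true)
	return nil
}
```

With the lock held, the CompareAndSwap is no longer needed and the window between checking the flag and appending to the store disappears.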
5. Error Context Lost in Background Retry 🟡

Location: When the initial sync times out and switches to background mode, the last error is logged but not tracked. If background retry keeps failing, operators have no visibility into:

Recommendation: Add metrics or structured logging:

```go
func (syncService *SyncService[H]) retryInitInBackground() {
	backoff := 15 * time.Second
	maxBackoff := 5 * time.Minute
	attemptCount := 0

	for {
		// ... select statement ...

		attemptCount++
		ok, err := syncService.tryInit(syncService.bgCtx)
		if ok {
			syncService.logger.Info().
				Int("attempts", attemptCount).
				Msg("successfully initialized store from P2P in background")
			return
		}

		syncService.logger.Info().
			Err(err).
			Int("attempt", attemptCount).
			Dur("retry_in", backoff).
			Msg("background retry: headers not yet available from peers")

		// ... backoff logic ...
	}
}
```
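For the metrics half of that recommendation, something along these lines could work, assuming the project exposes a Prometheus registry (the metric names and the use of prometheus/client_golang are assumptions, not existing code):

```go
// Assumes github.com/prometheus/client_golang/prometheus is available and that
// these collectors are registered at service construction (e.g. prometheus.MustRegister).
var (
	bgInitAttempts = prometheus.NewCounter(prometheus.CounterOpts{
		Name: "sync_service_bg_init_attempts_total",
		Help: "Background P2P store-initialization attempts.",
	})
	bgInitFailures = prometheus.NewCounter(prometheus.CounterOpts{
		Name: "sync_service_bg_init_failures_total",
		Help: "Failed background P2P store-initialization attempts.",
	})
)
```

The retry loop would then call bgInitAttempts.Inc() on every attempt and bgInitFailures.Inc() whenever tryInit returns an error, giving operators a failure rate to alert on alongside the log lines.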
Medium Priority Issues

6. Inconsistent Backoff Configuration 🟢

Location: Two different backoff configurations exist:

Questions:
Recommendation: Document the reasoning or consider a smoother transition (e.g., background could start at 10s to continue where the initial loop left off).

7. Context Confusion: Service vs Background 🟢

Location:
Consideration: Is this intentional? Should background retries respect the original context's deadline, or should they truly run independently?

Current behavior: Background retries run until either:
This seems reasonable for resilience, but should be documented.

8. Order of Select Cases in Background Retry 🟢

Location: The latest fix commit moved the context cancellation check to happen before the backoff sleep. This is correct and prevents an unnecessary sleep before shutdown, but it's worth noting:

Before (incorrect):

```go
// tryInit happens first
// Then select with cancellation/sleep
```

After (correct):

```go
// Check cancellation first
// Then tryInit
```

This is good! But consider adding a comment explaining why the order matters.
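To make the ordering concrete, here is a standalone sketch of the pattern (not the PR's code; the function and parameter names are illustrative):

```go
// retryUntilCancelled keeps calling tryInit with a fixed backoff until it
// succeeds or ctx is cancelled. Cancellation is checked before each attempt
// and again during the sleep, so Stop() never has to wait out a full backoff.
func retryUntilCancelled(ctx context.Context, backoff time.Duration, tryInit func(context.Context) (bool, error)) {
	for {
		// Check cancellation first.
		select {
		case <-ctx.Done():
			return
		default:
		}

		// Then attempt initialization.
		if ok, _ := tryInit(ctx); ok {
			return
		}

		// Sleep, but wake immediately if the context is cancelled mid-backoff.
		select {
		case <-ctx.Done():
			return
		case <-time.After(backoff):
		}
	}
}
```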
Minor Issues

9. Magic Numbers 🟢

Location: Multiple locations. Several hardcoded timeouts:

Recommendation: Extract to named constants for better maintainability:

```go
const (
	initialSyncTimeout            = 2 * time.Minute
	backgroundRetryInitialBackoff = 15 * time.Second
	backgroundRetryMaxBackoff     = 5 * time.Minute
)
```

10. Documentation Completeness 🟢

Location: The new fields are documented:

```go
// context for background operations
bgCtx    context.Context
bgCancel context.CancelFunc
```

But consider expanding to explain:
Positive Observations ✅
Architecture & Design Questions

11. Should Background Retry Be Observable? 🤔

Currently, there's no way to:
Consider: Should the sync service expose something like the following?

```go
type SyncStatus struct {
	StoreInitialized      bool
	SyncerStarted         bool
	BackgroundRetryActive bool
	LastRetryError        error
	RetryAttempts         int
}
```

This would help with monitoring and debugging.
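If such a struct were adopted, a read-only accessor could expose a consistent snapshot; the sketch below assumes a mutex-guarded status field on SyncService (all of these names are illustrative, not existing code):

```go
type SyncService[H header.Header[H]] struct {
	// ... existing fields ...
	statusMu sync.RWMutex // assumed field guarding status
	status   SyncStatus   // assumed field updated by the retry loop
}

// Status returns a snapshot of the service's initialization and retry state.
func (syncService *SyncService[H]) Status() SyncStatus {
	syncService.statusMu.RLock()
	defer syncService.statusMu.RUnlock()
	return syncService.status
}
```

The background retry loop would take statusMu.Lock() after each attempt to bump RetryAttempts and record LastRetryError.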
12. What Happens to Incoming Headers During Background Retry? 🤔

If the store isn't initialized and background retry is running, what happens when an incoming header write arrives? Looking at lines 148-155, it seems like it would:

This is probably fine, but worth documenting that background retry is a "best effort" and the store can also be initialized via incoming writes.

Security Considerations

No security issues identified. The changes don't:
Performance Considerations

13. Reduced Initial Timeout Might Increase Load 🟡

Reducing the timeout from 10m to 2m means:
Impact: Likely negligible, but worth monitoring P2P network load patterns after deployment.

Summary & Recommendations

Must Fix Before Merge 🔴
Should Fix Before Merge 🟡
Nice to Have 🟢
Questions for Author
Test Plan Suggestion

To properly validate this change, consider testing:
Overall, this is a valuable improvement that prevents the service from blocking indefinitely. The core logic is sound, but the implementation needs better synchronization and test coverage to be production-ready.
Code Review
This pull request introduces a background retry mechanism for P2P initialization in SyncService. When the initial synchronous initialization times out, it now schedules a background task to continue retrying. This is a good improvement for startup performance as it avoids blocking indefinitely.
The changes are well-structured, including the addition of a background context for lifecycle management and the extraction of initialization logic into a reusable tryInit method. However, I've found a critical issue in the new retryInitInBackground function. It appears to be incomplete, missing the exponential backoff logic and the closing brace for the for loop, which would cause a compilation error. Please see my detailed comment for a suggested fix.
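For reference, a complete loop with the backoff handling filled in might look roughly like this; it is a sketch assembled from the snippets quoted earlier in this PR (bgCtx, tryInit, and the 15s/5m values), not the author's actual fix:

```go
func (syncService *SyncService[H]) retryInitInBackground() {
	backoff := 15 * time.Second
	maxBackoff := 5 * time.Minute

	for {
		// Check for shutdown before (and while) waiting out the backoff.
		select {
		case <-syncService.bgCtx.Done():
			return
		case <-time.After(backoff):
		}

		ok, err := syncService.tryInit(syncService.bgCtx)
		if ok {
			syncService.logger.Info().Msg("successfully initialized store from P2P in background")
			return
		}
		syncService.logger.Debug().Err(err).Dur("retry_in", backoff).Msg("background P2P init attempt failed")

		// Exponential backoff, capped so retries keep running indefinitely.
		backoff *= 2
		if backoff > maxBackoff {
			backoff = maxBackoff
		}
	}
}
```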
Codecov Report

❌ Patch coverage is 8.16% with 45 lines in your changes missing coverage.
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main    #3002      +/-   ##
==========================================
- Coverage   59.26%   59.24%   -0.03%
==========================================
  Files         108      108
  Lines       10103    10126      +23
==========================================
+ Hits         5988     5999      +11
- Misses       3484     3496      +12
  Partials      631      631
```
Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
Closes: #2994