Skip ChannelManager persistence after monitor update completion#4424
Skip ChannelManager persistence after monitor update completion#4424joostjager wants to merge 2 commits intolightningdevkit:mainfrom
Conversation
|
👋 I see @TheBlueMatt was un-assigned. |
| if result == NotifyOption::SkipPersistNoEvents { | ||
| result = NotifyOption::SkipPersistHandleEvents; | ||
| } | ||
| self.channel_monitor_updated( |
There was a problem hiding this comment.
We'll need to carefully review everything this calls to make sure they set the store-required flag. Might be easier to just manually set the flag if there's anything to do in handle_post_monitor_update_chan_resume?
There was a problem hiding this comment.
By "anything to do in handle_post_monitor_update_chan_resume", do you mean specifically a new monitor update being created? Or is there more that isn't recoverable on restart?
Added a commit that sets needs_persist_flag in handle_new_monitor_update_locked_actions_handled_by_caller right after chain_monitor.update_channel to cover that case.
Have to say that certainty level instantly drops when working on this state machine 😬
There was a problem hiding this comment.
I have an alternative approach on that is more conservative in stripping out persists: https://github.com/joostjager/rust-lightning/tree/skip-cm-persist-monitor-completed-alt
Rather than skipping persistence for all MonitorEvent::Completed events unconditionally, it inspects the result of monitor_updating_restored and only skips persistence when the effect is just queuing outbound messages (commitment_signed, revoke_and_ack). But also needs to be carefully reviewed to see if we don't miss anything.
There was a problem hiding this comment.
Rather than trying to carefully thread the needs-persistence flags through, maybe we just explicitly set needs_persist_flag in channel_monitor_updated? But yea I think your second approach there makes way more sense.
There was a problem hiding this comment.
Ok, pushed that to this PR after splitting in two commits for reviewability. Now it needs a careful eye to see if the persist condition is not missing anything.
There was a problem hiding this comment.
I'm confused, as of b0ec5eb I don't see any references to needs_persist_flag in the diff?
There was a problem hiding this comment.
With 'pushed that', I meant the second, more conservative, approach for determining whether persistence is needed.
I thought threading through at least keeps the overview of where persistence is requested from exactly, and it also connects with process_pending_monitor_events needing to return a value for the persistence guard. It is a bit confusing that two mechanisms are both used.
I can return SkipPersistHandleEvents from process_pending_monitor_events for Completed events, but I think the needs_persist flag cannot be set in channel_monitor_updated. It needs to be determined two levels deeper in try_resume_channel_post_monitor_update. Is your suggestion to thread back the bool from try_resume_channel_post_monitor_update to channel_monitor_updated and set the flag there? Or set the flag in try_resume_channel_post_monitor_update and consequently also in several other branches of the call tree?
There was a problem hiding this comment.
Yea, my thinking was that we'd just self.needs_persist_flag.store() in try_resume_channel_post_monitor_update/wherever we decide we need to persist (after debug_assert'ing that we're holding the global_persistency_lock). Glancing at the other callsites this looks correct/redundant.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #4424 +/- ##
==========================================
+ Coverage 85.94% 85.97% +0.03%
==========================================
Files 159 159
Lines 104644 104659 +15
Branches 104644 104659 +15
==========================================
+ Hits 89934 89979 +45
+ Misses 12204 12176 -28
+ Partials 2506 2504 -2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
7b1c046 to
48dce43
Compare
Refactor process_pending_monitor_events to return a NotifyOption instead of a bool, allowing callers to distinguish between DoPersist, SkipPersistHandleEvents, and SkipPersistNoEvents. Both call sites in process_events_body and get_and_clear_pending_msg_events are updated accordingly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
48dce43 to
b0ec5eb
Compare
|
🔔 1st Reminder Hey @TheBlueMatt! This PR has been waiting for your review. |
When `process_pending_monitor_events` processes only `Completed` events and the resulting state changes are limited to queuing outbound messages (e.g. `commitment_signed`, `revoke_and_ack`), persistence can be skipped. These messages are re-derived on restart from `in_flight_monitor_updates`. Rather than threading a `needs_persist` return value back through the call chain, set `needs_persist_flag` directly in `try_resume_channel_post_monitor_update` and `channel_monitor_updated` when persistence is actually required. AI tools were used in preparing this commit.
b0ec5eb to
ebac69e
Compare
| let unbroadcasted_batch_funding_txid = | ||
| chan.context.unbroadcasted_batch_funding_txid(&chan.funding); | ||
|
|
||
| // Queuing outbound messages (commitment_update, raa) alone does | ||
| // not require ChannelManager persistence. | ||
| let needs_persist = has_state_changes | ||
| || !updates.finalized_claimed_htlcs.is_empty() | ||
| || !updates.failed_htlcs.is_empty() | ||
| || !update_actions.is_empty() | ||
| || unbroadcasted_batch_funding_txid.is_some() | ||
| || !htlc_forwards.is_empty() | ||
| || decode_update_add_htlcs.is_some(); | ||
|
|
||
| if needs_persist { | ||
| self.needs_persist_flag.store(true, Ordering::Release); | ||
| } |
There was a problem hiding this comment.
The needs_persist check omits committed_outbound_htlc_sources (pruning optimization, replayable) and tx_signatures (resendable message). This is likely intentional, but consider adding a comment explaining their exclusion so future maintainers of MonitorRestoreUpdates know the invariant.
More importantly: handle_channel_resumption (called above) invokes emit_channel_pending_event! and emit_initial_channel_ready_event!, which can push events to self.pending_events and mutate channel flags — neither of which is captured in this needs_persist check. This appears safe today because:
emit_channel_pending_event!requiresis_funding_broadcastable() && !channel_pending_event_emitted— in practice this aligns withfunding_broadcastable.is_some()emit_initial_channel_ready_event!requiresis_usable() && !initial_channel_ready_event_emitted— in practice this aligns withchannel_ready.is_some()
Both conditions imply has_state_changes = true. However, this coupling is implicit. If those conditions ever diverge, events could land in pending_events without triggering persistence. In process_events_body the !pending_events.is_empty() check provides a safety net, but in get_and_clear_pending_msg_events there is no such fallback.
Consider adding a debug_assert or explicit comment documenting this invariant.
| let _ = self.handle_error(err, counterparty_node_id); | ||
| } | ||
|
|
||
| has_pending_monitor_events | ||
| if needs_persist { | ||
| NotifyOption::DoPersist | ||
| } else { | ||
| NotifyOption::SkipPersistHandleEvents |
There was a problem hiding this comment.
Minor: the function now uses two distinct mechanisms for signaling persistence need:
- Return value (
NotifyOption::DoPersist): forHTLCEvent,HolderForceClosed*,CommitmentTxConfirmed - Direct flag (
needs_persist_flag.store): for state changes discovered insidechannel_monitor_updated→try_resume_channel_post_monitor_update
The SkipPersistHandleEvents return here relies on the caller calling event_persist_notifier.notify() so the background processor wakes up and sees the directly-set flag. This works correctly today but is subtle — consider adding a comment at the return site explaining that persistence may have already been requested via needs_persist_flag and that the caller must still call notify() for SkipPersistHandleEvents.
Review SummaryPR #4424: Skip ChannelManager persistence after monitor update completion Two inline comments posted:
Cross-cutting concerns:
|
Since we are pulling ChannelManager persistence into the critical path, we should avoid persisting unnecessarily. When process_pending_monitor_events processes only Completed events, resulting state changes may be recoverable on restart and do not require persistence.