Skip ChannelManager persistence after monitor update completion#4424

Open
joostjager wants to merge 2 commits into lightningdevkit:main from joostjager:skip-cm-persist-monitor-completed

Conversation

@joostjager
Contributor

@joostjager joostjager commented Feb 18, 2026

Since we are pulling ChannelManager persistence into the critical path, we should avoid persisting unnecessarily. When process_pending_monitor_events processes only Completed events, resulting state changes may be recoverable on restart and do not require persistence.
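A hedged sketch of the idea, with simplified illustrative types standing in for LDK's actual `NotifyOption` and monitor-event definitions (all names here are assumptions for the sketch, not LDK's real API):

```rust
// Simplified, illustrative mirror of LDK's NotifyOption and monitor
// events; these are stand-ins for the sketch, not LDK's real types.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum NotifyOption {
    DoPersist,
    SkipPersistHandleEvents,
    SkipPersistNoEvents,
}

#[derive(Clone, Copy)]
pub enum MonitorEvent {
    Completed,
    HtlcEvent,
    HolderForceClosed,
}

// If every pending monitor event is Completed, the resulting state is
// recoverable on restart, so ChannelManager persistence can be skipped
// while still letting the caller handle any queued events.
pub fn persistence_for(events: &[MonitorEvent]) -> NotifyOption {
    if events.is_empty() {
        NotifyOption::SkipPersistNoEvents
    } else if events.iter().all(|e| matches!(e, MonitorEvent::Completed)) {
        NotifyOption::SkipPersistHandleEvents
    } else {
        NotifyOption::DoPersist
    }
}
```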

@ldk-reviews-bot

ldk-reviews-bot commented Feb 18, 2026

👋 I see @TheBlueMatt was un-assigned.
If you'd like another reviewer assignment, please click here.

if result == NotifyOption::SkipPersistNoEvents {
	result = NotifyOption::SkipPersistHandleEvents;
}
self.channel_monitor_updated(
Collaborator

We'll need to carefully review everything this calls to make sure they set the store-required flag. Might be easier to just manually set the flag if there's anything to do in handle_post_monitor_update_chan_resume?

Contributor Author

By "anything to do in handle_post_monitor_update_chan_resume", do you mean specifically a new monitor update being created? Or is there more that isn't recoverable on restart?

Added a commit that sets needs_persist_flag in handle_new_monitor_update_locked_actions_handled_by_caller right after chain_monitor.update_channel to cover that case.

Have to say that certainty level instantly drops when working on this state machine 😬

Contributor Author

I have an alternative approach that is more conservative in stripping out persists: https://github.com/joostjager/rust-lightning/tree/skip-cm-persist-monitor-completed-alt

Rather than skipping persistence for all MonitorEvent::Completed events unconditionally, it inspects the result of monitor_updating_restored and only skips persistence when the effect is just queuing outbound messages (commitment_signed, revoke_and_ack). But it also needs to be carefully reviewed to make sure we don't miss anything.

Collaborator

Rather than trying to carefully thread the needs-persistence flags through, maybe we just explicitly set needs_persist_flag in channel_monitor_updated? But yea I think your second approach there makes way more sense.

Contributor Author

Ok, pushed that to this PR after splitting it into two commits for reviewability. Now it needs a careful eye to verify the persist condition isn't missing anything.

Collaborator

I'm confused, as of b0ec5eb I don't see any references to needs_persist_flag in the diff?

Contributor Author

With 'pushed that', I meant the second, more conservative, approach for determining whether persistence is needed.

I thought threading it through at least keeps an overview of exactly where persistence is requested, and it also connects with process_pending_monitor_events needing to return a value for the persistence guard. It is a bit confusing that two mechanisms are both in use.

I can return SkipPersistHandleEvents from process_pending_monitor_events for Completed events, but I think the needs_persist flag cannot be set in channel_monitor_updated. It needs to be determined two levels deeper in try_resume_channel_post_monitor_update. Is your suggestion to thread back the bool from try_resume_channel_post_monitor_update to channel_monitor_updated and set the flag there? Or set the flag in try_resume_channel_post_monitor_update and consequently also in several other branches of the call tree?

Collaborator

Yea, my thinking was that we'd just self.needs_persist_flag.store() in try_resume_channel_post_monitor_update/wherever we decide we need to persist (after debug_assert'ing that we're holding the global_persistency_lock). Glancing at the other callsites this looks correct/redundant.
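A minimal sketch of that suggestion, with a plain `AtomicBool` standing in for the real field and the lock assertion elided (the struct and method names here are assumptions):

```rust
use std::sync::atomic::{AtomicBool, Ordering};

pub struct Manager {
    pub needs_persist_flag: AtomicBool,
}

impl Manager {
    // Set the flag at the exact site where we decide persistence is
    // needed, instead of threading a bool back through the call chain.
    // LDK would additionally debug_assert that the global persistence
    // lock is held; that lock type is out of scope for this sketch.
    pub fn try_resume_channel_post_monitor_update(&self, has_state_changes: bool) {
        if has_state_changes {
            self.needs_persist_flag.store(true, Ordering::Release);
        }
    }
}
```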

Contributor Author

Updated

@codecov

codecov bot commented Feb 18, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 85.97%. Comparing base (14e522f) to head (b0ec5eb).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4424      +/-   ##
==========================================
+ Coverage   85.94%   85.97%   +0.03%     
==========================================
  Files         159      159              
  Lines      104644   104659      +15     
==========================================
+ Hits        89934    89979      +45     
+ Misses      12204    12176      -28     
+ Partials     2506     2504       -2     
Flag Coverage Δ
tests 85.97% <100.00%> (+0.03%) ⬆️


@joostjager joostjager self-assigned this Feb 19, 2026
@joostjager joostjager force-pushed the skip-cm-persist-monitor-completed branch 2 times, most recently from 7b1c046 to 48dce43 Compare February 25, 2026 13:18
Refactor process_pending_monitor_events to return a NotifyOption
instead of a bool, allowing callers to distinguish between
DoPersist, SkipPersistHandleEvents, and SkipPersistNoEvents.

Both call sites in process_events_body and
get_and_clear_pending_msg_events are updated accordingly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@joostjager joostjager force-pushed the skip-cm-persist-monitor-completed branch from 48dce43 to b0ec5eb Compare February 25, 2026 13:30
@joostjager joostjager marked this pull request as ready for review February 26, 2026 09:22
@joostjager joostjager requested a review from TheBlueMatt March 2, 2026 10:34
@TheBlueMatt TheBlueMatt removed their request for review March 3, 2026 22:18
@TheBlueMatt TheBlueMatt self-requested a review March 31, 2026 00:32
@ldk-reviews-bot

🔔 1st Reminder

Hey @TheBlueMatt! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@TheBlueMatt TheBlueMatt removed their request for review April 3, 2026 18:04
@TheBlueMatt TheBlueMatt added this to the 0.3 milestone Apr 3, 2026
When `process_pending_monitor_events` processes only `Completed` events
and the resulting state changes are limited to queuing outbound messages
(e.g. `commitment_signed`, `revoke_and_ack`), persistence can be
skipped. These messages are re-derived on restart from
`in_flight_monitor_updates`.

Rather than threading a `needs_persist` return value back through the
call chain, set `needs_persist_flag` directly in
`try_resume_channel_post_monitor_update` and `channel_monitor_updated`
when persistence is actually required.

AI tools were used in preparing this commit.
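A runnable distillation of that condition, with a pared-down `MonitorRestoreUpdates` whose fields are assumptions based on the diff excerpts in this PR, not LDK's exact struct:

```rust
// Pared-down stand-in for LDK's MonitorRestoreUpdates; only fields
// relevant to the persistence decision, with simplified element types.
#[derive(Default)]
pub struct MonitorRestoreUpdates {
    pub has_state_changes: bool,
    pub finalized_claimed_htlcs: Vec<u64>,
    pub failed_htlcs: Vec<u64>,
    pub update_actions: Vec<u64>,
    pub htlc_forwards: Vec<u64>,
}

// Queuing outbound messages alone (commitment_signed, revoke_and_ack) is
// re-derivable on restart and does not require ChannelManager
// persistence; any of the state changes below does.
pub fn needs_persist(u: &MonitorRestoreUpdates) -> bool {
    u.has_state_changes
        || !u.finalized_claimed_htlcs.is_empty()
        || !u.failed_htlcs.is_empty()
        || !u.update_actions.is_empty()
        || !u.htlc_forwards.is_empty()
}
```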
@joostjager joostjager force-pushed the skip-cm-persist-monitor-completed branch from b0ec5eb to ebac69e Compare April 9, 2026 09:51
Comment on lines 10347 to +10362
let unbroadcasted_batch_funding_txid =
	chan.context.unbroadcasted_batch_funding_txid(&chan.funding);

// Queuing outbound messages (commitment_update, raa) alone does
// not require ChannelManager persistence.
let needs_persist = has_state_changes
	|| !updates.finalized_claimed_htlcs.is_empty()
	|| !updates.failed_htlcs.is_empty()
	|| !update_actions.is_empty()
	|| unbroadcasted_batch_funding_txid.is_some()
	|| !htlc_forwards.is_empty()
	|| decode_update_add_htlcs.is_some();

if needs_persist {
	self.needs_persist_flag.store(true, Ordering::Release);
}
Collaborator

The needs_persist check omits committed_outbound_htlc_sources (pruning optimization, replayable) and tx_signatures (resendable message). This is likely intentional, but consider adding a comment explaining their exclusion so future maintainers of MonitorRestoreUpdates know the invariant.

More importantly: handle_channel_resumption (called above) invokes emit_channel_pending_event! and emit_initial_channel_ready_event!, which can push events to self.pending_events and mutate channel flags — neither of which is captured in this needs_persist check. This appears safe today because:

  • emit_channel_pending_event! requires is_funding_broadcastable() && !channel_pending_event_emitted — in practice this aligns with funding_broadcastable.is_some()
  • emit_initial_channel_ready_event! requires is_usable() && !initial_channel_ready_event_emitted — in practice this aligns with channel_ready.is_some()

Both conditions imply has_state_changes = true. However, this coupling is implicit. If those conditions ever diverge, events could land in pending_events without triggering persistence. In process_events_body the !pending_events.is_empty() check provides a safety net, but in get_and_clear_pending_msg_events there is no such fallback.

Consider adding a debug_assert or explicit comment documenting this invariant.
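One way to make that invariant explicit, sketched with plain bools standing in for the real channel-state checks (the function and parameter names are hypothetical):

```rust
// If either channel event would be emitted, state must have changed, so
// the needs_persist computation cannot silently drop a pending event.
// In debug builds this fires if the coupling ever diverges.
pub fn assert_resume_invariant(
    will_emit_channel_pending: bool,
    will_emit_channel_ready: bool,
    has_state_changes: bool,
) {
    if will_emit_channel_pending || will_emit_channel_ready {
        debug_assert!(
            has_state_changes,
            "channel event queued without requesting ChannelManager persistence"
        );
    }
}
```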

Comment on lines 13187 to +13193
	let _ = self.handle_error(err, counterparty_node_id);
}

-	has_pending_monitor_events
+	if needs_persist {
+		NotifyOption::DoPersist
+	} else {
+		NotifyOption::SkipPersistHandleEvents
Collaborator

Minor: the function now uses two distinct mechanisms for signaling persistence need:

  1. Return value (NotifyOption::DoPersist): for HTLCEvent, HolderForceClosed*, CommitmentTxConfirmed
  2. Direct flag (needs_persist_flag.store): for state changes discovered inside channel_monitor_updated/try_resume_channel_post_monitor_update

The SkipPersistHandleEvents return here relies on the caller calling event_persist_notifier.notify() so the background processor wakes up and sees the directly-set flag. This works correctly today but is subtle — consider adding a comment at the return site explaining that persistence may have already been requested via needs_persist_flag and that the caller must still call notify() for SkipPersistHandleEvents.
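A sketch of the caller-side flow being described, with the notifier reduced to a stored bool and all names being assumptions rather than LDK's actual API (returns whether a persist should happen):

```rust
use std::sync::atomic::{AtomicBool, Ordering};

#[derive(PartialEq, Clone, Copy)]
pub enum NotifyOption {
    DoPersist,
    SkipPersistHandleEvents,
    SkipPersistNoEvents,
}

#[derive(Default)]
pub struct Notifier {
    pub notified: AtomicBool,
}

impl Notifier {
    pub fn notify(&self) {
        self.notified.store(true, Ordering::Release);
    }
}

// Even when the return value says "skip persist", the notifier must fire
// so the background processor wakes up and observes a needs_persist_flag
// that may have been set directly deeper in the call tree.
pub fn after_monitor_events(
    result: NotifyOption,
    needs_persist_flag: &AtomicBool,
    notifier: &Notifier,
) -> bool {
    match result {
        NotifyOption::DoPersist => {
            notifier.notify();
            true
        },
        NotifyOption::SkipPersistHandleEvents => {
            notifier.notify();
            needs_persist_flag.load(Ordering::Acquire)
        },
        NotifyOption::SkipPersistNoEvents => needs_persist_flag.load(Ordering::Acquire),
    }
}
```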

@ldk-claude-review-bot
Collaborator

Review Summary

PR #4424: Skip ChannelManager persistence after monitor update completion

Two inline comments posted:

  1. lightning/src/ln/channelmanager.rs:10347-10362 — The needs_persist check in the Unblocked path doesn't document why committed_outbound_htlc_sources and tx_signatures are excluded, and relies on an implicit invariant that emit_channel_pending_event!/emit_initial_channel_ready_event! (called in handle_channel_resumption above) only fire when has_state_changes is already true. This invariant is not enforced and the get_and_clear_pending_msg_events path lacks the !pending_events.is_empty() safety net that process_events_body has.

  2. lightning/src/ln/channelmanager.rs:13187-13193 — The function now uses dual persistence mechanisms (return value for non-Completed events, direct needs_persist_flag for Completed-triggered state changes). The SkipPersistHandleEvents return relies on the caller calling notify() so the background processor wakes up and sees the directly-set flag. This is correct but subtle and should be documented.

Cross-cutting concerns:

  • No tests added. This is a non-trivial change to persistence behavior that could affect data durability. Tests verifying that (a) persistence is skipped for message-only completions and (b) persistence IS triggered for completions with state changes (htlc_forwards, finalized_claims, etc.) would significantly increase confidence.
  • The overall logic appears correct after thorough analysis. The key safety argument is that RAA/commitment_update/tx_signatures are regenerated by monitor_updating_restored on restart from the persisted ChannelMonitor state, so skipping CM persistence for message-only completions is safe. The needs_persist_flag is set directly for all other cases that produce non-recoverable state changes.

