Conversation
There was a problem hiding this comment.
Pull request overview
Adds time-based snapshotting to CCF, controlled by snapshots.time_interval and snapshots.min_tx_count, to trigger snapshots after a time threshold while avoiding repeated snapshots when the service is idle.
Changes:
- Extends snapshot scheduling logic to include time-based eligibility, and emits signatures when needed to complete snapshot chunking.
- Introduces a replicated
snapshot_statusinternal table to record snapshot scheduling metadata. - Updates startup/config parsing, host config schema, docs, and test infra CLI/config plumbing for the new snapshot settings.
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/infra/network.py | Forwards new snapshot CLI args into node configuration. |
| tests/infra/e2e_args.py | Adds CLI switches for --snapshot-min-tx-interval and --snapshot-time-interval. |
| tests/e2e_operations.py | Adds an e2e test function intended to exercise time-based snapshotting. |
| tests/config.jinja | Renders new snapshot fields into node startup JSON. |
| src/service/tables/snapshot_status.h | Adds new internal snapshot_status ServiceValue type/table name. |
| src/node/snapshotter.h | Implements time-based snapshot scheduling state + writes snapshot_status. |
| src/node/node_state.h | Installs a map hook to ingest replicated snapshot_status on backups; wires new snapshotter ctor args. |
| src/node/history.h | Emits signatures when a snapshot should be scheduled even without other committable txs. |
| src/kv/store.h | Adds should_schedule_snapshot() and changes chunking decision logic around snapshots. |
| src/kv/kv_types.h | Extends snapshotter interface with should_schedule_snapshot. |
| src/common/configuration.h | Adds JSON optional fields for new snapshot settings. |
| include/ccf/node/startup_config.h | Adds snapshots.min_tx_count and snapshots.time_interval to config struct. |
| doc/operations/ledger_snapshot.rst | Documents time-based snapshotting and warns about low min_tx_count. |
| doc/host_config_schema/cchost_config.json | Adds schema entries for min_tx_count and time_interval. |
| doc/audit/builtin_maps.rst | Documents the new snapshot_status builtin map. |
Comments suppressed due to low confidence (1)
src/kv/store.h:1082
should_create_ledger_chunk_unsafe()no longer considersStoreFlag::SNAPSHOT_AT_NEXT_SIGNATURE. This changes behaviour when a snapshot is requested via tx flag/store flag but no snapshotter is configured, and it breaks the existing expectation thatshould_create_ledger_chunk()returns true immediately after setting this flag (seesrc/kv/test/kv_test.cpp"Chunk when the snapshotter requires one"). Consider restoringr = flag_enabled_unsafe(StoreFlag::SNAPSHOT_AT_NEXT_SIGNATURE)(or equivalent) so chunking is still forced when the snapshot flag is set, independent of snapshotter presence.
bool r = false;
if (chunker)
{
r |= chunker->is_chunk_end_requested(version);
}
if (snapshotter)
{
r |= snapshotter->record_committable(version);
}
| ccf::kv::AbstractStore::StoreFlag::SNAPSHOT_AT_NEXT_SIGNATURE); | ||
|
|
||
| REQUIRE_FALSE(record_signature(history, snapshotter, snapshot_idx)); | ||
| REQUIRE(record_signature(history, snapshotter, snapshot_idx)); |
There was a problem hiding this comment.
I'm not quite sure why this was REQUIRE_FALSE originally.
The logic around this is that record_committable for a signature previously only returned true if the snapshot was due and not forced.
However now it returns true iff a snapshot was requested.
The argument is that there is no backlinks from the should_schedule_snapshot to the tx or store locks, so it only goes signature_lock -> snapshotter_lock
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
| "min_tx_count": { | ||
| "type": "integer", | ||
| "default": 2, | ||
| "description": "Minimum number of transactions that must have elapsed since the last snapshot before a time-based snapshot can be triggered. Has no effect if time_interval is not set. If lower than 2 while time_interval is enabled, snapshots will be triggered indefinitely." |
There was a problem hiding this comment.
I don't think we should allow values that result in indefinite triggering behaviour.
There was a problem hiding this comment.
This is up for debate.
It is somewhat reasonable that you might want it to generate a new snapshot every day, no matter what happens on the ledger itself: for example if you have don't have good replication of the snapshots throughout the cluster (such that there is always one which is about a day old)
Adds time-based snapshotting.
This is configured via
snapshot.min_tx_countandsnapshot.time_interval.The idea is that if
time_intervalandmin_tx_countare exceeded we should emit a snapshot.time_intervalis reasonably self-evident, butmin_tx_countallows us to avoid emitting new snapshots when no user transactions occur.In an idle service, a snapshot requires two transactions.
One for snapshot evidence, and one for the signature for that evidence.
Hence whenever there are more than two transactions since the last signature, then there needs to be a new snapshot.
UPDATE
This seems to be largely happy and correct, but the TSAN run fails due to #7739 which is pre-existing.