Skip to content

Harden publishing#4555

Merged
marta-lokhova merged 3 commits intostellar:masterfrom
marta-lokhova:PublishFsync
Nov 23, 2024
Merged

Harden publishing#4555
marta-lokhova merged 3 commits intostellar:masterfrom
marta-lokhova:PublishFsync

Conversation

@marta-lokhova
Copy link
Copy Markdown
Contributor

@marta-lokhova marta-lokhova commented Nov 22, 2024

This PR contains two changes:

  • Properly fsync HAS files that act as a "publish queue". Note that the checkpoints themselves are using XDR streams with fsync turned on already, so this change impacts HAS files only.
  • Gracefully handle situations where nodes enable publish mid-checkpoint. Core will wait until the next checkpoint to begin publishing. Without this change, nodes would need to rebuild state before enabling publishing (although it's unclear how often partial archives occur in practice).

Additionally, I added a few more thing to aid debugging, such as printing checkpoint files with dump-xdr and an offline command to dump all checkpoints scheduled to publish.

graydon
graydon previously approved these changes Nov 22, 2024
graydon
graydon previously approved these changes Nov 22, 2024
graydon
graydon previously approved these changes Nov 22, 2024
SirTyson
SirTyson previously approved these changes Nov 22, 2024
Copy link
Copy Markdown
Contributor

@SirTyson SirTyson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM after the MODE_DISABLES_FSYNC change.

@marta-lokhova marta-lokhova added this pull request to the merge queue Nov 22, 2024
Merged via the queue into stellar:master with commit 0bda87b Nov 23, 2024
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens the publishing mechanism by ensuring proper fsync of checkpoint files and gracefully handling situations where publishing is enabled mid-checkpoint. The key changes include:

  • Switching HAS checkpoint files from JSON to binary format with proper fsync support
  • Adding logic to wait for the next checkpoint boundary when publishing is enabled mid-checkpoint
  • Adding debugging tools to inspect checkpoint files and the publish queue

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/util/XDRStream.h Added documentation about durable write requirements and updated comments for clarity
src/util/BufferedAsioCerealOutputArchive.h Added <cereal/types/string.hpp> include to support string serialization
src/main/dumpxdr.cpp Updated regex to support .dirty suffix for checkpoint files
src/main/CommandLine.cpp Added print-publish-queue command and necessary cereal vector include
src/history/HistoryManagerImpl.cpp Changed checkpoint file format from JSON to binary with fsync, updated file extensions from .json to .checkpoint, and added loadCheckpointHAS helper
src/history/CheckpointBuilder.h Added mPublishWasDisabled flag and changed ensureOpen return type to bool
src/history/CheckpointBuilder.cpp Implemented logic to detect and handle mid-checkpoint publishing enablement, with proper checkpoint boundary detection

IMPORTANT: some areas of core require durable writes that
are resistant to application and system crashes. If you need durable writes:
1. Use a stream implementation that supports fsync, e.g. OutputFileStream
2. Write to a temp file first. If you don't intent to persist temp files across
Copy link

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in comment: "intent" should be "intend".

Suggested change
2. Write to a temp file first. If you don't intent to persist temp files across
2. Write to a temp file first. If you don't intend to persist temp files across

Copilot uses AI. Check for mistakes.
Comment on lines 121 to 122
// Immediately produce a final checkpoint JSON (suitable for confirmed
// ledgers)
Copy link

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment still references "checkpoint JSON" but the file format has been changed to binary. The comment should be updated to reflect this change.

Copilot uses AI. Check for mistakes.
Copilot AI added a commit that referenced this pull request Jan 7, 2026
Co-authored-by: anupsdf <127880479+anupsdf@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants