Skip to content

Retries for fetching snapshots#7317

Merged
achamayou merged 11 commits intomicrosoft:mainfrom
cjen1-msft:fetch-retries
Sep 29, 2025
Merged

Retries for fetching snapshots#7317
achamayou merged 11 commits intomicrosoft:mainfrom
cjen1-msft:fetch-retries

Conversation

@cjen1-msft
Copy link
Contributor

@cjen1-msft cjen1-msft commented Sep 29, 2025

Add retry path when fetching snapshots controlled with command.join.fetch_snapshot_max_retries and command.join.fetch_snapshot_retry_interval in the config.

(The current fetch prints an error message and returns a std::nullopt when it fails, so this PR wraps that response in a retry loop)

@cjen1-msft cjen1-msft requested a review from a team as a code owner September 29, 2025 12:37
@cjen1-msft cjen1-msft marked this pull request as draft September 29, 2025 12:40
@cjen1-msft cjen1-msft marked this pull request as ready for review September 29, 2025 14:31
@cjen1-msft cjen1-msft added 6.x-todo PRs which should be backported to 6.x auto-backport Automatically backport this PR to LTS branch labels Sep 29, 2025
@achamayou achamayou enabled auto-merge September 29, 2025 17:06
@achamayou achamayou added this pull request to the merge queue Sep 29, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Sep 29, 2025
@cjen1-msft cjen1-msft added this pull request to the merge queue Sep 29, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Sep 29, 2025
@achamayou achamayou added this pull request to the merge queue Sep 29, 2025
Merged via the queue into microsoft:main with commit 9802ee3 Sep 29, 2025
21 checks passed
@achamayou achamayou deleted the fetch-retries branch September 29, 2025 21:13
cjen1-msft added a commit to cjen1-msft/CCF that referenced this pull request Sep 30, 2025
Co-authored-by: Amaury Chamayou <amaury@xargs.fr>
cjen1-msft added a commit that referenced this pull request Sep 30, 2025
)

Co-authored-by: Amaury Chamayou <amaury@xargs.fr>
@cjen1-msft cjen1-msft added the backported This PR was successfully backported to LTS branch label Sep 30, 2025
Copilot AI added a commit that referenced this pull request Mar 11, 2026
The test was added in #7317 but never added to a runner. The original
logic was also broken: it stopped the primary before joining, so no
StartupSeqnoIsOld response was ever produced and FetchSnapshot was never
triggered - the expected retry/giving-up log messages could never appear.

Redesign: use BackupSnapshotFetch with
backup_snapshot_fetch_target_rpc_interface="primary_rpc_interface".
The primary_rpc_interface lacks the SnapshotRead operator feature, so
every fetch request returns HTTP 404, reliably driving the
fetch_from_peer retry loop to log all three attempt messages and the
"Exceeded maximum snapshot fetch retries … giving up" message.

Also:
- Add args.label suffix to avoid workspace collision
- Inline txs construction (matches run_backup_snapshot_download style)
- Use network.txs.issue instead of a separate txs variable
- Collect matched patterns before removing to avoid list mutation during
  iteration
- Register the test in schema.py alongside run_backup_snapshot_download

Co-authored-by: eddyashton <6000239+eddyashton@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

6.x-todo PRs which should be backported to 6.x auto-backport Automatically backport this PR to LTS branch backported This PR was successfully backported to LTS branch

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants