Fix and register `error_message_on_failure_to_fetch_snapshot` test by Copilot · Pull Request #7733 · microsoft/CCF

Copilot · 2026-03-11T14:49:08Z

Rename run_error_message_on_failure_to_fetch_snapshot → test_error_message_on_failure_to_fetch_snapshot(network, args) and add to run_backup_snapshot_download (runtime BackupSnapshotFetch failure)
Remove standalone schema.py entry
Add join_target_interface_name param to _setup_node in network.py (defaults to FILE_SERVING_RPC_INTERFACE, no behaviour change for existing callers)
Add test_join_time_snapshot_fetch_failure(network, args) — join-time FetchSnapshot failure
Add call to test_join_time_snapshot_fetch_failure in run_backup_snapshot_download
Factor shared log-assertion logic into _assert_snapshot_fetch_failure_messages(node, timeout_s=0) helper, used by both tests

Original prompt

This section details on the original issue you should resolve

<issue_title>error_message_on_failure_to_fetch_snapshot test is not run, and currently fails</issue_title>
<issue_description>Test was added in #7317, but not actually called. I renamed it (with other renames) in #7709, and noticed it wasn't being called.</issue_description>

<agent_instructions>Add this test to a sensible runner, near similar tests. Run it to examine the failure, work out what's broken, and fix it. May just be some drift in the error message that's printed? May be something more sinister. Good luck!</agent_instructions>

Comments on the Issue (you are @copilot in this section)

Fixes error_message_on_failure_to_fetch_snapshot test is not run, and currently fails #7715

💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

The test was added in #7317 but never added to a runner. The original logic was also broken: it stopped the primary before joining, so no StartupSeqnoIsOld response was ever produced and FetchSnapshot was never triggered - the expected retry/giving-up log messages could never appear. Redesign: use BackupSnapshotFetch with backup_snapshot_fetch_target_rpc_interface="primary_rpc_interface". The primary_rpc_interface lacks the SnapshotRead operator feature, so every fetch request returns HTTP 404, reliably driving the fetch_from_peer retry loop to log all three attempt messages and the "Exceeded maximum snapshot fetch retries … giving up" message. Also: - Add args.label suffix to avoid workspace collision - Inline txs construction (matches run_backup_snapshot_download style) - Use network.txs.issue instead of a separate txs variable - Collect matched patterns before removing to avoid list mutation during iteration - Register the test in schema.py alongside run_backup_snapshot_download Co-authored-by: eddyashton <6000239+eddyashton@users.noreply.github.com>

eddyashton · 2026-03-11T15:56:58Z

@copilot Rather than adding a separate runner and network just for this single test, can we rename it to a test_ function, put it in an existing run, and simply create a new joiner pointed at the wrong interface to produce the desired errors?

Per reviewer feedback: rename run_error_message_on_failure_to_fetch_snapshot to test_error_message_on_failure_to_fetch_snapshot(network, args) and add it to the existing run_backup_snapshot_download run alongside test_backup_snapshot_fetch and test_backup_snapshot_fetch_max_size, rather than having a separate runner and network. Also remove the now-unnecessary standalone schema.py cr.add entry. Co-authored-by: eddyashton <6000239+eddyashton@users.noreply.github.com>

Copilot · 2026-03-11T16:01:38Z

@copilot Rather than adding a separate runner and network just for this single test, can we rename it to a test_ function, put it in an existing run, and simply create a new joiner...

Done in 5004695. Renamed to test_error_message_on_failure_to_fetch_snapshot(network, args), added it to run_backup_snapshot_download alongside the other two snapshot-fetch tests, and dropped the separate runner and schema.py entry.

eddyashton · 2026-03-11T16:53:17Z

@copilot There's 2 kinds of "backup nodes fetching snapshots from primaries" - during join, and gossiped during runtime. Your changes have altered this test to find this error during the second phase (BackupSnapshotFetch). That's a useful test worth adding to, but we still do not have coverage confirming these errors can be shown when the initial join fetch fails.

… coverage Adds coverage for the initial join-time FetchSnapshot failure path, which the reviewer identified as missing. - Add join_target_interface_name parameter to _setup_node in network.py (defaults to FILE_SERVING_RPC_INTERFACE, preserving existing behaviour) so tests can direct the join target to a specific interface. - Add test_join_time_snapshot_fetch_failure(network, args): 1. Adds an intermediate node joined from snapshot (startup_seqno > 0). 2. Joins a fresh node (no snapshot) targeting that intermediate node via primary_rpc_interface (lacks SnapshotRead). The intermediate node returns StartupSeqnoIsOld (its startup_seqno > 0), triggering the FetchSnapshot task. 3. FetchSnapshot GETs /node/snapshot on primary_rpc_interface → HTTP 404 on every attempt, exhausting the 3-attempt retry loop and logging "Exceeded maximum snapshot fetch retries (3), giving up". 4. Asserts all expected retry and give-up log messages appear. - Add call in run_backup_snapshot_download before the existing test_error_message_on_failure_to_fetch_snapshot (BackupSnapshotFetch). Co-authored-by: eddyashton <6000239+eddyashton@users.noreply.github.com>

Copilot · 2026-03-11T17:12:12Z

@copilot There's 2 kinds of "backup nodes fetching snapshots from primaries" - during join, and gossiped during runtime. Your changes have altered this test to find this error during...

Added in 21b84d8. test_join_time_snapshot_fetch_failure adds an intermediate node from snapshot (startup_seqno > 0), then joins a fresh node targeting that intermediate via primary_rpc_interface. The intermediate returns StartupSeqnoIsOld, triggering the join-time FetchSnapshot task, which exhausts its 3 attempts against the 404 endpoint (no SnapshotRead) and logs "Exceeded maximum snapshot fetch retries". _setup_node in network.py gets a new join_target_interface_name parameter (defaulting to FILE_SERVING_RPC_INTERFACE) to enable this.

eddyashton · 2026-03-12T09:26:45Z

@copilot The setup and shutdown code around these tests might be different, but I feel like the core assertion logic (scanning the logs for the expected messages) should be identical. Can you factor that out into a separate function, called by both tests?

…_messages Both test_join_time_snapshot_fetch_failure and test_error_message_on_failure_to_fetch_snapshot used identical expected_patterns lists and near-identical log-scanning loops. Factored into _assert_snapshot_fetch_failure_messages(node, timeout_s=0): - timeout_s=0 (default): single pass, for already-stopped nodes - positive timeout_s: polls until messages appear, for still-running nodes Co-authored-by: eddyashton <6000239+eddyashton@users.noreply.github.com>

Copilot · 2026-03-12T09:30:17Z

@copilot The setup and shutdown code around these tests might be different, but I feel like the core assertion logic (scanning the logs for the expected messages) should be identical...

Done in f57db9c. Factored into _assert_snapshot_fetch_failure_messages(node, timeout_s=0) — single pass (default timeout_s=0) for the already-stopped join-time node, and timeout_s=30 for the still-running runtime node.

Initial plan

44d1adb

Copilot AI assigned Copilot and eddyashton Mar 11, 2026

Copilot started work on behalf of eddyashton March 11, 2026 14:49 View session

Copilot AI changed the title ~~[WIP] Add error message test for snapshot fetching~~ Fix and register error_message_on_failure_to_fetch_snapshot test Mar 11, 2026

Copilot finished work on behalf of eddyashton March 11, 2026 15:38

Copilot started work on behalf of eddyashton March 11, 2026 15:57 View session

Copilot finished work on behalf of eddyashton March 11, 2026 16:02

Copilot started work on behalf of eddyashton March 11, 2026 16:53 View session

Copilot finished work on behalf of eddyashton March 11, 2026 17:13

Merge branch 'main' into copilot/add-error-message-test-run

ffc4086

Copilot started work on behalf of eddyashton March 12, 2026 09:26 View session

Copilot finished work on behalf of eddyashton March 12, 2026 09:31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix and register `error_message_on_failure_to_fetch_snapshot` test#7733

Fix and register `error_message_on_failure_to_fetch_snapshot` test#7733
Copilot wants to merge 6 commits intomainfrom
copilot/add-error-message-test-run

Copilot AI commented Mar 11, 2026 •

edited

Loading

Uh oh!

eddyashton commented Mar 11, 2026

Uh oh!

Copilot AI commented Mar 11, 2026

Uh oh!

eddyashton commented Mar 11, 2026

Uh oh!

Copilot AI commented Mar 11, 2026

Uh oh!

eddyashton commented Mar 12, 2026

Uh oh!

Copilot AI commented Mar 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Copilot AI commented Mar 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Comments on the Issue (you are @copilot in this section)

Uh oh!

eddyashton commented Mar 11, 2026

Uh oh!

Copilot AI commented Mar 11, 2026

Uh oh!

eddyashton commented Mar 11, 2026

Uh oh!

Copilot AI commented Mar 11, 2026

Uh oh!

eddyashton commented Mar 12, 2026

Uh oh!

Copilot AI commented Mar 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Copilot AI commented Mar 11, 2026 •

edited

Loading