
eclient,log_test,userdata: ssh robustness improvements#1143

Merged
eriknordmark merged 2 commits into lf-edge:master from eriknordmark:erik-networks
Apr 20, 2026


@eriknordmark
Contributor

eriknordmark commented Apr 17, 2026

metadata test: SSH retry robustness

Replace the iteration-bounded seq/sleep loop in ssh.sh with a
time-bounded while loop (18 min, explicit exit 1 on expiry) to reliably
fill the exec -t 20m budget.
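A minimal sketch of a time-bounded retry loop of the kind described above, assuming POSIX `sh` and a `date` that supports `+%s` (GNU coreutils and BusyBox both do). The function name `retry_until_deadline` and the `RETRY_SLEEP_S` default are hypothetical; the PR does not show the script body, so this only illustrates the pattern of bounding retries by elapsed time and failing explicitly on expiry.

```shell
#!/bin/sh
# Hypothetical sketch of a time-bounded retry loop in the spirit of ssh.sh;
# not the script's actual contents. Requires a `date` that supports +%s.

# Seconds to sleep between attempts (assumed value; override as needed).
RETRY_SLEEP_S=${RETRY_SLEEP_S:-10}

# retry_until_deadline BUDGET_S CMD [ARGS...]
# Re-runs CMD until it succeeds or BUDGET_S wall-clock seconds have passed.
# Returns 0 on the first success and 1 (with a message) on expiry, so the
# caller can `exit 1` instead of silently falling through.
retry_until_deadline() {
    budget=$1
    shift
    end=$(( $(date +%s) + budget ))
    while [ "$(date +%s)" -lt "$end" ]; do
        if "$@"; then
            return 0
        fi
        sleep "$RETRY_SLEEP_S"
    done
    echo "gave up after ${budget}s: $*" >&2
    return 1
}
```

For example, an 18-minute budget inside an `exec -t 20m` step might be written as `retry_until_deadline 1080 ssh -o ConnectTimeout=10 "$TARGET" true || exit 1`, where `$TARGET` is a placeholder. The deadline is only checked between attempts, so an attempt in flight when time runs out can overrun it by up to one connect timeout plus one sleep; keeping the budget below the exec limit leaves room for that.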


eclient,log_test,userdata: ssh robustness improvements

Increase SSH timeouts, add failure indication and sshd readiness check

Replace iteration-bounded seq/sleep SSH retry loops in eclient.txt and
userdata.txt with time-bounded while loops that exit 1 if no connection
is ever made.

In log_test.txt, add a foreground wait_ssh.sh step (5m timeout) that
polls `eden eve ssh` until EVE's sshd accepts a connection before the
background ssh.sh starts generating log entries.
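The foreground readiness gate can be sketched in the same style. This is an illustration under assumptions, not the contents of wait_ssh.sh: `wait_ready`, `start_after_ready`, and the 5-second poll interval are hypothetical, and the probe is left as a parameter because the PR does not show the exact `eden eve ssh` invocation. What it models is the ordering: the foreground step blocks until the probe succeeds, and only then is the background log generator started.

```shell
#!/bin/sh
# Hypothetical sketch of a foreground readiness gate like wait_ssh.sh;
# the probe command and poll interval are assumptions, not taken from the PR.

POLL_INTERVAL_S=${POLL_INTERVAL_S:-5}

# wait_ready TIMEOUT_S PROBE [ARGS...]
# Polls PROBE until it exits 0 (service ready) or TIMEOUT_S seconds elapse.
# The probe always runs at least once.
wait_ready() {
    timeout_s=$1
    shift
    end=$(( $(date +%s) + timeout_s ))
    until "$@"; do
        if [ "$(date +%s)" -ge "$end" ]; then
            echo "not ready after ${timeout_s}s" >&2
            return 1
        fi
        sleep "$POLL_INTERVAL_S"
    done
    return 0
}

# start_after_ready TIMEOUT_S PROBE WORKER
# Blocks in the foreground until PROBE succeeds, and only then launches
# WORKER in the background, so the worker never races service start-up.
# The worker's PID is left in WORKER_PID for a later `wait`.
start_after_ready() {
    wait_ready "$1" "$2" || return 1
    "$3" &
    WORKER_PID=$!
}
```

A call shaped like `start_after_ready 300 probe_sshd generate_logs || exit 1`, with `probe_sshd` and `generate_logs` as placeholder functions, mirrors the 5m foreground wait followed by the background ssh.sh.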

@eriknordmark eriknordmark marked this pull request as draft April 17, 2026 12:45
@eriknordmark eriknordmark marked this pull request as ready for review April 17, 2026 16:31
@eriknordmark eriknordmark requested review from europaul and rene April 17, 2026 16:31
eriknordmark and others added 2 commits April 20, 2026 09:51
…order

Replace the iteration-bounded seq/sleep loop in ssh.sh with a
time-bounded while loop (18 min, explicit exit 1 on expiry) to reliably
fill the exec -t 20m budget.

Restore TestInfo as a background process started before ssh.sh. Eden's
InfoChecker uses InfoNew mode (new messages only), so TestInfo must be
subscribed before the curl POST fires to capture the immediate ZInfoMsg
EVE sends on AppInstMetadata receipt. Running TestInfo after ssh.sh meant
the subscription was registered after that message had already arrived at
Adam; the next info comes from the periodic timer (10m ± 20% jitter),
well outside any reasonable timewait. Raise the TestInfo timewait to 20m
to cover the full 18m ssh.sh window plus the periodic timer as a fallback.
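The ordering problem can be reproduced with a toy model of new-messages-only subscription. Here an append-only temporary file stands in for the info stream at Adam, and `subscribe`/`seen_since` are hypothetical helpers that remember an offset and only look at lines written after it; none of this uses Eden's actual InfoChecker.

```shell
#!/bin/sh
# Toy model of "new messages only" subscription semantics (like InfoNew):
# an append-only file stands in for Adam's info stream. Helper names are
# hypothetical; this does not use Eden's InfoChecker.

# subscribe LOG: record how many messages already exist; only later ones count.
subscribe() {
    wc -l < "$1" | tr -d ' '
}

# seen_since LOG OFFSET PATTERN: succeed if PATTERN appears after line OFFSET.
seen_since() {
    tail -n "+$(( $2 + 1 ))" "$1" | grep -q -- "$3"
}

log=$(mktemp)

# Correct order: subscribe first, then trigger the message.
early=$(subscribe "$log")
echo "ZInfoMsg (AppInstMetadata)" >> "$log"   # stand-in for EVE's immediate info

# Wrong order: subscribing after the message has already arrived.
late=$(subscribe "$log")

seen_since "$log" "$early" ZInfoMsg && echo "early subscriber: seen"
seen_since "$log" "$late" ZInfoMsg || echo "late subscriber: missed"

rm -f "$log"
```

Running it prints `early subscriber: seen` followed by `late subscriber: missed`. Starting TestInfo in the background before ssh.sh corresponds to taking the `early` offset; a subscriber registered after the message arrives, like `late`, has to wait for the next periodic info instead.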

Signed-off-by: eriknordmark <erik@zededa.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…sshd readiness check

Replace iteration-bounded seq/sleep SSH retry loops in eclient.txt and
userdata.txt with time-bounded while loops. The old loops ran for at most
seq*sleep seconds and fell through silently; the new loops run until 30s
before the exec -t deadline and exit 1 explicitly when no connection was
made, making failures visible rather than letting the test continue with
an ambiguous result.
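Deriving the loop budget from the step timeout, instead of hard-coding an iteration count, can be sketched as a small helper. `loop_budget` is hypothetical and, as an assumption, accepts only single-unit durations such as `90s`, `20m` or `1h`; compound Go-style values like `1m30s` are rejected rather than guessed at.

```shell
#!/bin/sh
# Hypothetical helper: derive a retry-loop budget from an exec -t style
# timeout so the loop ends a fixed margin before the harness kills it.
# Assumption: only plain "<N>s", "<N>m" and "<N>h" values are supported.

# loop_budget TIMEOUT MARGIN_S -> prints the seconds the loop may run
loop_budget() {
    t=$1
    margin=$2
    num=${t%?}              # everything except the last character
    unit=${t#"$num"}        # the last character (the unit)
    case $num in
        '' | *[!0-9]*) echo "unsupported timeout: $t" >&2; return 1 ;;
    esac
    case $unit in
        s) secs=$num ;;
        m) secs=$(( num * 60 )) ;;
        h) secs=$(( num * 3600 )) ;;
        *) echo "unsupported unit in: $t" >&2; return 1 ;;
    esac
    budget=$(( secs - margin ))
    if [ "$budget" -le 0 ]; then
        echo "margin ${margin}s leaves no time in $t" >&2
        return 1
    fi
    echo "$budget"
}
```

For a step run with `exec -t 20m`, `loop_budget 20m 30` prints `1170`, so the loop would stop roughly 30 seconds before the harness timeout and could then `exit 1` with a clear message.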

In log_test.txt, add a foreground wait_ssh.sh step (5m timeout) that
polls `eden eve ssh` until EVE's sshd accepts a connection before the
background ssh.sh starts generating log entries. log_test runs immediately
after eve_restart in the smoke suite; the restart confirms EVE is up via
Adam registration, but sshd takes additional time to become operational,
causing all SSH attempts to fail with connection timeouts/resets and
TestLog to time out waiting for "Disconnected" entries.

Signed-off-by: eriknordmark <erik@zededa.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@eriknordmark eriknordmark changed the title tests/eclient: make SSH retry loops time-bounded and fix metadata race tests/eclient,lim: fix flaky SSH-based tests Apr 20, 2026
@eriknordmark eriknordmark changed the title tests/eclient,lim: fix flaky SSH-based tests eclient,log_test,userdata: ssh robustness improvements Apr 20, 2026
Contributor

rene left a comment


LGTM

@eriknordmark eriknordmark merged commit d976210 into lf-edge:master Apr 20, 2026
18 checks passed
