
fix: set network plugin to "none" in Azure CNI test cases #7463

Open
johananl wants to merge 1 commit into main from johananl/fix-azure-cni-tests

Conversation

Member

johananl commented Nov 28, 2025

What type of PR is this?

/kind bug

What this PR does / why we need it:

The Azure CNI plugin is now installed by the azure-cns DaemonSet. When the user specifies --network-plugin=azure, RP ensures that azure-cns installs the plugin and passes none to AB, which avoids a duplicate installation process.

However, the e2e test cases aren't up to date and still specify azure as the plugin type in the VMSS config. This causes two installation processes to compete, thus leading to inconsistent test results.
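
As a rough illustration of the change described above, the sketch below models the plugin value a test case would hand to AB. The `KubernetesConfig` type, the `pluginForBootstrap` and `setPluginForTest` helpers, and the pass-through behavior for values other than `azure` are assumptions made for this example; they are not taken from the AB or RP code.

```go
package main

import "fmt"

// KubernetesConfig is a hypothetical stand-in for the bootstrap config a
// test case passes to AB. Only the field relevant to this PR is modeled.
type KubernetesConfig struct {
	NetworkPlugin string
}

const (
	networkPluginAzure = "azure"
	networkPluginNone  = "none"
)

// pluginForBootstrap returns the plugin value that should reach AB for a
// given user-facing plugin choice. "azure" maps to "none" because azure-cns
// installs the CNI plugin. Passing other values through unchanged is an
// assumption of this sketch.
func pluginForBootstrap(userPlugin string) string {
	if userPlugin == networkPluginAzure {
		return networkPluginNone
	}
	return userPlugin
}

// setPluginForTest rewrites a test config in place so that it matches what
// RP is described as sending.
func setPluginForTest(cfg *KubernetesConfig) {
	cfg.NetworkPlugin = pluginForBootstrap(cfg.NetworkPlugin)
}

func main() {
	cfg := KubernetesConfig{NetworkPlugin: networkPluginAzure}
	setPluginForTest(&cfg)
	fmt.Println("plugin passed to AB:", cfg.NetworkPlugin)
}
```

With this shape, a test case that previously specified `azure` ends up with `none`, so only one installation path (azure-cns) remains active.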

Which issue(s) this PR fixes:

Fixes #7460

Requirements:

  • uses conventional commit messages
  • includes documentation
  • adds unit tests
  • tested upgrade from previous version
  • commits are GPG signed and GitHub marks them as verified

Special notes for your reviewer:

Release note:

none

The Azure CNI plugin is installed nowadays by the azure-cns DaemonSet.
When the user specifies --network-plugin=azure, RP ensures azure-cns
installs the plugin and passes "none" to AB to avoid a duplicate
installation process.

However, the e2e test cases aren't up to date and still specify "azure"
as the plugin type in the VMSS config. This causes two installation
processes to compete, thus leading to inconsistent test results.

Signed-off-by: Johanan Liebermann <jliebermann@microsoft.com>
johananl force-pushed the johananl/fix-azure-cni-tests branch from a38ba4d to a569c42 Compare November 28, 2025 13:02
johananl changed the title Set network plugin to "none" in Azure CNI test cases fix: set network plugin to "none" in Azure CNI test cases Nov 28, 2025
Member Author

johananl commented Dec 1, 2025

If this PR is accepted (meaning it's OK to set the plugin to none in all test cases), we should consider whether we need any code for CNI installation in AB. If not, we should probably remove this part of the code.

johananl marked this pull request as ready for review December 2, 2025 11:06
r2k1 added a commit that referenced this pull request May 8, 2026
Revert a1bebdc (feat(e2e): add HTTP_PROXY + private DNS test scenario)
which had issues on the e2e-flakiness-fixes branch.

Analysis of 55 E2E builds on main (3 weeks) showed 84% failure rate.
Root causes identified and fixed:

1. Node readiness race (kube.go): WaitUntilNodeReady() returned success
   on NodeReady=True even when node still had the cloud-provider
   uninitialized taint, preventing test pod scheduling. Now waits for
   taint removal before declaring node ready.

2. IPtables false positives (validation.go): iptables eBPF-host-routing
   validator rejected a normal host DHCP INPUT rule (UDP/68) not in its
   allowlist. Added to allowlist.

3. CSE timing threshold (scenario_cse_perf_test.go): installDeps 90s
   threshold was set with 'no direct prod data' and consistently
   exceeded by the network-heavy apt workflow. Raised to 120s.

4. Duplicate CSE events (cse_timing.go): events appearing in both GA
   events directory and handler subdirectories created spurious
   Task_installDeps#01 subtests. Added deduplication.

5. Broken Ubuntu2004FIPS lane (scenario_test.go): Test added on
   2026-04-22 without VMSS FIPS capability setup, never green. Skipped
   until properly fixed.

Dropped from earlier version: Flatcar AzureCNI networkPlugin removal.
Rubber duck review found removing networkPlugin=azure defaults to
kubenet (not none), which would break tests differently. Proper fix
requires PR #7463 (set to none instead).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
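
The node readiness race in item 1 of the commit message above can be sketched roughly as follows. This is not the code from kube.go: the `Node`, `Condition`, and `Taint` types and the `isSchedulable` helper are hypothetical and self-contained, and the sketch assumes the taint in question is the well-known Kubernetes key `node.cloudprovider.kubernetes.io/uninitialized`.

```go
package main

import "fmt"

// Minimal hypothetical stand-ins for the Kubernetes node types, so the
// sketch runs without any client library.
type Condition struct {
	Type   string
	Status string
}

type Taint struct {
	Key string
}

type Node struct {
	Conditions []Condition
	Taints     []Taint
}

// Assumed to be the taint the commit message refers to.
const uninitializedTaint = "node.cloudprovider.kubernetes.io/uninitialized"

// isSchedulable reports whether a node is Ready and no longer carries the
// cloud-provider uninitialized taint. Checking only the Ready condition is
// the race described above: pods cannot schedule while the taint remains.
func isSchedulable(n Node) bool {
	ready := false
	for _, c := range n.Conditions {
		if c.Type == "Ready" && c.Status == "True" {
			ready = true
			break
		}
	}
	if !ready {
		return false
	}
	for _, t := range n.Taints {
		if t.Key == uninitializedTaint {
			return false
		}
	}
	return true
}

func main() {
	tainted := Node{
		Conditions: []Condition{{Type: "Ready", Status: "True"}},
		Taints:     []Taint{{Key: uninitializedTaint}},
	}
	fmt.Println("ready but tainted, schedulable:", isSchedulable(tainted))
}
```

A wait loop built on a check like this would keep polling until both conditions hold, instead of returning as soon as the Ready condition turns True.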
r2k1 added a commit that referenced this pull request May 10, 2026
Revert a1bebdc (feat(e2e): add HTTP_PROXY + private DNS test scenario)
which had issues on the e2e-flakiness-fixes branch.

Analysis of 55 E2E builds on main (3 weeks) showed 84% failure rate.
Root causes identified and fixed:

1. Node readiness race (kube.go): WaitUntilNodeReady() returned success
   on NodeReady=True even when node still had the cloud-provider
   uninitialized taint, preventing test pod scheduling. Now waits for
   taint removal before declaring node ready.

2. IPtables false positives (validation.go): iptables eBPF-host-routing
   validator rejected a normal host DHCP INPUT rule (UDP/68) not in its
   allowlist. Added to allowlist.

3. CSE timing threshold (scenario_cse_perf_test.go): installDeps 90s
   threshold was set with 'no direct prod data' and consistently
   exceeded by the network-heavy apt workflow. Raised to 120s.

4. Duplicate CSE events (cse_timing.go): events appearing in both GA
   events directory and handler subdirectories created spurious
   Task_installDeps#01 subtests. Added deduplication.

5. Broken Ubuntu2004FIPS lane (scenario_test.go): Test added on
   2026-04-22 without VMSS FIPS capability setup, never green. Skipped
   until properly fixed.

Dropped from earlier version: Flatcar AzureCNI networkPlugin removal.
Rubber duck review found removing networkPlugin=azure defaults to
kubenet (not none), which would break tests differently. Proper fix
requires PR #7463 (set to none instead).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
r2k1 added a commit that referenced this pull request May 10, 2026
Revert a1bebdc (feat(e2e): add HTTP_PROXY + private DNS test scenario)
which had issues on the e2e-flakiness-fixes branch.

Analysis of 55 E2E builds on main (3 weeks) showed 84% failure rate.
Root causes identified and fixed:

1. Node readiness race (kube.go): WaitUntilNodeReady() returned success
   on NodeReady=True even when node still had the cloud-provider
   uninitialized taint, preventing test pod scheduling. Now waits for
   taint removal before declaring node ready.

2. IPtables false positives (validation.go): iptables eBPF-host-routing
   validator rejected a normal host DHCP INPUT rule (UDP/68) not in its
   allowlist. Added to allowlist.

3. CSE timing threshold (scenario_cse_perf_test.go): installDeps 90s
   threshold was set with 'no direct prod data' and consistently
   exceeded by the network-heavy apt workflow. Raised to 120s.

4. Duplicate CSE events (cse_timing.go): events appearing in both GA
   events directory and handler subdirectories created spurious
   Task_installDeps#01 subtests. Added deduplication.

5. Broken Ubuntu2004FIPS lane (scenario_test.go): Test added on
   2026-04-22 without VMSS FIPS capability setup, never green. Skipped
   until properly fixed.

Dropped from earlier version: Flatcar AzureCNI networkPlugin removal.
Rubber duck review found removing networkPlugin=azure defaults to
kubenet (not none), which would break tests differently. Proper fix
requires PR #7463 (set to none instead).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
r2k1 added a commit that referenced this pull request May 10, 2026
Revert a1bebdc (feat(e2e): add HTTP_PROXY + private DNS test scenario)
which had issues on the e2e-flakiness-fixes branch.

Analysis of 55 E2E builds on main (3 weeks) showed 84% failure rate.
Root causes identified and fixed:

1. Node readiness race (kube.go): WaitUntilNodeReady() returned success
   on NodeReady=True even when node still had the cloud-provider
   uninitialized taint, preventing test pod scheduling. Now waits for
   taint removal before declaring node ready.

2. IPtables false positives (validation.go): iptables eBPF-host-routing
   validator rejected a normal host DHCP INPUT rule (UDP/68) not in its
   allowlist. Added to allowlist.

3. CSE timing threshold (scenario_cse_perf_test.go): installDeps 90s
   threshold was set with 'no direct prod data' and consistently
   exceeded by the network-heavy apt workflow. Raised to 120s.

4. Duplicate CSE events (cse_timing.go): events appearing in both GA
   events directory and handler subdirectories created spurious
   Task_installDeps#01 subtests. Added deduplication.

5. Broken Ubuntu2004FIPS lane (scenario_test.go): Test added on
   2026-04-22 without VMSS FIPS capability setup, never green. Skipped
   until properly fixed.

Dropped from earlier version: Flatcar AzureCNI networkPlugin removal.
Rubber duck review found removing networkPlugin=azure defaults to
kubenet (not none), which would break tests differently. Proper fix
requires PR #7463 (set to none instead).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
r2k1 added a commit that referenced this pull request May 10, 2026
Revert a1bebdc (feat(e2e): add HTTP_PROXY + private DNS test scenario)
which had issues on the e2e-flakiness-fixes branch.

Analysis of 55 E2E builds on main (3 weeks) showed 84% failure rate.
Root causes identified and fixed:

1. Node readiness race (kube.go): WaitUntilNodeReady() returned success
   on NodeReady=True even when node still had the cloud-provider
   uninitialized taint, preventing test pod scheduling. Now waits for
   taint removal before declaring node ready.

2. IPtables false positives (validation.go): iptables eBPF-host-routing
   validator rejected a normal host DHCP INPUT rule (UDP/68) not in its
   allowlist. Added to allowlist.

3. CSE timing threshold (scenario_cse_perf_test.go): installDeps 90s
   threshold was set with 'no direct prod data' and consistently
   exceeded by the network-heavy apt workflow. Raised to 120s.

4. Duplicate CSE events (cse_timing.go): events appearing in both GA
   events directory and handler subdirectories created spurious
   Task_installDeps#01 subtests. Added deduplication.

5. Broken Ubuntu2004FIPS lane (scenario_test.go): Test added on
   2026-04-22 without VMSS FIPS capability setup, never green. Skipped
   until properly fixed.

Dropped from earlier version: Flatcar AzureCNI networkPlugin removal.
Rubber duck review found removing networkPlugin=azure defaults to
kubenet (not none), which would break tests differently. Proper fix
requires PR #7463 (set to none instead).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
r2k1 added a commit that referenced this pull request May 10, 2026
Revert a1bebdc (feat(e2e): add HTTP_PROXY + private DNS test scenario)
which had issues on the e2e-flakiness-fixes branch.

Analysis of 55 E2E builds on main (3 weeks) showed 84% failure rate.
Root causes identified and fixed:

1. Node readiness race (kube.go): WaitUntilNodeReady() returned success
   on NodeReady=True even when node still had the cloud-provider
   uninitialized taint, preventing test pod scheduling. Now waits for
   taint removal before declaring node ready.

2. IPtables false positives (validation.go): iptables eBPF-host-routing
   validator rejected a normal host DHCP INPUT rule (UDP/68) not in its
   allowlist. Added to allowlist.

3. CSE timing threshold (scenario_cse_perf_test.go): installDeps 90s
   threshold was set with 'no direct prod data' and consistently
   exceeded by the network-heavy apt workflow. Raised to 120s.

4. Duplicate CSE events (cse_timing.go): events appearing in both GA
   events directory and handler subdirectories created spurious
   Task_installDeps#01 subtests. Added deduplication.

5. Broken Ubuntu2004FIPS lane (scenario_test.go): Test added on
   2026-04-22 without VMSS FIPS capability setup, never green. Skipped
   until properly fixed.

Dropped from earlier version: Flatcar AzureCNI networkPlugin removal.
Rubber duck review found removing networkPlugin=azure defaults to
kubenet (not none), which would break tests differently. Proper fix
requires PR #7463 (set to none instead).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
r2k1 added a commit that referenced this pull request May 10, 2026
Revert a1bebdc (feat(e2e): add HTTP_PROXY + private DNS test scenario)
which had issues on the e2e-flakiness-fixes branch.

Analysis of 55 E2E builds on main (3 weeks) showed 84% failure rate.
Root causes identified and fixed:

1. Node readiness race (kube.go): WaitUntilNodeReady() returned success
   on NodeReady=True even when node still had the cloud-provider
   uninitialized taint, preventing test pod scheduling. Now waits for
   taint removal before declaring node ready.

2. IPtables false positives (validation.go): iptables eBPF-host-routing
   validator rejected a normal host DHCP INPUT rule (UDP/68) not in its
   allowlist. Added to allowlist.

3. CSE timing threshold (scenario_cse_perf_test.go): installDeps 90s
   threshold was set with 'no direct prod data' and consistently
   exceeded by the network-heavy apt workflow. Raised to 120s.

4. Duplicate CSE events (cse_timing.go): events appearing in both GA
   events directory and handler subdirectories created spurious
   Task_installDeps#01 subtests. Added deduplication.

5. Broken Ubuntu2004FIPS lane (scenario_test.go): Test added on
   2026-04-22 without VMSS FIPS capability setup, never green. Skipped
   until properly fixed.

Dropped from earlier version: Flatcar AzureCNI networkPlugin removal.
Rubber duck review found removing networkPlugin=azure defaults to
kubenet (not none), which would break tests differently. Proper fix
requires PR #7463 (set to none instead).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
r2k1 added a commit that referenced this pull request May 11, 2026
Revert a1bebdc (feat(e2e): add HTTP_PROXY + private DNS test scenario)
which had issues on the e2e-flakiness-fixes branch.

Analysis of 55 E2E builds on main (3 weeks) showed 84% failure rate.
Root causes identified and fixed:

1. Node readiness race (kube.go): WaitUntilNodeReady() returned success
   on NodeReady=True even when node still had the cloud-provider
   uninitialized taint, preventing test pod scheduling. Now waits for
   taint removal before declaring node ready.

2. IPtables false positives (validation.go): iptables eBPF-host-routing
   validator rejected a normal host DHCP INPUT rule (UDP/68) not in its
   allowlist. Added to allowlist.

3. CSE timing threshold (scenario_cse_perf_test.go): installDeps 90s
   threshold was set with 'no direct prod data' and consistently
   exceeded by the network-heavy apt workflow. Raised to 120s.

4. Duplicate CSE events (cse_timing.go): events appearing in both GA
   events directory and handler subdirectories created spurious
   Task_installDeps#01 subtests. Added deduplication.

5. Broken Ubuntu2004FIPS lane (scenario_test.go): Test added on
   2026-04-22 without VMSS FIPS capability setup, never green. Skipped
   until properly fixed.

Dropped from earlier version: Flatcar AzureCNI networkPlugin removal.
Rubber duck review found removing networkPlugin=azure defaults to
kubenet (not none), which would break tests differently. Proper fix
requires PR #7463 (set to none instead).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
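
The readiness check in item 1 could look something like the sketch below. It is not the actual change to kube.go: the `Node`, `Condition`, and `Taint` types are simplified stand-ins for the Kubernetes core/v1 types, and `isSchedulable` is a hypothetical helper name. The taint key is the one the cloud provider places on nodes it has not yet initialized.

```go
package main

import "fmt"

// Simplified stand-ins for the Kubernetes core/v1 types (assumption: the
// real test code works with the client-go types instead).
type Condition struct {
	Type   string
	Status string
}

type Taint struct {
	Key string
}

type Node struct {
	Conditions []Condition
	Taints     []Taint
}

// Taint applied to nodes that the cloud provider has not initialized yet.
const uninitializedTaint = "node.cloudprovider.kubernetes.io/uninitialized"

// isSchedulable reports whether the node is Ready AND no longer carries the
// cloud-provider uninitialized taint. Checking Ready alone is the race the
// commit message describes.
func isSchedulable(n Node) bool {
	ready := false
	for _, c := range n.Conditions {
		if c.Type == "Ready" && c.Status == "True" {
			ready = true
			break
		}
	}
	if !ready {
		return false
	}
	for _, t := range n.Taints {
		if t.Key == uninitializedTaint {
			return false
		}
	}
	return true
}

func main() {
	n := Node{
		Conditions: []Condition{{Type: "Ready", Status: "True"}},
		Taints:     []Taint{{Key: uninitializedTaint}},
	}
	fmt.Println(isSchedulable(n)) // Ready, but still tainted
	n.Taints = nil
	fmt.Println(isSchedulable(n)) // Ready and taint removed
}
```

A wait loop would poll this predicate until it returns true or a timeout expires, rather than returning as soon as the Ready condition flips.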

Development

Successfully merging this pull request may close these issues.

Competing Azure CNI installation modes in e2e tests