OCPBUGS-85426: improve DCM tests to reduce flakiness in endpoints#31173
OCPBUGS-85426: improve DCM tests to reduce flakiness in endpoints#31173jcmoraisjr wants to merge 1 commit into
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: automatic mode |
|
@jcmoraisjr: This pull request references Jira Issue OCPBUGS-85426, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
Requesting review from QA contact: The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
WalkthroughThis PR improves router e2e diagnostics and stability: it adds an admin Route client and extended failure output, tabular Endpoints/EndpointSlice logging, deployment-state fetch/print helpers used during churn/scale operations, and retries Endpoints retrieval for detached services. ChangesRouter Test Diagnostics and Stability
🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 9 | ❌ 3❌ Failed checks (3 warnings)
✅ Passed checks (9 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jcmoraisjr The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@jcmoraisjr: This pull request references Jira Issue OCPBUGS-85426, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Scheduling required tests: |
3dcf0fe to
f272628
Compare
|
Scheduling required tests: |
|
/jira-refresh |
|
cc @rhamini3 |
|
/jira refresh |
|
@melvinjoseph86: This pull request references Jira Issue OCPBUGS-85426, which is valid. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira (iamin@redhat.com), skipping review request. The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
f272628 to
1a4ff75
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/extended/router/config_manager_ingress.go`:
- Around line 71-83: The list calls swallow errors; change each call to capture
the error (routes, err :=
routeClient.RouteV1().Routes(oc.Namespace()).List(...); endpoints, err :=
kubeClient.CoreV1().Endpoints(oc.Namespace()).List(...); epsList, err :=
kubeClient.DiscoveryV1().EndpointSlices(oc.Namespace()).List(...)) and if err !=
nil log the error with context (e.g. include namespace and operation) before
attempting to output items; update the branches that currently call
outputIngress, outputEndpoints, outputEndpointSlice to still run when results
exist but ensure the err is logged (using the test/logger available in the file,
e.g. framework.Logf/t.Logf or the existing logger) so failed specs include
actionable diagnostics.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 30a7ec20-63d2-4427-8031-a2cc9653b227
📒 Files selected for processing (2)
test/extended/router/config_manager_ingress.gotest/extended/router/stress.go
|
/test verify |
|
/assign @bentito |
|
Scheduling required tests: |
Addressing a race when patching endpoints resource. This update adds a polling to ensure the endpoints resource was created. Also, dump all endpoints and endpointSlice in the test namespace in case of a test failure, which helps to correlate the current state and what should be expected in the test scenario. Example of deployments state log: ``` deployment state: replicas=4 pods=route-scale-in-5b7b4f6b8c-9ncjg/Running/10.128.1.27 // route-scale-in-5b7b4f6b8c-ck45n/Running/10.128.1.28 // route-scale-in-5b7b4f6b8c-srffr/Running/10.128.1.29 // route-scale-in-5b7b4f6b8c-v9hm2/Running/10.128.1.26 ``` Example of Endpoints and EndpointSlice resources listed if the test fails: ``` Endpoints: NAME ADDRESSES NOT READY ADDRESSES PORTS route-scale-in 10.128.1.26,10.128.1.27,10.128.1.28,10.128.1.29 9376 route-scale-in-khbh5 10.128.1.26 9376 EndpointSlices: NAME SERVICE ADDRESSES NOT READY ADDRESSES PORTS route-scale-in-khbh5-8ncmh route-scale-in-khbh5 10.128.1.26 9376 route-scale-in-rrhzv route-scale-in 10.128.1.27,10.128.1.28,10.128.1.29,10.128.1.26 9376 ``` https://redhat.atlassian.net/browse/OCPBUGS-85426
1a4ff75 to
7ae081b
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (2)
test/extended/router/stress.go (2)
699-699: 💤 Low valueNit: simplify boolean comparison.
*ready == truecan be written as just*ready.go vet/linters typically flag this asS1002.Proposed change
- if ready := ep.Conditions.Ready; ready == nil || *ready == true { + if ready := ep.Conditions.Ready; ready == nil || *ready {🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/extended/router/stress.go` at line 699, The boolean comparison is overly verbose: in the condition that checks ep.Conditions.Ready, replace the explicit comparison "*ready == true" with the simpler dereference "*ready" so the if becomes "if ready := ep.Conditions.Ready; ready == nil || *ready { ... }"; update the conditional around ep.Conditions.Ready accordingly to satisfy linters like S1002.
667-679: 💤 Low valueOptional: hoist
resumeAddrsout of the subset loop and simplify boolean check.
resumeAddrsis redefined on everySubsetsiteration; it can be a package-level helper (or defined once above the loop). Also consider renaming tosummarizeAddrsfor clarity — "resume" reads as the Portuguese/Spanish cognate of "summarize" rather than the English meaning. Purely stylistic; no behavior impact.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/extended/router/stress.go` around lines 667 - 679, The inline function resumeAddrs defined inside the Subsets iteration should be hoisted so it’s defined once (either as a package-level helper or above the loop) and renamed to summarizeAddrs for clarity; update calls inside the loop to use summarizeAddrs(addrs). While moving it, simplify the logic that picks the address by returning the first non-empty of addr.IP or addr.Hostname (instead of nested if/else) and keep the join behavior the same so the behavior of functions referencing Subsets remains unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@test/extended/router/stress.go`:
- Line 699: The boolean comparison is overly verbose: in the condition that
checks ep.Conditions.Ready, replace the explicit comparison "*ready == true"
with the simpler dereference "*ready" so the if becomes "if ready :=
ep.Conditions.Ready; ready == nil || *ready { ... }"; update the conditional
around ep.Conditions.Ready accordingly to satisfy linters like S1002.
- Around line 667-679: The inline function resumeAddrs defined inside the
Subsets iteration should be hoisted so it’s defined once (either as a
package-level helper or above the loop) and renamed to summarizeAddrs for
clarity; update calls inside the loop to use summarizeAddrs(addrs). While moving
it, simplify the logic that picks the address by returning the first non-empty
of addr.IP or addr.Hostname (instead of nested if/else) and keep the join
behavior the same so the behavior of functions referencing Subsets remains
unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 108561fd-89f8-4529-b2a1-2862fa2cac51
📒 Files selected for processing (2)
test/extended/router/config_manager_ingress.gotest/extended/router/stress.go
|
Scheduling required tests: |
|
/test e2e-gcp-ovn |
|
@jcmoraisjr: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Addressing a race when patching endpoints resource. This update adds a polling to ensure the endpoints resource was created. Also, dump all endpoints and endpointSlice in the test namespace in case of a test failure, which helps to correlate the current state and what should be expected in the test scenario.
Example of deployments state log:
Example of Endpoints and EndpointSlice resources listed if the test fails:
https://redhat.atlassian.net/browse/OCPBUGS-85426
Summary by CodeRabbit