CNTRLPLANE-3428: tls fix HyperShift guest-side CO wait and Custom profile handling#31194
Conversation
Fix two issues found in HyperShift CI: 1. waitForGuestOperatorsAfterTLSChange was iterating over all clusterOperatorNames (including etcd, kube-apiserver, etc.) which run on the management cluster. On HyperShift these COs may not stabilize on the guest side after a TLS change, causing a 25m timeout. Use a new guestSideClusterOperatorNames list that only includes guest-side COs (image-registry, openshift-samples). 2. getExpectedMinTLSVersionWithType crashed with "Unknown TLS profile type: Custom" because configv1.TLSProfiles only contains predefined profiles. If a Custom profile is active (e.g. from a failed cleanup), read minTLSVersion directly from the Custom spec instead.
|
Pipeline controller notification For optional jobs, comment This repository is configured in: automatic mode |
|
Skipping CI for Draft Pull Request. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughNarrow TLS test waits to HyperShift guest-side targets, skip management-cluster-only targets, poll ConfigMaps until inject-TLS and expected TLS appear, and derive Custom TLSProfile min version from Config.Spec.TLSSecurityProfile.Custom.MinTLSVersion. ChangesTLS Profile Testing Enhancements
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 10 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (10 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
9797b97 to
b1ac6cd
Compare
|
/test pull-ci-openshift-origin-main-tls-observed-config-hypershift |
|
/test pull-ci-openshift-origin-main-tls-observed-config |
|
/test tls-observed-config-hypershift |
On HyperShift, control-plane deployments like controller-manager and apiserver do not exist on the guest cluster. Add controlPlane field to deploymentRolloutTarget and filter them out in guestSideDeploymentRolloutTargets() to prevent 404 errors during post-TLS-change rollout verification.
|
/test tls-observed-config-hypershift |
|
@gangwgr: This pull request references CNTRLPLANE-3428 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Scheduling required tests: |
On HyperShift, guest-side operators and deployments stabilize quickly but ConfigMap TLS annotation injection can lag. Replace the one-shot check in verifyConfigMapsForTargets with polling (5s interval, 5m timeout) so the test waits for propagation rather than failing immediately.
e3c5d71 to
0a0131e
Compare
|
/test tls-observed-config-hypershift |
|
Scheduling required tests: |
Rename the controlPlane field across all target structs to managementClusterComponent to better convey that the flag indicates whether a component runs on the management cluster (and should be skipped when testing from the guest cluster context on HyperShift).
|
/retest-required |
|
/test tls-observed-config-hypershift |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
test/extended/tls/tls_observed_config.go (1)
245-246: 💤 Low valueUpdate skip messages for consistency with field rename.
The skip messages reference "control-plane target" but the field was renamed from
controlPlanetomanagementClusterComponent. For consistency, consider updating the skip messages.♻️ Suggested message updates
-g.Skip(fmt.Sprintf("Skipping control-plane target %s on HyperShift (runs on management cluster)", target.namespace)) +g.Skip(fmt.Sprintf("Skipping management cluster component %s on HyperShift", target.namespace))Apply similar changes to lines 257, 268, 279, and the disruptive test skip messages at lines 343, 350, 357, 364.
Also applies to: 256-257, 267-268, 278-279
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/extended/tls/tls_observed_config.go` around lines 245 - 246, The skip text still says "control-plane target" but the field was renamed to managementClusterComponent, so update the g.Skip messages invoked where the condition uses isHyperShiftCluster && target.managementClusterComponent (and any similar g.Skip calls in the same file that check target.managementClusterComponent) to say something like "Skipping management-cluster target %s on HyperShift (runs on management cluster)" and apply the same wording change to the other occurrences and the disruptive test skip messages that reference the old "control-plane target" phrasing; ensure you update all g.Skip calls that mention the target.namespace and refer to control-plane to use management-cluster instead.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/extended/tls/tls_observed_config.go`:
- Around line 112-117: The ConfigMap target entries in tls_observed_config.go
for the operator namespaces (the literal slice entries with namespace values
"openshift-controller-manager", "openshift-kube-apiserver",
"openshift-apiserver", "openshift-kube-controller-manager", and
"openshift-kube-scheduler") are missing managementClusterComponent: true; update
each of those ConfigMap target structs to include managementClusterComponent:
true so they match the corresponding observedConfigTargets and serviceTargets
pattern (i.e., add managementClusterComponent: true to the entries with
configMapName "openshift-controller-manager-operator-config",
"kube-apiserver-operator-config", "openshift-apiserver-operator-config",
"kube-controller-manager-operator-config", and
"openshift-kube-scheduler-operator-config").
---
Nitpick comments:
In `@test/extended/tls/tls_observed_config.go`:
- Around line 245-246: The skip text still says "control-plane target" but the
field was renamed to managementClusterComponent, so update the g.Skip messages
invoked where the condition uses isHyperShiftCluster &&
target.managementClusterComponent (and any similar g.Skip calls in the same file
that check target.managementClusterComponent) to say something like "Skipping
management-cluster target %s on HyperShift (runs on management cluster)" and
apply the same wording change to the other occurrences and the disruptive test
skip messages that reference the old "control-plane target" phrasing; ensure you
update all g.Skip calls that mention the target.namespace and refer to
control-plane to use management-cluster instead.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: ed931bc3-96a3-470c-8e1a-13ea809bc0d8
📒 Files selected for processing (1)
test/extended/tls/tls_observed_config.go
Add managementClusterComponent: true to ConfigMap targets for controller-manager, kube-apiserver, openshift-apiserver, kube-controller-manager, and kube-scheduler to match the corresponding observedConfigTargets and serviceTargets. Also update g.Skip messages to use "management-cluster component" wording consistently.
|
/test tls-observed-config-hypershift |
| {namespace: "openshift-apiserver", operatorConfigGVR: schema.GroupVersionResource{Group: "operator.openshift.io", Version: "v1", Resource: "openshiftapiservers"}, operatorConfigName: "cluster", managementClusterComponent: true}, | ||
| {namespace: "openshift-etcd", operatorConfigGVR: schema.GroupVersionResource{Group: "operator.openshift.io", Version: "v1", Resource: "etcds"}, operatorConfigName: "cluster", managementClusterComponent: true}, | ||
| {namespace: "openshift-kube-controller-manager", operatorConfigGVR: schema.GroupVersionResource{Group: "operator.openshift.io", Version: "v1", Resource: "kubecontrollermanagers"}, operatorConfigName: "cluster", managementClusterComponent: true}, | ||
| {namespace: "openshift-kube-scheduler", operatorConfigGVR: schema.GroupVersionResource{Group: "operator.openshift.io", Version: "v1", Resource: "kubeschedulers"}, operatorConfigName: "cluster", managementClusterComponent: true}, |
There was a problem hiding this comment.
Aren't we checking the observed config for the openshift-cluster-samples-operator ?
| {namespace: "openshift-etcd", serviceName: "etcd", servicePort: "2379", managementClusterComponent: true}, | ||
| {namespace: "openshift-kube-controller-manager", serviceName: "kube-controller-manager", servicePort: "443", managementClusterComponent: true}, | ||
| {namespace: "openshift-kube-scheduler", serviceName: "scheduler", servicePort: "443", managementClusterComponent: true}, | ||
| {namespace: "openshift-cluster-samples-operator", serviceName: "metrics", servicePort: "60000", deploymentName: "cluster-samples-operator"}, |
There was a problem hiding this comment.
Can't we have these four groups (observedConfigTargets, configMapTargets, serviceTargets and deploymentRolloutTargets) in a single slice of Targets ? What do you think ?
There was a problem hiding this comment.
These were intentionally split from a single monolithic tlsTarget struct in the previous PR as per per review feedback. Each test category uses different fields — a single struct would carry many unused fields per entry, making it unclear which fields are relevant to each test. Happy to discuss further if you'd prefer a different approach.
| "image-registry", | ||
| "openshift-samples", | ||
| } | ||
|
|
There was a problem hiding this comment.
Why isn't this a function like, for example, guestSideObservedConfigTargets and guestSideConfigMapTargets ?
| g.It(fmt.Sprintf("should populate ObservedConfig with TLS settings - %s", target.namespace), func() { | ||
| if isHyperShiftCluster && target.controlPlane { | ||
| g.Skip(fmt.Sprintf("Skipping control-plane target %s on HyperShift (runs on management cluster)", target.namespace)) | ||
| if isHyperShiftCluster && target.managementClusterComponent { |
There was a problem hiding this comment.
Just for sanity check: isHyperShiftCluster is true if the scan is happening on a "guest cluster" ? Is that it ?
There was a problem hiding this comment.
Yes, exactly. isHyperShiftCluster is set in BeforeEach via exutil.IsHypershift() — it's true when the test is running against a guest cluster of a HyperShift-hosted control plane.
Replace the plain string slice clusterOperatorNames with a typed clusterOperatorTarget struct carrying managementClusterComponent, consistent with all other target types. Add guestSideClusterOperatorTargets() filter function matching the pattern used by guestSideObservedConfigTargets() et al. Also add a comment explaining why the samples operator is absent from observedConfigTargets (it uses samples.operator.openshift.io/v1 Config which has no spec.observedConfig).
|
/test tls-observed-config-hypershift |
After a TLS profile change on HyperShift, the ConfigMap injection updates quickly but the backing deployment may not have restarted its pods yet. Add waitForDeploymentRolloutAfterTLSChange which captures existing pod UIDs, then polls until old pods are gone and the deployment is fully ready. This ensures the running pods serve with the new TLS config before wire-level checks are performed. Fixes the "TLS 1.2 should be REJECTED but succeeded" failure for svc/metrics in openshift-cluster-samples-operator.
|
/test tls-observed-config-hypershift |
|
Scheduling required tests: |
|
/retest |
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: gangwgr, ricardomaraschini The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/verified by ci runs |
|
@gangwgr: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/retest-required |
|
@gangwgr: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
4956345
into
openshift:main
CNTRLPLANE-3428: tls fix HyperShift guest-side CO wait and Custom profile handling
Fixed three HyperShift-specific failures in the TLS observed-config e2e tests:
Unknown TLS profile type: Custom — After a config-change test sets a Custom profile and DeferCleanup doesn't fully restore, subsequent tests crash because getExpectedMinTLSVersionWithType tries to look up "Custom" in configv1.TLSProfiles, which only contains predefined profiles.
ClusterOperator etcd did not reach stable state after 25m0s — waitForGuestOperatorsAfterTLSChange was iterating over all clusterOperatorNames including control-plane COs (etcd, kube-apiserver, etc.) that are managed by the HCP and shouldn't be polled from the guest cluster.
deployments.apps "controller-manager" not found — guestSideDeploymentRolloutTargets() was returning all deployment rollout targets including control-plane ones (controller-manager, apiserver, cluster-version-operator) that don't exist on the guest cluster.
Summary by CodeRabbit