Skip to content

align probe exec timeouts with k8s probe timeouts#3784

Merged
openshift-merge-bot[bot] merged 2 commits intooperator-framework:masterfrom
grokspawn:probe-tuning
Mar 4, 2026
Merged

align probe exec timeouts with k8s probe timeouts#3784
openshift-merge-bot[bot] merged 2 commits intooperator-framework:masterfrom
grokspawn:probe-tuning

Conversation

@grokspawn
Copy link
Copy Markdown
Contributor

Description of the change:

Motivation for the change:

Architectural changes:

Testing remarks:

Reviewer Checklist

  • Implementation matches the proposed design, or proposal is updated to match implementation
  • Sufficient unit test coverage
  • Sufficient end-to-end test coverage
  • Bug fixes are accompanied by regression test(s)
  • e2e tests and flake fixes are accompanied evidence of flake testing, e.g. executing the test 100(0) times
  • tech debt/todo is accompanied by issue link(s) in comments in the surrounding code
  • Tests are comprehensible, e.g. Ginkgo DSL is being used appropriately
  • Docs updated or added to /doc
  • Commit messages sensible and descriptive
  • Tests marked as [FLAKE] are truly flaky and have an issue
  • Code is properly formatted

Signed-off-by: grokspawn <jordan@nimblewidget.com>
Copy link
Copy Markdown
Member

@rashmigottipati rashmigottipati left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you noticed any issues with probe timeouts that motivated adding -rpc-timeout=5s? Or was this mainly to align with Kubernetes defaults?

@grokspawn
Copy link
Copy Markdown
Contributor Author

grokspawn commented Mar 4, 2026

Have you noticed any issues with probe timeouts that motivated adding -rpc-timeout=5s? Or was this mainly to align with Kubernetes defaults?

Yeah, there are a lot of times in the logs/events where the catsrc pods would claim a startupProbe failure because the catalog size prevented its GRPC availability w/in 1s. The retries always recovered it, but the logs always include a statement about the 1s failure.

omc get events -n openshift-marketplace | grep "redhat-operators" | grep -i "probe|unhealthy" | head -20)
⎿  3h35m Warning Unhealthy pod/redhat-operators-2kjzx
Startup probe failed: timeout: failed to connect service ":50051" within 1s
3h25m Warning Unhealthy pod/redhat-operators-2kjzx

This is weird, because we don't configure any 1s thresholds, but this is because it's the default for grpc_health_probe.

Functionally this change has no real effect. It will just result in avoiding the initial failure log/event for catalogs of a certain threshold size or higher, and the confusing message.

rashmigottipati
rashmigottipati previously approved these changes Mar 4, 2026
Copy link
Copy Markdown
Member

@rashmigottipati rashmigottipati left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@grokspawn Thanks for the context. This is a clean improvement. Changes look good to me.

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Mar 4, 2026
Copy link
Copy Markdown
Member

@rashmigottipati rashmigottipati left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To fix the failing unit test, you may have to update the reconciler_test.go to reflect the new probe args.

@openshift-ci openshift-ci Bot removed the lgtm Indicates that a PR is ready to be merged. label Mar 4, 2026
rashmigottipati
rashmigottipati previously approved these changes Mar 4, 2026
Copy link
Copy Markdown
Member

@rashmigottipati rashmigottipati left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Mar 4, 2026
Signed-off-by: grokspawn <jordan@nimblewidget.com>
@openshift-ci openshift-ci Bot removed the lgtm Indicates that a PR is ready to be merged. label Mar 4, 2026
@grokspawn
Copy link
Copy Markdown
Contributor Author

Thanks @rashmigottipati . Could I bother you for another lgtm? I had to fix the whitespace-only formatting on the test file. (Maybe wait until we see the sanity CI pass so you don't throw away the lgtm because for some reason I can't run make verify on my local.)

Copy link
Copy Markdown
Member

@rashmigottipati rashmigottipati left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Mar 4, 2026
@tmshort
Copy link
Copy Markdown
Contributor

tmshort commented Mar 4, 2026

/approve

@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented Mar 4, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rashmigottipati, tmshort

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 4, 2026
@openshift-merge-bot openshift-merge-bot Bot merged commit 04fcb6e into operator-framework:master Mar 4, 2026
14 checks passed
@grokspawn grokspawn deleted the probe-tuning branch March 4, 2026 22:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants