
[codex] Add metrics for proxy-generated responses #624

Merged
winhowes merged 1 commit into main from codex/add-internal-proxy-response-metrics
Apr 14, 2026

@winhowes
Owner

Motivation

  • The metrics surface currently captures upstream responses, request latency, rate-limit events, and incoming auth failures, but several proxy-generated failures are either invisible or only partially covered.
  • In particular, outgoing auth failures were not counted in authtranslator_auth_failures_total, and locally generated 4xx/5xx responses such as integration lookup failures, invalid destinations, rate-limit rejections, and missing proxy configuration were not exposed as a bounded Prometheus counter.
  • This PR intentionally skips adding an inbound-auth-failure-by-plugin breakdown for now; per the current integration shape, the integration label is usually sufficient.

Description

  • Add a new builtin counter authtranslator_internal_responses_total{integration,code,reason} for proxy-generated non-upstream responses, using coarse bounded reason labels.
  • Increment that counter across the existing local rejection/error paths in proxyHandler, including integration not found, incoming auth failure, caller/integration rate limiting, denylist/allowlist rejections, invalid destination, outgoing auth failure, and no proxy configured.
  • Count outgoing auth failures in the existing authtranslator_auth_failures_total metric so auth failures are represented consistently regardless of direction.
  • Extend the existing metrics and proxy tests to assert the new exports and metric deltas, and document the new counter in docs/observability.md.

Impact

  • Operators can now distinguish proxy-generated failures from upstream failures directly in Prometheus/Grafana.
  • Outgoing auth failures are no longer missing from the builtin auth failure metric.
  • The new reason label stays low-cardinality by using fixed categories rather than raw error text.

Testing

  • go test ./app/metrics -count=1
  • go test ./app -run 'TestProxyHandler(RateLimiterUsesIP|RetryAfterOutLimit|NotFound|AuthFailure|BadGateway|WildcardMissingDestination|OutgoingAuthError)$' -count=1
  • go test ./app -run 'TestProxyHandler(Denylist|NotFound|AuthFailure|BadGateway|WildcardMissingDestination|OutgoingAuthError)$|TestConstraintFailureReasonHeader|TestAllowlist|TestProxyHandlerRateLimiterUsesIP|TestProxyHandlerRetryAfterOutLimit' -count=1
  • go test ./... -count=1 currently fails in this environment due to the pre-existing, unrelated TestRateLimiterRedisTLSAuthRequiresVerification certificate-key-size error in app/redis_tls_auth_test.go.

@winhowes winhowes marked this pull request as ready for review April 14, 2026 18:35
@winhowes
Owner Author

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Bravo.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@winhowes winhowes merged commit e3effbb into main Apr 14, 2026
3 checks passed