Export low-cardinality usage metrics for Explain Error #131

@shenxianpeng

Description

Problem

Administrators need observability into Explain Error adoption and cost-related behavior, but there are currently no aggregated metrics.

Goal

Expose low-cardinality usage metrics for monitoring systems.

Scope

Export aggregated counters/histograms for:

  • total requests by entry point and result
  • provider calls by provider/model/result
  • request duration
  • approximate input size
  • quota rejections (for future compatibility)
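For the duration and input-size items, a fixed-bucket histogram keeps the number of exported series bounded regardless of traffic. The following is a minimal sketch only: the class name `FixedBucketHistogram` and the bucket boundaries in the usage note are hypothetical, and a real implementation would more likely delegate to an existing metrics library.

```java
import java.util.concurrent.atomic.AtomicLongArray;

/** Hypothetical helper: counts observations into a fixed set of buckets. */
final class FixedBucketHistogram {
    private final long[] upperBounds;     // ascending, inclusive upper bounds
    private final AtomicLongArray counts; // one extra slot for values above the last bound

    FixedBucketHistogram(long... upperBounds) {
        this.upperBounds = upperBounds.clone();
        this.counts = new AtomicLongArray(upperBounds.length + 1);
    }

    /** Records one observation in the first bucket whose bound is >= value. */
    void observe(long value) {
        int i = 0;
        while (i < upperBounds.length && value > upperBounds[i]) {
            i++;
        }
        counts.incrementAndGet(i);
    }

    /** Returns the per-bucket (non-cumulative) count; index upperBounds.length is the overflow bucket. */
    long countInBucket(int index) {
        return counts.get(index);
    }
}
```

For example, `new FixedBucketHistogram(100, 1000, 10000)` could back `explain_error_request_duration_ms`, with the bounds chosen here purely for illustration.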

Examples:

  • explain_error_requests_total{entrypoint,result}
  • explain_error_provider_calls_total{provider,model,result}
  • explain_error_request_duration_ms
  • explain_error_input_log_lines
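To illustrate how a labeled counter such as `explain_error_requests_total{entrypoint,result}` maps to a small, fixed set of series, here is a rough in-memory sketch. `CounterRegistry` is a hypothetical name, and the label values used in the usage note (`pipeline_step`, `success`) are assumptions, not values defined by this issue.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical in-memory registry; a real implementation would delegate to a metrics library. */
final class CounterRegistry {
    private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

    /** Builds a stable series key such as name{a="x",b="y"}, with labels sorted by name. */
    static String seriesKey(String name, Map<String, String> labels) {
        StringBuilder sb = new StringBuilder(name).append('{');
        boolean first = true;
        for (Map.Entry<String, String> e : new TreeMap<>(labels).entrySet()) {
            if (!first) {
                sb.append(',');
            }
            sb.append(e.getKey()).append("=\"").append(e.getValue()).append('"');
            first = false;
        }
        return sb.append('}').toString();
    }

    /** Adds one to the series identified by the metric name and label set. */
    void increment(String name, Map<String, String> labels) {
        counters.computeIfAbsent(seriesKey(name, labels), k -> new LongAdder()).increment();
    }

    /** Returns the current count, or 0 if the series has never been incremented. */
    long value(String name, Map<String, String> labels) {
        LongAdder adder = counters.get(seriesKey(name, labels));
        return adder == null ? 0L : adder.sum();
    }
}
```

Calling `increment("explain_error_requests_total", Map.of("entrypoint", "pipeline_step", "result", "success"))` twice would leave that one series at 2, and the total number of series stays bounded by the product of the distinct label values.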

Non-Goals

  • Exporting raw logs, prompts, explanations, or job/build identifiers
  • Per-job metrics labels
  • Audit UI
  • Quota enforcement

Important Constraints

Metrics must not use high-cardinality labels such as:

  • full job name
  • build number
  • build URL
  • username
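One way to enforce this constraint in code is an allowlist of label names that rejects anything else before a metric is recorded. This is a sketch under assumptions: `LabelPolicy` is a hypothetical class, and the allowed names are taken from the example metrics above. Note that an allowlist of names does not by itself bound the number of distinct values (for instance, of `model`), so values may need their own normalization.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical guard that rejects label names outside a fixed allowlist. */
final class LabelPolicy {
    // Label names taken from the example metrics in this issue.
    private static final Set<String> ALLOWED =
            Set.of("entrypoint", "provider", "model", "result");

    private LabelPolicy() { }

    /** Throws IllegalArgumentException if any label name is not on the allowlist. */
    static void requireLowCardinality(Map<String, String> labels) {
        for (String name : labels.keySet()) {
            if (!ALLOWED.contains(name)) {
                throw new IllegalArgumentException(
                        "Label not allowed on usage metrics: " + name);
            }
        }
    }
}
```

With this guard, a call carrying a label such as `job_name` or `build_url` fails fast instead of silently creating a new series per job or build.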

Proposed Implementation

  • Add a metrics-backed UsageRecorder implementation.
  • Wire it to the internal usage events introduced previously.
  • Reuse Jenkins-friendly metrics mechanisms where possible instead of inventing a custom one-off format.
  • Keep metrics export optional or safely inert when no exporter/integration is configured.
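The shape of such a recorder might look roughly like the sketch below. The plugin's actual `UsageRecorder` interface is not shown in this issue, so the interface name `UsageRecorderSketch`, its method signatures, the `NOOP` constant, and the `cache_hit` result value are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical shape; the plugin's real UsageRecorder interface may differ. */
interface UsageRecorderSketch {
    void recordRequest(String entrypoint, String result);

    void recordProviderCall(String provider, String model, String result);

    /** Inert default for when no metrics integration is configured. */
    UsageRecorderSketch NOOP = new UsageRecorderSketch() {
        @Override public void recordRequest(String entrypoint, String result) { }
        @Override public void recordProviderCall(String provider, String model, String result) { }
    };
}

/** Hypothetical metrics-backed recorder that keeps counters in memory. */
final class InMemoryUsageRecorder implements UsageRecorderSketch {
    private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

    @Override
    public void recordRequest(String entrypoint, String result) {
        bump("explain_error_requests_total|" + entrypoint + "|" + result);
    }

    @Override
    public void recordProviderCall(String provider, String model, String result) {
        bump("explain_error_provider_calls_total|" + provider + "|" + model + "|" + result);
    }

    /** Returns the count for a key built as metric|label1|label2..., or 0 if unseen. */
    long count(String key) {
        LongAdder adder = counters.get(key);
        return adder == null ? 0L : adder.sum();
    }

    private void bump(String key) {
        counters.computeIfAbsent(key, k -> new LongAdder()).increment();
    }
}
```

In this sketch a cache hit would be recorded only through `recordRequest(..., "cache_hit")`, with no matching `recordProviderCall`, which keeps cache hits separate from real provider calls. Code paths without a configured exporter can hold `UsageRecorderSketch.NOOP` and call it unconditionally.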

Acceptance Criteria

  • Metrics reflect requests from both entry points.
  • Cache hits are counted separately from real provider calls.
  • Provider/model dimensions are available.
  • Metrics contain no high-cardinality labels.
