Timings metrics are limited by percentile buckets #1132

@DenisRazinkin

Description

The hardcoded bucket limits in utils::statistics::Percentile are insufficient for tracking latency of long-running requests.

For example, with the common configuration for gRPC:
userver/grpc/include/userver/ugrpc/impl/statistics.hpp
using Percentile = utils::statistics::Percentile<2000, std::uint32_t, 256, 100>;

The maximum trackable value is 2000 + 256 * 100 = 27,600 ms (27.6 seconds).

Any request exceeding this limit gets clamped to the last bucket. This means:

A 5-minute request shows as ~27.6 seconds
All percentiles (p50, p95, p99, etc.) cap at this value
We lose visibility into actual latency for problematic slow requests
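
To make the arithmetic concrete, here is a minimal stand-alone sketch of the limit. It is not userver code: `BucketLayout`, `kUpperBound` and `Reported` are hypothetical names, and the meaning of the template arguments (2000 one-unit buckets followed by 256 extra buckets of 100 units each) is an assumption taken from the formula above. It also ignores rounding to the 100 ms bucket width.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical model of the bucket layout, for illustration only.
// Assumption: `Buckets` one-unit buckets, then `ExtraBuckets` buckets
// that are `ExtraBucketSize` units wide each.
template <std::size_t Buckets, std::size_t ExtraBuckets, std::size_t ExtraBucketSize>
struct BucketLayout {
    // Largest value the layout can distinguish: 2000 + 256 * 100 for gRPC.
    static constexpr std::uint64_t kUpperBound =
        Buckets + ExtraBuckets * ExtraBucketSize;

    // Upper bound on what a percentile query can report for a sample:
    // anything at or above the bound collapses to the bound.
    static constexpr std::uint64_t Reported(std::uint64_t value_ms) {
        if (value_ms < kUpperBound) return value_ms;
        return kUpperBound;
    }
};

// Same numbers as the gRPC Percentile alias quoted above.
using GrpcLayout = BucketLayout<2000, 256, 100>;

static_assert(GrpcLayout::kUpperBound == 27600, "2000 + 256 * 100");
```

With this model, a 5-minute request (300,000 ms) and a 30-second request are indistinguishable: both report 27,600 ms.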

As the simplest solution, I'd suggest adding sum tracking to the Percentile class to enable average latency calculation:

rate(sum_latency) / rate(count)
This would provide at least the average latency even when individual values exceed bucket limits.
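
A rough sketch of what the sum tracking could look like, assuming a pair of monotonic counters kept next to the buckets. `LatencyTotals`, `Snapshot` and `AverageMs` are hypothetical names for illustration, not existing userver API, and how the counters would be wired into Percentile or exported as metrics is left open.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper, not part of userver: a running total and a sample
// count that are updated alongside the percentile buckets.
class LatencyTotals {
public:
    // Record one sample. The sum is not clamped, unlike the buckets.
    void Account(std::uint64_t value_ms) noexcept {
        sum_ms_.fetch_add(value_ms, std::memory_order_relaxed);
        count_.fetch_add(1, std::memory_order_relaxed);
    }

    std::uint64_t SumMs() const noexcept {
        return sum_ms_.load(std::memory_order_relaxed);
    }
    std::uint64_t Count() const noexcept {
        return count_.load(std::memory_order_relaxed);
    }

private:
    // Two independent atomics: a reader may see a sum and a count that are
    // one sample apart, which is acceptable for an average over many samples.
    std::atomic<std::uint64_t> sum_ms_{0};
    std::atomic<std::uint64_t> count_{0};
};

// Counter values scraped at one point in time.
struct Snapshot {
    std::uint64_t sum_ms;
    std::uint64_t count;
};

// Average latency between two scrapes. This is the same ratio as
// rate(sum_latency) / rate(count), because the time interval cancels out.
inline double AverageMs(const Snapshot& before, const Snapshot& after) {
    const std::uint64_t samples = after.count - before.count;
    if (samples == 0) return 0.0;  // no requests in the interval
    return static_cast<double>(after.sum_ms - before.sum_ms) /
           static_cast<double>(samples);
}
```

For instance, one 100 ms request and one 5-minute request in the same interval would give an average of 150,050 ms, which makes the slow request visible even though the percentiles stay capped.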
