Description
The hardcoded bucket limits in utils::statistics::Percentile are insufficient for tracking latency of long-running requests.
For example, with the common configuration for gRPC:
userver/grpc/include/userver/ugrpc/impl/statistics.hpp
using Percentile = utils::statistics::Percentile<2000, std::uint32_t, 256, 100>;
The maximum trackable value is 2000 + 256 * 100 = 27,600 ms (27.6 seconds).
Any request exceeding this limit gets clamped to the last bucket. This means:
A 5-minute request shows as ~27.5 seconds
All percentiles (p50, p95, p99, etc.) cap at this value
We lose visibility into actual latency for problematic slow requests
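The clamping can be illustrated with a small compile-time model. It assumes the bucket layout described above: one bucket per millisecond below M, then ExtraBuckets buckets of width ExtraBucketSize, with anything larger falling into the last bucket. `BucketIndex` and the `k*` constants are hypothetical names for illustration, not userver's implementation.

```cpp
#include <cstddef>
#include <cstdint>

// Simplified model of Percentile<2000, std::uint32_t, 256, 100> (an assumption
// based on the description above, not userver's source).
constexpr std::size_t kM = 2000;               // unit-width buckets: [0, 2000)
constexpr std::size_t kExtraBuckets = 256;     // number of wide buckets
constexpr std::size_t kExtraBucketSize = 100;  // width of each wide bucket, ms
constexpr std::size_t kMaxTrackableMs = kM + kExtraBuckets * kExtraBucketSize;
constexpr std::size_t kLastBucket = kM + kExtraBuckets - 1;

static_assert(kMaxTrackableMs == 27'600, "27,600 ms, i.e. 27.6 seconds");

// Maps a latency in milliseconds to its bucket index; values beyond the
// tracked range are clamped into the last bucket.
constexpr std::size_t BucketIndex(std::uint64_t latency_ms) {
  if (latency_ms < kM) return static_cast<std::size_t>(latency_ms);
  const std::uint64_t extra = (latency_ms - kM) / kExtraBucketSize;
  return extra < kExtraBuckets ? kM + static_cast<std::size_t>(extra)
                               : kLastBucket;
}
```

In this model a 30-second request and a 5-minute request land in the same bucket (`BucketIndex(30'000) == BucketIndex(300'000)`), so no percentile computed from the buckets can tell them apart.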
As the simplest solution, I'd suggest adding sum tracking to the Percentile class so that average latency can be calculated:
rate(sum_latency) / rate(count)
This would provide at least the average latency even when individual values exceed bucket limits.
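A minimal sketch of the proposed addition, assuming the sum and the count are exported as monotonically increasing counters. `LatencySumCounter` and `AverageLatencyMs` are hypothetical names, not an existing userver API; a real change would presumably live inside `Percentile` itself and use atomic counters for concurrent updates. Taking the difference of two snapshots yields the same quantity as rate(sum_latency) / rate(count), because both rates are computed over the same time window.

```cpp
#include <cstdint>

// Hypothetical sketch of the proposal (not an existing userver API): keep a
// running sum and count next to the bucketed histogram. Unlike the buckets,
// the sum is never clamped, so a 5-minute request adds the full 300,000 ms.
struct LatencySumCounter {
  std::uint64_t sum_ms = 0;  // total of all recorded latencies, in ms
  std::uint64_t count = 0;   // number of recorded requests

  constexpr void Account(std::uint64_t latency_ms) {
    sum_ms += latency_ms;
    ++count;
  }
};

// Average latency between two snapshots of the monotonic counters. This is
// the same quantity as rate(sum_latency) / rate(count): both rates share the
// same time window, so the window length cancels out.
constexpr double AverageLatencyMs(const LatencySumCounter& before,
                                  const LatencySumCounter& after) {
  const std::uint64_t requests = after.count - before.count;
  if (requests == 0) return 0.0;  // no traffic: report 0 instead of NaN
  return static_cast<double>(after.sum_ms - before.sum_ms) /
         static_cast<double>(requests);
}
```

For example, if one 300,000 ms request and one 200 ms request arrive between two snapshots, the average comes out as 150,100 ms, while the high percentiles would still be capped at the last bucket. Returning 0 for an interval with no requests is a choice made for this sketch; PromQL would yield NaN in that case.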
