[bp/1.35] Support limiting stats number per scope #539
Merged
tedjpoole merged 2 commits intoMay 21, 2026
Change-Id: If3a45283b13cfda7d4f9a7bb661a1573f552ed7e

Commit Message: Introduce mark and sweep eviction of stale metrics in a stats scope.

Additional Description: The intended use case is the high cardinality metrics generated from the request data (e.g. [Istio standard metrics](https://istio.io/latest/docs/reference/config/metrics/)). This, in combination with the cardinality bounds (future PR), would ensure bounded metric resource usage.

The algorithm works as follows:

1. An "evictable" scope is allocated by a filter.
2. A delta stats sink is configured, e.g. OTLP.
3. At every flush interval, a scope metric that has been used (e.g. has observed a data point) has its usage mark cleared, i.e. it is marked as unused. A metric that has not been used since the previous flush is deleted from the central caches.
4. A notification is sent to all workers to purge the scope's stale metrics from their thread-local caches.
5. Once all workers complete, the unused metrics are purged from the allocator.

There are several edge conditions that need to be explained to validate the correctness of this algorithm:

1. A worker attempting to use a stale metric after (3) but before (4) might have its data lost. The data will not be lost if (a) the same metric is recreated in the central cache by another worker, since all metrics are uniquely indexed in the allocators; or (b) we implement deferred allocator deletions that wait for the flush operation.
2. A worker should not use a stored stale metric after (4). This requires that workers do not store the metrics by reference (hence, this solution will not work for most xDS metrics). Thread-local cache references are always deleted before the storage is deleted.
3. Histograms are handled slightly differently because the parent histogram needs to be "merged" to observe usage, and clearing the usage requires updating all "children" histograms. Because we do this during flush, merging is always done first.
4. A metric that is re-created after eviction would continue to have its start time set to that of the original metric. This is a limitation of Envoy, since it does not store the metric start times, but it is not an issue with delta aggregation in OTLP. Delta is the recommended protocol for handling high cardinality or sparse metric data. We could add start_time in a follow-up.

Risk Level: low, requires explicit usage
Testing: unit and a load test with Istio Proxy
Docs Changes: none
Release Notes: none

---------

Signed-off-by: Kuat Yessenov <kuat@google.com>
Signed-off-by: Ted Poole <tpoole@redhat.com>
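The mark and sweep steps described in the commit message above can be illustrated with a minimal single-threaded sketch. Everything in it is hypothetical: `ToyCounter`, `ToyEvictableScope`, and `flushAndEvict` are illustration-only names, not Envoy's `Stats::Scope` or allocator APIs, and the worker notification and thread-local caches of steps 4 and 5 are left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical toy counter; the `used` flag plays the role of the "mark".
struct ToyCounter {
  uint64_t value = 0;
  bool used = false;
  void inc() {
    ++value;
    used = true; // Every write marks the metric as used.
  }
};

// Hypothetical stand-in for an "evictable" scope (not an Envoy class).
class ToyEvictableScope {
public:
  // Looks the counter up by name on every call. Callers must not keep the
  // returned reference across a flush, mirroring edge condition 2.
  ToyCounter& counter(const std::string& name) {
    auto& slot = counters_[name];
    if (!slot) {
      slot = std::make_unique<ToyCounter>();
    }
    return *slot;
  }

  // One flush interval: a used metric has its mark cleared, a metric that
  // was not used since the previous flush is erased. Returns evicted names.
  std::vector<std::string> flushAndEvict() {
    std::vector<std::string> evicted;
    for (auto it = counters_.begin(); it != counters_.end();) {
      if (it->second->used) {
        it->second->used = false; // Mark as unused for the next interval.
        ++it;
      } else {
        evicted.push_back(it->first);
        it = counters_.erase(it); // Sweep the stale metric.
      }
    }
    return evicted;
  }

  std::size_t size() const { return counters_.size(); }

private:
  std::unordered_map<std::string, std::unique_ptr<ToyCounter>> counters_;
};
```

In this sketch a counter that is incremented in one interval survives the next flush and is evicted by the flush after that if nothing touches it in between. A counter that is looked up again after eviction is recreated from zero, which is acceptable for a delta sink but would show up as a reset for a cumulative one.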
Commit Message: Add support for limiting the maximum number of stats (counter/gauge/histogram) per scope. This helps with memory explosion caused by high cardinality stats.

Risk Level: low
Testing: unit test covered
Docs Changes: no doc update as it is a library support for internal usage
Release Notes: updated
Platform Specific Features: no

---------

Signed-off-by: Xuyang Tao <taoxuy@google.com>
Signed-off-by: Ted Poole <tpoole@redhat.com>
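As a rough illustration of a per-scope limit, the sketch below caps the number of distinct counters a scope will create. All names (`LimitedCounter`, `LimitedScope`, `rejected`) are hypothetical and do not reflect Envoy's actual API. The behavior once the limit is reached is also an assumption made for this sketch: the scope hands back a shared sink counter whose writes are discarded, so memory stays bounded while callers keep working.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical toy counter; a counter with `discard` set drops all writes.
struct LimitedCounter {
  uint64_t value = 0;
  bool discard = false;
  void inc() {
    if (!discard) {
      ++value;
    }
  }
};

// Hypothetical scope that refuses to create more than `max_counters` stats.
class LimitedScope {
public:
  explicit LimitedScope(std::size_t max_counters) : max_counters_(max_counters) {
    sink_.discard = true;
  }

  LimitedCounter& counter(const std::string& name) {
    auto it = counters_.find(name);
    if (it != counters_.end()) {
      return *it->second; // Existing stats are always served.
    }
    if (counters_.size() >= max_counters_) {
      ++rejected_; // Assumption: over-limit lookups get the discarding sink.
      return sink_;
    }
    auto inserted = counters_.emplace(name, std::make_unique<LimitedCounter>());
    return *inserted.first->second;
  }

  std::size_t size() const { return counters_.size(); }
  uint64_t rejected() const { return rejected_; }

private:
  std::size_t max_counters_;
  uint64_t rejected_ = 0;
  LimitedCounter sink_;
  std::unordered_map<std::string, std::unique_ptr<LimitedCounter>> counters_;
};
```

With a limit of 2, the first two distinct names get real counters and a third name is routed to the sink. Counting rejections, as `rejected()` does here, is one way an operator could notice that the limit is being hit.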
jwendell approved these changes May 21, 2026
Back port of: