Overriding sampling rate in certain cases #1654
-
|
Is it possible to override the sampling rate, not decision, when you create a span without using the SDK part of When we create the tracer provider we create a Right now the SDK part of the consumer_tracer: opentelemetry.sdk.trace.Tracer = get_tracer(options.app_name, options.app_version)
consumer_tracer.sampler = ParentBasedTraceIdRatio(0.5)but then the application might crash if we use a |
Beta Was this translation helpful? Give feedback.
Replies: 3 comments 1 reply
-
|
@Gyllsdorff |
Beta Was this translation helpful? Give feedback.
-
|
Dynamic sampling rate override is important for multi-agent systems where you need different sampling strategies for different call types. For agent workloads specifically: Budget-weighted sampling — for agents with tight cost budgets, you want 100% sampling (never drop spans) because every trace might be evidence you need for debugging a cost overrun. For high-volume retrieval calls from well-behaved agents, aggressive sampling is fine. The sampling decision should be informed by the agent's budget state. Error escalation sampling — if an agent starts failing, escalate to 100% sampling for that agent until the issue is diagnosed. If the agent is healthy, sample at the background rate. This requires the sampler to have access to the agent's recent error rate. Delegation-depth-based sampling — traces at delegation depth 1 (root agent) are always sampled; deeper traces can be sampled at lower rates. This gives you full visibility into the orchestration layer while managing volume from leaf agents. Tail-based sampling for cost anomalies — rather than head-based sampling (decide at span start), consider tail-based sampling where you keep spans that ended up being expensive (actual cost > expected cost by >2x). This catches the interesting cases without keeping all the boring ones. Implementation: a custom We've implemented tiered sampling for KinthAI's agent observability layer — the design decisions here: https://blog.kinthai.ai/221-agents-multi-agent-coordination-lessons Is the override needed for specific routes, specific services, or dynamic conditions at runtime? |
Beta Was this translation helpful? Give feedback.
-
|
Overriding sampling rate for specific cases in OpenTelemetry is a common need — you want high sampling for critical paths and low sampling for noisy, cheap operations. A few approaches: Approach 1: Composite sampler (head-based) from opentelemetry.sdk.trace.sampling import (
TraceIdRatioBased, ALWAYS_ON, ALWAYS_OFF, ParentBased
)
class RouteAwareSampler(Sampler):
def should_sample(self, context, trace_id, name, kind, attributes, links):
# Always sample error paths
if attributes.get("http.status_code", 200) >= 500:
return ALWAYS_ON.should_sample(...)
# High rate for critical business paths
if name.startswith("payment/"):
return TraceIdRatioBased(0.5).should_sample(...)
# Low rate for health checks
if name == "GET /health":
return TraceIdRatioBased(0.001).should_sample(...)
# Default
return TraceIdRatioBased(0.1).should_sample(...)Approach 2: Tail-based sampling in the collector Approach 3: Baggage-based override For agent systems specifically: |
Beta Was this translation helpful? Give feedback.
@Gyllsdorff
I'm not sure how you would be able to do this since
Sampleris an SDK concept. Perhaps you could make your own "overridable" custom Sampler that inherits fromSamplerand has some APIs that could change the sampling percentage during runtime?