RFC feedback request: Vendor-neutral LLM observability semantic convention (gen_ai.* metrics) — looking for input from GenAI SIG maintainers #5069

sauravGit · 2026-05-05T19:24:48Z

sauravGit
May 5, 2026

Hi OpenTelemetry community,

I'm working on a vendor-neutral, OTEL-compatible semantic convention and SDK layer for standardizing LLM observability across providers, frameworks, and platforms — and I'd genuinely value feedback from GenAI SIG maintainers and contributors on the core metric set and OTEL mapping.

The problem: Every LLM observability tool today defines its own metric names, KPI sets, and attribute schemas. Teams instrumenting production LLM apps have to re-instrument every time they change providers or backends. There is no shared language.

What I'm proposing: A canonical gen_ai.* metric schema — built on top of the existing OpenTelemetry GenAI semantic conventions — that covers:

- Core KPIs: latency (TTFT, total), token usage (input/output), cost, error rate, retry rate, rate limit events
- Required span attributes: gen_ai.system, gen_ai.request.model, gen_ai.operation.name
- Derived metrics: cost-per-request, token efficiency, success rate
- Interoperability rules: how backends can emit differently-named attributes and still map to the canonical schema
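
To make the "required span attributes" item concrete, here is a minimal sketch of a check an SDK layer could run before exporting a span. The three attribute names come from the list above; the function name and the example values ("openai", "gpt-4o") are hypothetical and only for illustration, not part of the RFC.

```python
# Required attribute names as listed in the proposal.
REQUIRED_SPAN_ATTRIBUTES = (
    "gen_ai.system",
    "gen_ai.request.model",
    "gen_ai.operation.name",
)


def missing_required_attributes(attributes):
    """Return the required attribute names that are absent or empty."""
    return [name for name in REQUIRED_SPAN_ATTRIBUTES if not attributes.get(name)]


# Illustrative span attributes (values are made up for the example).
span_attributes = {
    "gen_ai.system": "openai",
    "gen_ai.request.model": "gpt-4o",
}

print(missing_required_attributes(span_attributes))
```

Whether a missing attribute should drop the span, log a warning, or just be reported is left open here.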

Why I'm bringing it here: The OTel GenAI SIG has done excellent work on span conventions and is the natural home for a standardized metric layer. I want to make sure this proposal complements (not duplicates or conflicts with) the existing gen-ai-metrics spec in development.

Specific questions for this community:

1. Does the proposed KPI set (TTFT, token counts, cost, error rate) overlap with or conflict with any in-flight spec work in the GenAI SIG?
2. Are there naming or unit conventions I should follow for the metric instruments (gen_ai.client.token.usage histogram vs counter approach)?
3. Is a GitHub Discussion the right venue, or should this go directly to semantic-conventions as an Issue/PR?

Links:

- RFC + full spec: https://github.com/sauravGit/open-llm-observability/blob/main/RFC.md
- GitHub Discussion (with structured KPI table): RFC: Universal LLM Observability Semantic Convention (v0.4) sauravGit/open-llm-observability#1
- Python SDK scaffold (OTEL-native): https://github.com/sauravGit/open-llm-observability/tree/main/sdk/python

This is v0.1 and explicitly designed to evolve based on community input. I'm not trying to build another vendor tool — the goal is a shared language for LLM telemetry that any OTEL-compatible backend can consume.

Happy to be redirected to the right SIG channel, mailing list, or issue tracker if this is the wrong venue. Thank you for any feedback.

trask · 2026-05-05T19:35:16Z

trask
May 5, 2026
Maintainer

hi @sauravGit! I'd recommend opening an issue in the brand new https://github.com/open-telemetry/semantic-conventions-genai

1 reply

sauravGit May 6, 2026
Author

Thanks @trask! Really appreciate the pointer — I'll open an issue in semantic-conventions-genai as you suggest.

For reference, the RFC has been significantly expanded since this post (now at v0.2) with:

- Migration mapping tables from OpenLLMetry, Langfuse, Arize Phoenix, and AWS Bedrock to canonical gen_ai.* names
- Corrected instrument types (Histograms in seconds, unit s; Counters for tokens, unit {token})
- Extension packs (gen_ai.rag.*, gen_ai.agent.*, gen_ai.eval.*, gen_ai.safety.*)
- Derived KPI formulas, PromQL dashboard templates, and versioning policy
- A clear upstream path to OTel stable via the GenAI SIG
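
As a rough illustration of the instrument-type item above, the instruments can be written down as plain data and checked for unit consistency. This is only a sketch: the `InstrumentSpec` class and `unit_violations` helper are hypothetical, the rule "every histogram here is a duration in s" is a simplifying assumption, and pairing gen_ai.client.token.usage with a counter follows the summary in this post rather than anything confirmed against the upstream spec.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstrumentSpec:
    """Hypothetical declarative description of one metric instrument."""
    name: str
    kind: str  # "histogram" or "counter"
    unit: str


# Assumed unit rule for this sketch: duration histograms in seconds,
# token counters annotated with {token}.
EXPECTED_UNIT = {"histogram": "s", "counter": "{token}"}

INSTRUMENTS = (
    InstrumentSpec("gen_ai.client.operation.duration", "histogram", "s"),
    InstrumentSpec("gen_ai.server.time_to_first_token", "histogram", "s"),
    InstrumentSpec("gen_ai.client.token.usage", "counter", "{token}"),
)


def unit_violations(instruments):
    """Return names of instruments whose unit does not match the rule for their kind."""
    return [
        spec.name
        for spec in instruments
        if EXPECTED_UNIT.get(spec.kind) != spec.unit
    ]


print(unit_violations(INSTRUMENTS))
```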

Full RFC v0.2: https://github.com/sauravGit/open-llm-observability/blob/main/RFC.md

I'll bring the issue to semantic-conventions-genai shortly.

sauravGit · 2026-05-06T16:07:28Z

sauravGit
May 6, 2026
Author

RFC v0.3 update — the upstream issue is now open: semantic-conventions-genai #101, consolidating open issues #14, #23, #76, #93 into a single proposal for SIG review.

After checking the proposal against the existing OTel GenAI spec, RFC v0.3 makes three key fixes:

1. Reframed as a normalization/migration layer, not a parallel new standard. OTel already defines gen_ai.client.operation.duration, gen_ai.server.time_to_first_token, and gen_ai.client.token.usage. This RFC extends those and addresses the open gaps.
2. Added OpenInference / Arize Phoenix to the fragmentation table with correct mappings (llm.latency, llm.token_count.prompt/completion, llm.cost.total).
3. Moved gen_ai.usage.cost to an optional Cost extension pack, since not all providers publish pricing programmatically.
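
A minimal sketch of what the normalization layer could look like for the OpenInference names mentioned above. The source keys are taken from this post; the canonical targets on the right-hand side are my assumption for illustration (the RFC's actual mapping table may differ), and `normalize` is a hypothetical helper.

```python
# Vendor key -> canonical key. Right-hand sides are assumed, not quoted from the RFC.
OPENINFERENCE_TO_CANONICAL = {
    "llm.latency": "gen_ai.client.operation.duration",
    "llm.token_count.prompt": "gen_ai.usage.input_tokens",
    "llm.token_count.completion": "gen_ai.usage.output_tokens",
    "llm.cost.total": "gen_ai.usage.cost",
}


def normalize(attributes, mapping=OPENINFERENCE_TO_CANONICAL):
    """Rename known vendor keys to canonical names; unknown keys pass through unchanged."""
    normalized = {}
    for key, value in attributes.items():
        normalized[mapping.get(key, key)] = value
    return normalized


# Illustrative input (values are made up).
vendor_attributes = {
    "llm.token_count.prompt": 120,
    "llm.token_count.completion": 45,
    "custom.team": "search",
}

print(normalize(vendor_attributes))
```

This sketch renames keys only. Unit conversion (for example, a latency reported in milliseconds) and conflicts between a vendor key and an already-present canonical key would need explicit rules.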

Full RFC v0.3: https://github.com/sauravGit/open-llm-observability/blob/main/RFC.md

I welcome any further feedback from the SIG on the open questions (cost unit, error instrument type, TTFT streaming attribute).

0 replies

musaabhasan · 2026-05-09T08:30:10Z

musaabhasan
May 9, 2026

For an OpenTelemetry-facing convention, I would keep the distinction between spans, metrics, logs/events, and derived evaluation scores very explicit.

A production LLM trace usually needs at least three layers:

- request/response spans for model calls, tool calls, retrieval, reranking, and guardrails
- metrics for latency, tokens, cost, error rate, retry count, cache hit rate, and evaluator score distributions
- events/logs for redactions, policy decisions, safety blocks, citation failures, and judge rationales

The convention should avoid putting high-cardinality or sensitive values directly into metric attributes. Prompt text, completion text, retrieved chunks, and user IDs should be trace/log payloads with redaction controls, not metric dimensions.
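
One way to enforce this, sketched under assumptions: keep an explicit allowlist of low-cardinality keys for metric dimensions and route everything else to the trace/log payload. The allowlist contents and the payload key names ("gen_ai.prompt", "user.id") are illustrative choices, not names proposed by the RFC.

```python
# Assumed allowlist of low-cardinality keys that are safe as metric dimensions.
METRIC_ATTRIBUTE_ALLOWLIST = frozenset({
    "gen_ai.system",
    "gen_ai.request.model",
    "gen_ai.operation.name",
    "error.type",
})


def split_attributes(attributes):
    """Split attributes into (metric dimensions, trace/log payload)."""
    metric_attrs = {}
    payload = {}
    for key, value in attributes.items():
        if key in METRIC_ATTRIBUTE_ALLOWLIST:
            metric_attrs[key] = value
        else:
            payload[key] = value
    return metric_attrs, payload


# Illustrative call attributes (keys outside the allowlist are hypothetical).
call_attributes = {
    "gen_ai.system": "openai",
    "gen_ai.request.model": "gpt-4o",
    "gen_ai.prompt": "Summarize this contract ...",
    "user.id": "u-18427",
}

metric_attrs, payload = split_attributes(call_attributes)
print(metric_attrs)
print(sorted(payload))
```

An allowlist fails closed: a new attribute stays out of metric dimensions until someone decides it is safe, which seems like the better default for sensitive fields. Redaction of the payload itself is out of scope for this sketch.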

I would also include lineage attributes early: prompt version, model route, dataset/eval version, tool name/version, retrieval collection version, and guardrail policy version. Those are the fields teams need when a regression appears and they need to identify whether the model, prompt, retrieval corpus, or policy changed.
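
To illustrate why these fields help during a regression, here is a small sketch that records lineage as a fixed set of fields and reports which ones differ between two runs. The `Lineage` class, its field names, and the example values are hypothetical and do not represent proposed attribute names.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class Lineage:
    """Hypothetical lineage record attached to a model call."""
    prompt_version: Optional[str] = None
    model_route: Optional[str] = None
    eval_dataset_version: Optional[str] = None
    tool_version: Optional[str] = None
    retrieval_collection_version: Optional[str] = None
    guardrail_policy_version: Optional[str] = None


def changed_fields(before, after):
    """Return the sorted names of lineage fields that differ between two runs."""
    old, new = asdict(before), asdict(after)
    return sorted(name for name in old if old[name] != new[name])


# Illustrative values only.
last_good = Lineage(prompt_version="v3", model_route="primary",
                    retrieval_collection_version="2026-04")
regressed = Lineage(prompt_version="v3", model_route="primary",
                    retrieval_collection_version="2026-05")

print(changed_fields(last_good, regressed))
```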

1 reply

sauravGit May 11, 2026
Author

Thanks @musaabhasan, this is very helpful and aligns with the direction I want to take in the next RFC update.

I agree the convention should make the signal boundary explicit rather than flattening everything into metrics. Based on your breakdown, I’ll update the RFC to distinguish:

- Spans: model calls, tool calls, retrieval/reranking, guardrails, and evaluator runs
- Metrics: low-cardinality operational measurements such as latency, token usage, cost where available, retry count, cache hit rate, error/status counts, and score distributions
- Events/logs: sensitive or high-cardinality payloads such as prompt text, completion text, retrieved chunks, redactions, policy decisions, safety blocks, citation failures, and judge rationales
- Derived KPIs: dashboard/query-layer calculations rather than values every SDK must compute uniformly
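
As a sketch of the "derived KPIs live in the query layer" idea, the function below computes a few ratios from raw aggregated counts instead of asking each SDK to emit them. The function name, the KPI names, and the exact formulas are assumptions for illustration. The cost argument is unit-agnostic because the cost unit is still an open question.

```python
def derived_kpis(requests, errors, input_tokens, output_tokens, cost=None):
    """Compute illustrative derived KPIs from raw aggregated counts."""
    if requests <= 0:
        raise ValueError("requests must be positive")
    kpis = {
        "success_rate": (requests - errors) / requests,
        "tokens_per_request": (input_tokens + output_tokens) / requests,
    }
    # Cost is optional: not every provider exposes pricing.
    if cost is not None:
        kpis["cost_per_request"] = cost / requests
    return kpis


# Illustrative totals for one time window (made-up numbers).
window = derived_kpis(requests=200, errors=10,
                      input_tokens=50_000, output_tokens=30_000, cost=4.0)
print(window)
```

Keeping these as query-time calculations means a backend can change a formula without re-instrumenting applications.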

I also agree with the lineage point. I’ll add a section proposing stable lineage fields such as prompt version, model route, dataset/eval version, tool name/version, retrieval collection version, and guardrail policy version, with cardinality guidance.

I’ll incorporate this into the RFC and mirror the signal-boundary clarification into the upstream semantic-conventions-genai issue so the SIG can weigh in.
