fix(core): emit judge usage telemetry on eval scorers by omeraplak · Pull Request #1168 · VoltAgent/voltagent

omeraplak · 2026-03-20T15:18:02Z

PR Checklist

Please check if your PR fulfills the following requirements:

The commit message follows our guidelines: https://voltagent.dev/docs/community/contributing/#commit-convention

Bugs / Features

Related issue(s) linked
Tests for the changes have been added
Docs have been added / updated
Changesets have been added https://voltagent.dev/docs/community/contributing/#creating-a-changeset

What is the current behavior?

LLM-based eval scorers can collect judge usage and provider cost information, but that telemetry is not emitted on scorer spans.

As a result, downstream observability and cost aggregation cannot reliably attribute eval scorer token/cost usage separately from the main agent run.

What is the new behavior?

createLLMJudgeScorer now preserves judge model, normalized token usage, and OpenRouter provider cost details in scorer metadata, and eval span creation maps that telemetry onto scorer span attributes.
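A minimal sketch of how the judge telemetry might be carried in scorer metadata. It assumes `voltAgent.judge` denotes a `judge` entry nested inside a `voltAgent` object. The `JudgeTelemetry` name comes from this PR, but its field names and the `attachJudgeTelemetry` helper are hypothetical and only illustrate the idea.

```typescript
// Hypothetical shape of the judge telemetry kept in scorer metadata.
// Field names are illustrative assumptions, not the merged interface.
interface JudgeUsage {
  promptTokens?: number;
  completionTokens?: number;
  totalTokens?: number;
  cachedTokens?: number;
  reasoningTokens?: number;
}

interface JudgeTelemetry {
  model?: string;
  usage?: JudgeUsage;
  providerCost?: {
    cost?: number;
    costDetails?: Record<string, number>;
  };
}

type Metadata = Record<string, unknown>;

// Hypothetical helper: store telemetry under metadata.voltAgent.judge
// without clobbering other keys already present under `voltAgent`.
function attachJudgeTelemetry(
  metadata: Metadata | undefined,
  telemetry: JudgeTelemetry,
): Metadata {
  const base: Metadata = { ...(metadata ?? {}) };
  const existing = base.voltAgent;
  const voltAgent: Metadata =
    typeof existing === "object" && existing !== null && !Array.isArray(existing)
      ? { ...(existing as Metadata) }
      : {};
  voltAgent.judge = telemetry;
  base.voltAgent = voltAgent;
  return base;
}
```

Copying `voltAgent` instead of overwriting it keeps any other VoltAgent-specific metadata the scorer already produced.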

This makes scorer-side usage visible in observability pipelines and enables downstream cost aggregation to split agent cost from eval scorer cost.
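Downstream, splitting agent cost from scorer cost could be as simple as summing `usage.cost` per span category. This sketch assumes a hypothetical exported-span shape with a `kind` field that marks scorer spans. Real observability backends classify spans in their own way, so treat `ExportedSpan` and `splitCost` as illustrative only.

```typescript
// Hypothetical exported-span shape; real exporters and backends differ.
interface ExportedSpan {
  name: string;
  kind: "agent" | "scorer"; // assumed classification field
  attributes: Record<string, string | number | boolean | undefined>;
}

interface CostSplit {
  agentCost: number;
  scorerCost: number;
}

// Sum the `usage.cost` attribute separately for agent and scorer spans.
function splitCost(spans: ExportedSpan[]): CostSplit {
  const split: CostSplit = { agentCost: 0, scorerCost: 0 };
  for (const span of spans) {
    const cost = span.attributes["usage.cost"];
    // Skip spans that carry no numeric cost (e.g. providers that report none).
    if (typeof cost !== "number" || !Number.isFinite(cost)) continue;
    if (span.kind === "scorer") {
      split.scorerCost += cost;
    } else {
      split.agentCost += cost;
    }
  }
  return split;
}
```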

fixes N/A

Notes for reviewers

Verified with pnpm --filter @voltagent/core typecheck
Verified with pnpm --filter @voltagent/core build
The branch intentionally only includes the eval telemetry changes and the new changeset.

Summary by cubic

Emit judge usage and cost telemetry on eval scorer spans in @voltagent/core so observability and cost reports can separate eval scorer usage from the main agent run.

Bug Fixes
- Store judge model, normalized token usage (prompt, completion, total, cached, reasoning), and OpenRouter cost in scorer metadata.
- Populate scorer span attributes (ai.model.name, usage.*, usage.cost, usage.cost_details.*) from that telemetry.
- Normalize usage from success and error paths and extract provider cost from providerMetadata.
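Normalizing usage from both paths could look roughly like the helper below. It assumes the judge call reports usage with either AI SDK v5-style keys (`inputTokens`, `outputTokens`, `cachedInputTokens`, `reasoningTokens`) or v4-style keys (`promptTokens`, `completionTokens`), and that a failed call may expose a `usage` field on the thrown error. `normalizeUsage` and `usageFromError` are hypothetical names, not the helpers added in this PR.

```typescript
// Sketch: normalize judge token usage from success or error paths.
// Accepted key spellings are assumptions (v5-style first, then v4-style).
interface NormalizedUsage {
  promptTokens?: number;
  completionTokens?: number;
  totalTokens?: number;
  cachedTokens?: number;
  reasoningTokens?: number;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Return the first finite number found under any of the given keys.
function firstNumber(record: Record<string, unknown>, ...keys: string[]): number | undefined {
  for (const key of keys) {
    const value = record[key];
    if (typeof value === "number" && Number.isFinite(value)) return value;
  }
  return undefined;
}

function normalizeUsage(raw: unknown): NormalizedUsage | undefined {
  if (!isRecord(raw)) return undefined;
  const promptTokens = firstNumber(raw, "inputTokens", "promptTokens");
  const completionTokens = firstNumber(raw, "outputTokens", "completionTokens");
  // Prefer the reported total; otherwise derive it when both parts are known.
  const totalTokens =
    firstNumber(raw, "totalTokens") ??
    (promptTokens !== undefined && completionTokens !== undefined
      ? promptTokens + completionTokens
      : undefined);
  const usage: NormalizedUsage = {
    promptTokens,
    completionTokens,
    totalTokens,
    cachedTokens: firstNumber(raw, "cachedInputTokens", "cachedTokens"),
    reasoningTokens: firstNumber(raw, "reasoningTokens"),
  };
  // Report nothing rather than an object full of undefined fields.
  return Object.values(usage).some((value) => value !== undefined) ? usage : undefined;
}

// Hypothetical error path: assumes a failed judge call may carry `usage` on the error.
function usageFromError(error: unknown): NormalizedUsage | undefined {
  return isRecord(error) ? normalizeUsage(error.usage) : undefined;
}
```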

Written for commit 5598cc8. Summary will update on new commits.

Summary by CodeRabbit

New Features
- Enhanced telemetry for eval scorer spans now captures judge model identification, comprehensive token usage metrics (including cached and reasoning tokens), and provider-reported cost breakdowns. This enables improved observability in backend systems and supports downstream cost aggregation that distinguishes eval scoring costs from agent operation costs.

changeset-bot · 2026-03-20T15:18:17Z

🦋 Changeset detected

Latest commit: 5598cc8

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package

| Name | Type |
| --- | --- |
| @voltagent/core | Patch |

coderabbitai · 2026-03-20T15:18:32Z

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough

This change introduces telemetry capture for LLM judge scoring operations in VoltAgent. It extracts judge model information, token usage (including cached and reasoning tokens), and provider cost details from judge scorer execution, then attaches this metadata to observable span attributes for downstream observability and cost aggregation.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Changeset Documentation `.changeset/eval-scorer-cost-telemetry.md` | Adds changeset entry for `@voltagent/core` patch release documenting new judge telemetry emission on eval scorer spans (model, token usage, provider costs). |
| Judge Telemetry Extraction `packages/core/src/agent/eval.ts` | Introduces `JudgeTelemetry` interface and `extractJudgeTelemetry()` function with safe parsing helpers to read judge metadata from combined records, then extends `createScorerSpanAttributes` to attach extracted model name, token counts, and cost breakdowns to span attributes. |
| Judge Scorer Telemetry Capture `packages/core/src/eval/llm/create-judge-scorer.ts` | Enhances `createLLMJudgeScorer` to capture and normalize judge `usage` and `providerMetadata` from LLM calls, extract OpenRouter cost details, and attach captured telemetry to scorer metadata as `voltAgent.judge` on both success and error paths, including multiple helper functions for model resolution, cost extraction, and metadata normalization. |
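On the span side, reading the telemetry back out of metadata and mapping it onto attributes might be sketched as follows. The `ai.model.name`, `usage.cost`, and `usage.cost_details.*` names come from the PR summary; the per-token attribute suffixes (`usage.prompt_tokens` and so on), the `voltAgent.judge` layout with a `providerCost` entry, and the `judgeSpanAttributes` helper are assumptions for illustration.

```typescript
// Sketch: read judge telemetry from scorer metadata and map it to span attributes.
// Metadata layout and token attribute suffixes are assumptions.
type Attributes = Record<string, string | number | boolean>;

interface JudgeTelemetry {
  model?: string;
  usage?: Record<string, number>;
  cost?: number;
  costDetails?: Record<string, number>;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function readNumber(value: unknown): number | undefined {
  return typeof value === "number" && Number.isFinite(value) ? value : undefined;
}

// Keep only finite numeric entries; return undefined when nothing usable remains.
function numericEntries(value: unknown): Record<string, number> | undefined {
  if (!isRecord(value)) return undefined;
  const out: Record<string, number> = {};
  for (const [key, entry] of Object.entries(value)) {
    const n = readNumber(entry);
    if (n !== undefined) out[key] = n;
  }
  return Object.keys(out).length > 0 ? out : undefined;
}

// Safe parsing: every level is checked before it is read.
function extractJudgeTelemetry(metadata: unknown): JudgeTelemetry | undefined {
  const voltAgent = isRecord(metadata) ? metadata.voltAgent : undefined;
  const judge = isRecord(voltAgent) ? voltAgent.judge : undefined;
  if (!isRecord(judge)) return undefined;

  const model = judge.model;
  const rawCost = judge.providerCost;
  const providerCost = isRecord(rawCost) ? rawCost : undefined;
  return {
    model: typeof model === "string" && model.length > 0 ? model : undefined,
    usage: numericEntries(judge.usage),
    cost: readNumber(providerCost?.cost),
    costDetails: numericEntries(providerCost?.costDetails),
  };
}

// Hypothetical mapping from normalized usage keys to span attribute names.
const USAGE_ATTRIBUTES: ReadonlyArray<readonly [string, string]> = [
  ["promptTokens", "usage.prompt_tokens"],
  ["completionTokens", "usage.completion_tokens"],
  ["totalTokens", "usage.total_tokens"],
  ["cachedTokens", "usage.cached_tokens"],
  ["reasoningTokens", "usage.reasoning_tokens"],
];

function judgeSpanAttributes(metadata: unknown): Attributes {
  const attributes: Attributes = {};
  const telemetry = extractJudgeTelemetry(metadata);
  if (!telemetry) return attributes;

  if (telemetry.model) attributes["ai.model.name"] = telemetry.model;
  for (const [usageKey, attributeName] of USAGE_ATTRIBUTES) {
    const value = telemetry.usage?.[usageKey];
    if (value !== undefined) attributes[attributeName] = value;
  }
  if (telemetry.cost !== undefined) attributes["usage.cost"] = telemetry.cost;
  for (const [key, value] of Object.entries(telemetry.costDetails ?? {})) {
    attributes[`usage.cost_details.${key}`] = value;
  }
  return attributes;
}
```

Returning an empty attribute object when telemetry is absent lets the caller spread the result into its existing span attributes unconditionally.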

Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant Scorer as LLM Judge Scorer
    participant LLM as Judge Model
    participant Span as Span Attributes

    Client->>Scorer: createLLMJudgeScorer.evaluate(payload)
    Scorer->>LLM: generateText(prompt)
    LLM-->>Scorer: text, usage, providerMetadata
    Scorer->>Scorer: normalize usage and provider cost<br/>(model, usage, costs)
    Scorer-->>Client: ScorerResult with voltAgent.judge metadata
    Client->>Span: createScorerSpanAttributes(metadata)
    Span->>Span: extractJudgeTelemetry()<br/>from metadata
    Span-->>Span: Attach ai.model.name,<br/>usage.*, usage.cost
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

feat: add tool-aware live-eval payloads and a deterministic tool-call accuracy scorer #1055: Modifies the same createScorerSpanAttributes function in eval.ts to extend span attributes with additional telemetry fields, creating potential interaction points with this PR's judge telemetry attachment logic.
fix(core): preserve usage and cost metadata on structured output failures #1163: Adds and preserves LLM usage and provider cost metadata across error paths and surfaces that telemetry onto observability spans, sharing similar goals of making cost data observable through span attributes.

Suggested reviewers

lzj960515

Poem

🐰 A judge hops in with token tales,
Usage counts on scoring scales,
Cost details caught and costs unfurled,
Judge telemetry takes the world! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately and concisely describes the main change: emitting judge usage telemetry on eval scorers, which directly aligns with the changeset and file modifications. |
| Description check | ✅ Passed | The description comprehensively covers the template sections, clearly explains current vs. new behavior, documents the changes made, confirms changesets and typechecking, and includes verification details. |

cloudflare-workers-and-pages · 2026-03-20T15:22:41Z

Deploying voltagent with Cloudflare Pages

Latest commit:	`5598cc8`
Status:	🚫 Build failed.

cubic-dev-ai

1 issue found across 3 files

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/core/src/eval/llm/create-judge-scorer.ts">

<violation number="1" location="packages/core/src/eval/llm/create-judge-scorer.ts:238">
P2: OpenRouter judge telemetry parsing is incomplete and can miss provider cost fields when metadata uses snake_case keys.</violation>
</file>

cubic-dev-ai · 2026-03-20T15:23:58Z

packages/core/src/eval/llm/create-judge-scorer.ts

```diff
+    ? providerMetadata.openrouter
+    : undefined;
+  const usage = isRecord(openRouterMetadata?.usage) ? openRouterMetadata.usage : undefined;
+  const costDetails = isRecord(usage?.costDetails) ? usage.costDetails : undefined;
```


P2: OpenRouter judge telemetry parsing is incomplete and can miss provider cost fields when metadata uses snake_case keys.

Prompt for AI agents

Check if this issue is valid; if so, understand the root cause and fix it. At packages/core/src/eval/llm/create-judge-scorer.ts, line 238: <comment>OpenRouter judge telemetry parsing is incomplete and can miss provider cost fields when metadata uses snake_case keys.</comment> <file context>
@@ -178,3 +205,112 @@ function stringify(value: unknown): string {
+    ? providerMetadata.openrouter
+    : undefined;
+  const usage = isRecord(openRouterMetadata?.usage) ? openRouterMetadata.usage : undefined;
+  const costDetails = isRecord(usage?.costDetails) ? usage.costDetails : undefined;
+
+  if (!usage) {
</file context>
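One way to address this finding is to accept both key spellings when reading OpenRouter usage from `providerMetadata`, folding detail keys to camelCase so downstream attribute names stay stable. This is a hedged sketch: the exact keys the OpenRouter provider emits (`cost`, `costDetails` or `cost_details`, `upstream_inference_cost`) are assumptions, and `extractOpenRouterCost` is a hypothetical helper, not the code that was merged.

```typescript
// Sketch: tolerate camelCase and snake_case keys in OpenRouter provider metadata.
// Key names below are assumptions about what the provider emits.
interface OpenRouterCost {
  cost?: number;
  costDetails?: Record<string, number>;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function readNumber(value: unknown): number | undefined {
  return typeof value === "number" && Number.isFinite(value) ? value : undefined;
}

// Return the first record-valued entry found under any of the given aliases.
function pickRecord(
  source: Record<string, unknown> | undefined,
  ...keys: string[]
): Record<string, unknown> | undefined {
  if (!source) return undefined;
  for (const key of keys) {
    const value = source[key];
    if (isRecord(value)) return value;
  }
  return undefined;
}

// "upstream_inference_cost" -> "upstreamInferenceCost"; camelCase keys pass through unchanged.
function toCamelCase(key: string): string {
  return key.replace(/_([a-z0-9])/g, (_match: string, ch: string) => ch.toUpperCase());
}

function extractOpenRouterCost(providerMetadata: unknown): OpenRouterCost | undefined {
  const openRouter = isRecord(providerMetadata)
    ? pickRecord(providerMetadata, "openrouter")
    : undefined;
  const usage = pickRecord(openRouter, "usage");
  if (!usage) return undefined;

  const cost = readNumber(usage.cost);

  // Accept either spelling of the details object, then normalize its keys.
  const rawDetails = pickRecord(usage, "costDetails", "cost_details");
  const costDetails: Record<string, number> = {};
  for (const [key, value] of Object.entries(rawDetails ?? {})) {
    const amount = readNumber(value);
    if (amount !== undefined) costDetails[toCamelCase(key)] = amount;
  }
  const hasDetails = Object.keys(costDetails).length > 0;

  if (cost === undefined && !hasDetails) return undefined;
  return { cost, costDetails: hasDetails ? costDetails : undefined };
}
```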

fix(core): emit judge usage telemetry on eval scorers

5598cc8

cubic-dev-ai bot reviewed Mar 20, 2026

omeraplak merged commit 2075bd9 into main Mar 20, 2026
22 of 24 checks passed

omeraplak deleted the fix/eval-scorer-cost-telemetry branch March 20, 2026 15:24

voltagent-bot mentioned this pull request Mar 20, 2026

ci(changesets): version packages #1164

Merged

omeraplak merged 1 commit into main from fix/eval-scorer-cost-telemetry
