fix(core): emit judge usage telemetry on eval scorers #1168
Conversation
🦋 Changeset detected

Latest commit: 5598cc8

The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package.

Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR.
Caution: Review failed. Pull request was closed or merged during review.

📝 Walkthrough

This change introduces telemetry capture for LLM judge scoring operations in VoltAgent. It extracts judge model information, token usage (including cached and reasoning tokens), and provider cost details from judge scorer execution, then attaches this metadata to observable span attributes for downstream observability and cost aggregation.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant Scorer as LLM Judge Scorer
    participant LLM as Judge Model
    participant Span as Span Attributes
    Client->>Scorer: createLLMJudgeScorer.evaluate(payload)
    Scorer->>LLM: generateText(prompt)
    LLM-->>Scorer: text, usage, providerMetadata
    Scorer->>Scorer: extractJudgeTelemetry()<br/>(model, usage, costs)
    Scorer-->>Client: ScorerResult with voltAgent.judge metadata
    Client->>Span: createScorerSpanAttributes(metadata)
    Span->>Span: extractJudgeTelemetry()<br/>from metadata
    Span-->>Span: Attach ai.model.name,<br/>usage.*, usage.cost
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: 2 passed, 1 failed (1 warning)
1 issue found across 3 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/core/src/eval/llm/create-judge-scorer.ts">
<violation number="1" location="packages/core/src/eval/llm/create-judge-scorer.ts:238">
P2: OpenRouter judge telemetry parsing is incomplete and can miss provider cost fields when metadata uses snake_case keys.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
```ts
    ? providerMetadata.openrouter
    : undefined;
const usage = isRecord(openRouterMetadata?.usage) ? openRouterMetadata.usage : undefined;
const costDetails = isRecord(usage?.costDetails) ? usage.costDetails : undefined;
```
<file context>

```diff
@@ -178,3 +205,112 @@ function stringify(value: unknown): string {
+    ? providerMetadata.openrouter
+    : undefined;
+  const usage = isRecord(openRouterMetadata?.usage) ? openRouterMetadata.usage : undefined;
+  const costDetails = isRecord(usage?.costDetails) ? usage.costDetails : undefined;
+
+  if (!usage) {
```

</file context>
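One way to address the reviewer's snake_case concern is to accept both key spellings when normalizing OpenRouter usage metadata. The sketch below is illustrative: `pick` and `readCostDetails` are hypothetical helpers, not the PR's code, and the assumption that OpenRouter metadata may arrive in either spelling comes from the review comment.

```typescript
type Rec = Record<string, unknown>;

function isRecord(value: unknown): value is Rec {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Prefer the camelCase key but fall back to snake_case, so cost fields
// are not missed when the provider emits snake_case metadata.
function pick(obj: Rec, camel: string, snake: string): unknown {
  return obj[camel] !== undefined ? obj[camel] : obj[snake];
}

function readCostDetails(providerMetadata: unknown): Rec | undefined {
  if (!isRecord(providerMetadata)) return undefined;
  const openrouter = providerMetadata.openrouter;
  if (!isRecord(openrouter)) return undefined;
  const usage = openrouter.usage;
  if (!isRecord(usage)) return undefined;
  const details = pick(usage, "costDetails", "cost_details");
  return isRecord(details) ? details : undefined;
}
```

Reading both spellings at the parse boundary keeps the rest of the telemetry code free of provider-specific casing concerns.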
PR Checklist
Please check if your PR fulfills the following requirements:
Bugs / Features
What is the current behavior?
LLM-based eval scorers can collect judge usage and provider cost information, but that telemetry is not emitted on scorer spans.
As a result, downstream observability and cost aggregation cannot reliably attribute eval scorer token/cost usage separately from the main agent run.
What is the new behavior?
`createLLMJudgeScorer` now preserves judge model, normalized token usage, and OpenRouter provider cost details in scorer metadata, and eval span creation maps that telemetry onto scorer span attributes. This makes scorer-side usage visible in observability pipelines and enables downstream cost aggregation to split agent cost from eval scorer cost.
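The span-attribute mapping described above might look roughly as follows. The attribute keys are taken from the PR summary (`ai.model.name`, `usage.*`, `usage.cost`); the metadata shape and the function itself are illustrative assumptions, not the actual eval span code.

```typescript
// Hypothetical metadata shape carried on the scorer result.
interface JudgeMetadata {
  model?: string;
  usage?: {
    inputTokens?: number;
    outputTokens?: number;
    cachedInputTokens?: number;
    reasoningTokens?: number;
  };
  cost?: number;
}

// Flatten judge telemetry into flat span attributes so observability
// backends can aggregate eval scorer cost separately from the agent run.
function judgeSpanAttributes(meta: JudgeMetadata): Record<string, string | number> {
  const attrs: Record<string, string | number> = {};
  if (meta.model) attrs["ai.model.name"] = meta.model;
  const u = meta.usage;
  if (u?.inputTokens !== undefined) attrs["usage.input_tokens"] = u.inputTokens;
  if (u?.outputTokens !== undefined) attrs["usage.output_tokens"] = u.outputTokens;
  if (u?.cachedInputTokens !== undefined) attrs["usage.cached_input_tokens"] = u.cachedInputTokens;
  if (u?.reasoningTokens !== undefined) attrs["usage.reasoning_tokens"] = u.reasoningTokens;
  if (meta.cost !== undefined) attrs["usage.cost"] = meta.cost;
  return attrs;
}
```

Emitting only the fields that are present keeps spans compact and avoids writing misleading zero values when a provider reports no usage.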
fixes N/A
Notes for reviewers
pnpm --filter @voltagent/core typecheck
pnpm --filter @voltagent/core build

Summary by cubic
Emit judge usage and cost telemetry on eval scorer spans in @voltagent/core so observability and cost reports can separate eval scorer usage from the main agent run. Scorer spans now carry judge attributes (`ai.model.name`, `usage.*`, `usage.cost`, `usage.cost_details.*`) extracted from that telemetry and the provider's providerMetadata.

Written for commit 5598cc8. Summary will update on new commits.
Summary by CodeRabbit