Fix eval push verifiers upload parity by d42me · Pull Request #588 · PrimeIntellect-ai/prime

d42me · 2026-04-30T20:37:25Z

Summary

Reuse the same verifiers result normalization for automatic uploads and prime eval push.
Preserve rollout viewer data by carrying non-standard vf-eval fields such as timing, token_usage, trajectory, status flags, and state columns through info.
Keep avg_* metrics and metadata intact for manually pushed vf-eval outputs.
Set the evaluation dataset from the pushed environment, matching automatic prime eval run uploads.
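The sketch below illustrates "keep avg_* metrics and metadata intact". It assumes vf-eval metadata is a flat dict whose numeric `avg_*` entries are the aggregate metrics. `extract_avg_metrics` is a hypothetical helper, and the example keys are illustrative rather than the prime or verifiers schema.

```python
from typing import Any


def extract_avg_metrics(metadata: dict[str, Any]) -> tuple[dict[str, float], dict[str, Any]]:
    """Return (avg_* metrics, untouched copy of metadata).

    Hypothetical helper: key names and value types are assumptions about vf-eval output.
    """
    metrics = {
        key: float(value)
        for key, value in metadata.items()
        # Numeric avg_* values only; bool is an int subclass, so exclude it explicitly.
        if key.startswith("avg_")
        and isinstance(value, (int, float))
        and not isinstance(value, bool)
    }
    # Metadata is copied as-is so nothing (including the avg_* keys) is dropped.
    return metrics, dict(metadata)


metrics, meta = extract_avg_metrics({"env_id": "gsm8k", "avg_reward": 0.5, "num_examples": 10})
```

Returning an unmodified copy of the metadata alongside the extracted metrics means a manual push loses nothing that an automatic upload would have kept.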

Validation

uv run pytest packages/prime/tests -q
uv run ruff check packages/prime/src/prime_cli/commands/evals.py packages/prime/src/prime_cli/utils/eval_push.py packages/prime/tests/test_eval_push.py
uv run ty check packages/prime/src packages/prime/tests/test_eval_push.py

Note

Medium Risk
Changes environment resolution semantics in the Evals SDK and prime eval push, which could affect how evaluations link to environments and may surface new 404/validation errors for previously auto-created names.

Overview
Aligns manual prime eval push uploads with automatic verifiers uploads by reusing shared normalization: avg_* metrics are extracted consistently, result samples are normalized (field aliases, timing-derived total_time/latency_ms), and non-standard vf-eval fields are preserved under info.
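Here is a minimal sketch of the per-sample normalization described above: alias mapping, timing-derived fields, and preservation of non-standard fields under info. Every specific is an assumption made for illustration. This includes the `STANDARD_FIELDS` set, the `FIELD_ALIASES` map, the `timing["total_ms"]` key, and reporting `total_time` in seconds. `normalize_sample` is a hypothetical name, not the function in this PR.

```python
from typing import Any

# Hypothetical schema: the real standard columns and aliases may differ.
STANDARD_FIELDS = {"example_id", "prompt", "completion", "answer", "reward", "task"}
FIELD_ALIASES = {"id": "example_id", "score": "reward"}


def normalize_sample(raw: dict[str, Any]) -> dict[str, Any]:
    """Map a raw vf-eval result row onto a fixed schema, keeping extras in info."""
    sample: dict[str, Any] = {}
    info: dict[str, Any] = dict(raw.get("info") or {})
    for key, value in raw.items():
        if key == "info":
            continue
        canonical = FIELD_ALIASES.get(key, key)
        # A canonical key present in the row wins over its alias.
        if canonical in STANDARD_FIELDS and (key == canonical or canonical not in raw):
            sample[canonical] = value
        else:
            # Non-standard fields (timing, token_usage, trajectory, status flags,
            # state columns) and shadowed aliases are preserved under info.
            info.setdefault(key, value)
    # Assumed: timing carries a total_ms entry; total_time is reported in seconds.
    timing = raw.get("timing")
    if isinstance(timing, dict):
        total_ms = timing.get("total_ms")
        if isinstance(total_ms, (int, float)) and not isinstance(total_ms, bool):
            sample["total_time"] = total_ms / 1000.0
            sample["latency_ms"] = float(total_ms)
    sample["info"] = info
    return sample
```

Routing unknown keys into info, rather than dropping them, is what lets a rollout viewer still find fields such as `token_usage` after a manual push.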

Prevents unintended environment creation during evaluation creation/push: the Evals SDK switches name-based environment resolution from /environmentshub/resolve to lookup-only /environmentshub/lookup, slug lookups use GET /environmentshub/{owner}/{name}/@latest, and create_evaluation now allows dataset-only evaluations (only erroring when environments are provided but none resolve). The CLI now treats bare env names as dataset labels and only links environments when an owner slug is provided, while also setting dataset from the pushed env reference.
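The environment-resolution rules above can be sketched as two pure functions. The route `/environmentshub/{owner}/{name}/@latest` is taken from the description. `EnvRef`, `parse_env_ref`, and `check_resolved` are hypothetical names, and using the full reference string as the dataset label is an assumption. No HTTP client is shown; the sketch only decides what would be looked up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EnvRef:
    dataset: str  # label recorded on the evaluation (assumed: the reference as given)
    lookup_path: Optional[str]  # Hub path to GET, or None if no environment is linked


def parse_env_ref(ref: str) -> EnvRef:
    """Hypothetical: classify a pushed env reference as an owner slug or a bare name."""
    ref = ref.strip()
    if "/" in ref:
        owner, _, name = ref.partition("/")
        if not owner or not name or "/" in name:
            raise ValueError(f"invalid environment slug: {ref!r}")
        # Owner slugs are looked up (never created) via the owner detail route.
        return EnvRef(dataset=ref, lookup_path=f"/environmentshub/{owner}/{name}/@latest")
    # Bare names become dataset labels only; no environment is looked up or created.
    return EnvRef(dataset=ref, lookup_path=None)


def check_resolved(requested: list[str], resolved_ids: list[str]) -> None:
    """Allow dataset-only evaluations; fail only if requested environments all failed to resolve."""
    if requested and not resolved_ids:
        raise ValueError(f"none of the requested environments resolved: {requested}")
```

Because a bare name never triggers a lookup, a typo can no longer create a new environment. The trade-off is the one noted under Medium Risk: names that were previously auto-created may now yield 404 or validation errors when passed as slugs.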

Reviewed by Cursor Bugbot for commit 61cbfdc. Bugbot is set up for automated code reviews on this repo.

JannikSt · 2026-05-04T22:26:06Z

@codex review

chatgpt-codex-connector · 2026-05-04T22:29:57Z

Codex Review: Didn't find any major issues. 👍

d42me added 2 commits April 30, 2026 13:34

Fix eval push verifiers upload parity (973b32f)

Preserve verifiers rollout state in eval uploads (7e78269)

d42me requested review from JannikSt and burnpiro April 30, 2026 21:52

Prevent eval push from creating environments (c62ce0b)

JannikSt previously approved these changes May 4, 2026

Use owner detail lookup for eval environment slugs (61cbfdc)

d42me dismissed JannikSt’s stale review via 61cbfdc May 4, 2026 22:28

JannikSt approved these changes May 4, 2026

d42me wants to merge 4 commits into main from fix/eval-push-vf-results-parity

d42me commented Apr 30, 2026 (edited by cursor Bot)