
feat(renderers): switch client to vLLM 0.20 /inference/v1/generate #1282

Merged
hallerite merged 9 commits into main from feat/renderer-inference-v1-generate on May 5, 2026

feat(renderers): switch client to vLLM 0.20 /inference/v1/generate#1282
hallerite merged 9 commits intomainfrom
feat/renderer-inference-v1-generate

Conversation


hallerite (Member) commented May 3, 2026

Summary

vLLM 0.20 ships a unified tokens-in / tokens-out endpoint at /inference/v1/generate that supersedes the bespoke /v1/generate handler prime-rl shipped on top of vLLM 0.19. Migrate verifiers' RendererClient onto the new endpoint and pin the renderers package to its lean rewrite.

Companion PRs:

  • PrimeIntellect-ai/renderers#1 (lean generate() rewrite)
  • PrimeIntellect-ai/prime-rl#2408 (consumer migration)

What changed

Renderers pin (pyproject.toml + uv.lock)

  • [tool.uv.sources] pins renderers to PrimeIntellect-ai/renderers@40bc2a6 (the head of the companion PR — the lean generate() rewrite); a pyproject.toml sketch follows this list.
  • packages/renderers/ deleted from the verifiers tree; the package is no longer vendored.
  • Version constraint bumped to renderers>=0.1.6 in [project] deps and the renderers extra. Once renderers-v0.1.6 publishes to PyPI, drop [tool.uv.sources] and let it resolve from PyPI directly.
  • Drops the uv pip install -e packages/renderers CI hack — no longer needed once renderers resolves through [tool.uv.sources].
  • Deletes .github/workflows/publish-renderers.yml — the publish flow now lives in the renderers repo (ci: add PyPI publish workflow renderers#2).
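For reference, the pin described above takes roughly this shape in pyproject.toml (a minimal sketch; the rev is abbreviated here, and uv.lock records the full SHA):

```toml
# pyproject.toml (sketch)
[tool.uv.sources]
# Temporary git pin; drop this table once renderers v0.1.6 is on PyPI and let
# the renderers>=0.1.6 constraint resolve from the index instead.
renderers = { git = "https://github.com/PrimeIntellect-ai/renderers", rev = "40bc2a6" }
```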

RendererClient adapter (verifiers/clients/renderer_client.py)

  • get_native_response builds a sampling_params dict from the caller's flat sampling_args / extra_body and calls the new generate(...) with named args. This is the right place for the OpenAI-shaped → lean adaptation; the renderers package itself no longer carries OpenAI-SDK conventions.
  • from_native_response reads request_id instead of id; Usage is reconstructed from token-list lengths (the new endpoint doesn't return a usage block). Both adaptation steps are sketched after this list.
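A minimal sketch of the two adaptation steps, with a stand-in Usage dataclass and hypothetical helper names (the real implementation lives in verifiers/clients/renderer_client.py and uses the client's own types):

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Usage:
    # Stand-in for the client's usage type (assumption); the endpoint returns
    # no usage block, so the counts come from the token lists.
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int


def build_sampling_params(
    sampling_args: dict[str, Any], extra_body: dict[str, Any] | None = None
) -> dict[str, Any]:
    """Fold the caller's flat OpenAI-style kwargs into one sampling_params dict,
    which generate(...) forwards to vLLM verbatim."""
    params = dict(sampling_args)
    if extra_body:
        params.update(extra_body)
    return params


def usage_from_token_lists(prompt_ids: list[int], completion_ids: list[int]) -> Usage:
    """Reconstruct usage from token-list lengths."""
    return Usage(
        prompt_tokens=len(prompt_ids),
        completion_tokens=len(completion_ids),
        total_tokens=len(prompt_ids) + len(completion_ids),
    )
```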

Test plan

  • tests/test_renderer_client.py + tests/test_renderer_e2e.py updated for the new request_id / sampling_params shapes — 42/42 pass against the external pin.
  • e2e renderer rollout against a live vLLM 0.20 server (prime-rl#2408 + this PR's client): 20-step multi_reverse_text RL run, 2688 calls to /inference/v1/generate, eval Avg@4 = 0.83 — identical numbers to the fat-API version.
  • ruff format --check / ruff check clean.
  • ty check verifiers passes (0 errors).
  • Rebased onto current main (picks up Make renderers optional and add PyPI publish workflow #1279).

Notes

  • VLMs are still blocked by prime-rl's validate_renderer_vs_vlm config validator. The new endpoint already supports MM features end-to-end; lifting the ban needs the renderer client to build features client-side (HF processor → MultiModalKwargsItem → base64 msgpack). That's a separate PR — easier to review on its own once this lands.

Note

Medium Risk: changes the inference request/response contract (generate call shape, request_id, and usage reconstruction) and removes the vendored packages/renderers implementation in favor of a pinned external dependency.

Overview
Switches RendererClient over to renderers.client.generate, adapting OpenAI-style sampling args into a sampling_params dict, passing through cache_salt/priority/headers, and updating response handling to use request_id and reconstruct usage from token lengths.

Removes the in-repo packages/renderers implementation (and its publish-renderers GitHub workflow) and pins renderers via pyproject.toml (renderers>=0.1.6 plus a git tool.uv.sources override) so the renderer code is consumed as an external package.

Reviewed by Cursor Bugbot for commit b494fb7.

hallerite and others added 3 commits May 4, 2026 21:45
Replace the OpenAI-chat-completions-shaped ``completions_request`` with
a lean ``generate()`` built around what /inference/v1/generate actually
exposes:

- Structured ``sampling_params: dict`` arg, forwarded to vLLM verbatim.
  No more ``extra_body`` fallback, no ``_SAMPLING_KEYS`` allowlist, no
  ``max_completion_tokens`` ↔ ``max_tokens`` aliasing — those are
  OpenAI-SDK habits that don't apply here.
- Top-level ``cache_salt`` / ``priority`` / ``extra_headers`` as named
  args (matching the wire shape, no rummaging through extra_body).
- Result dict drops the ChatCompletion-shaped fillers (``id``,
  ``created``, ``model``, ``usage``); keeps ``request_id`` (the actual
  field /inference/v1/generate returns) and the renderer-specific
  fields (content, reasoning_content, tool_calls, finish_reason,
  prompt/completion_ids, completion_logprobs, routed_experts).
- ``stop_token_ids`` (from the renderer) and ``logprobs=1`` are forced
  by us; everything else flows through.

Kept: the ``finish_reason: stop → tool_calls`` promotion when the
renderer extracts tool calls client-side (downstream agent loops
genuinely depend on it), the AsyncOpenAI transport (auth + retries),
and the overlong-prompt 4xx diagnostic.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
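For orientation, a result dict of the shape the commit message above describes might look like the following (field names are taken from the message; the values are illustrative, and the exact key spellings for the prompt/completion id lists are assumptions):

```python
# Illustrative only: field names from the commit message above, values made up.
example_result = {
    "request_id": "req-abc123",
    "content": "txet esrever",
    "reasoning_content": None,
    "tool_calls": [],
    # Promoted from "stop" to "tool_calls" when the renderer extracts tool
    # calls client-side.
    "finish_reason": "stop",
    "prompt_ids": [1, 2, 3],           # key spelling assumed
    "completion_ids": [4, 5, 6],       # key spelling assumed
    "completion_logprobs": [-0.1, -0.2, -0.3],
    "routed_experts": None,
}
```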
The renderers package's ``completions_request`` was renamed to
``generate`` and grew a structured ``sampling_params`` arg. Update
``RendererClient`` and the e2e test scaffold to match.

- ``get_native_response``: build a ``sampling_params`` dict from the
  caller's flat sampling_args / extra_body, then call ``generate(...)``
  with named ``cache_salt`` / ``priority`` / ``extra_headers`` args.
  This is where the OpenAI-SDK kwarg conventions belong (the verifiers
  shim adapts the OpenAI-shaped surface to the lean generate() API);
  the renderer client itself no longer carries them.
- ``from_native_response``: read ``request_id`` (the field
  /inference/v1/generate actually returns) instead of ``id``;
  reconstruct ``Usage`` from token-list lengths since the endpoint
  doesn't return a usage block.
- ``ScriptedVLLM``: speak the new wire shape — POST to
  /inference/v1/generate, body uses ``token_ids`` and nested
  ``sampling_params``, response returns ``request_id`` and
  ``logprobs.content[*]``.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
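A rough sketch of the wire shape the scaffold now speaks, following the commit message above; only token_ids, the nested sampling_params, request_id, and logprobs.content come from that description, and the remaining keys and values are illustrative assumptions:

```python
# POST /inference/v1/generate: illustrative request body
request_body = {
    "token_ids": [1, 2, 3, 4],          # tokens-in: prompt as token ids
    "sampling_params": {                 # nested dict, forwarded to vLLM verbatim
        "max_tokens": 256,
        "temperature": 0.7,
        "stop_token_ids": [128009],
        "logprobs": 1,
    },
}

# Illustrative response body
response_body = {
    "request_id": "req-abc123",
    "token_ids": [5, 6, 7],             # tokens-out: completion ids (key assumed)
    "logprobs": {"content": [{"token": 5, "logprob": -0.1}]},  # inner shape assumed
    "finish_reason": "stop",
}
```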
The /inference/v1/generate switch is a wire-protocol break against
v0.1.5 (which targets the legacy /generate endpoint). Tag this as a
new release so the PyPI publish workflow picks it up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@hallerite hallerite force-pushed the feat/renderer-inference-v1-generate branch from d1f7821 to 0b797d2 on May 4, 2026 21:46
CI's ``uv sync`` resolves ``renderers>=0.1.5`` from PyPI, but this PR
bumps to v0.1.6 with the new ``generate`` API. Pre-merge there's no
PyPI release for v0.1.6 yet, so the import fails:

    ImportError: cannot import name 'generate' from 'renderers.client'

Add ``uv pip install -e packages/renderers`` after ``uv sync`` in both
test.yml and style.yml so CI uses the in-repo source. ``--no-sync`` on
the actual test/ty step prevents uv from rolling renderers back to the
PyPI version. Drop these steps after a renderers-v0.1.6 tag publishes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…package

Now that renderers lives in its own repo
(https://github.com/PrimeIntellect-ai/renderers), pin the verifiers dep
directly at PrimeIntellect-ai/renderers#1's head (40bc2a6 — the lean
``generate()`` rewrite) and remove ``packages/renderers/`` from the
verifiers tree.

This also drops the ``uv pip install -e packages/renderers`` CI hack
introduced in c969123 — no longer needed once renderers resolves
through ``[tool.uv.sources]``.

Bump the version constraints to ``renderers>=0.1.6``. Once renderers
v0.1.6 publishes to PyPI, drop ``[tool.uv.sources]`` and let the
constraint resolve from the trusted publisher.

Companion to:
  - PrimeIntellect-ai/renderers#1 (lean ``generate()`` rewrite)
  - PrimeIntellect-ai/prime-rl#2408 (consumer migration)
@hallerite hallerite marked this pull request as ready for review May 5, 2026 11:14
The renderers package now lives in PrimeIntellect-ai/renderers and ships
its own publish workflow (PrimeIntellect-ai/renderers#2). This stub no
longer has a target — packages/renderers/ was removed in 1d34beb.
hallerite added a commit to PrimeIntellect-ai/renderers that referenced this pull request May 5, 2026
Replace the OpenAI-chat-completions-shaped ``completions_request`` with
a lean ``generate()`` built around what /inference/v1/generate actually
exposes:

- Structured ``sampling_params: dict`` arg, forwarded to vLLM verbatim.
  No more ``extra_body`` fallback, no ``_SAMPLING_KEYS`` allowlist, no
  ``max_completion_tokens`` ↔ ``max_tokens`` aliasing — those are
  OpenAI-SDK habits that don't apply here.
- Top-level ``cache_salt`` / ``priority`` / ``extra_headers`` as named
  args (matching the wire shape, no rummaging through extra_body).
- Result dict drops the ChatCompletion-shaped fillers (``id``,
  ``created``, ``model``, ``usage``); keeps ``request_id`` (the actual
  field /inference/v1/generate returns) and the renderer-specific
  fields (content, reasoning_content, tool_calls, finish_reason,
  prompt/completion_ids, completion_logprobs, routed_experts).
- ``stop_token_ids`` (from the renderer) and ``logprobs=1`` are forced
  by us; everything else flows through.

Kept: the ``finish_reason: stop → tool_calls`` promotion when the
renderer extracts tool calls client-side (downstream agent loops
genuinely depend on it), the AsyncOpenAI transport (auth + retries),
and the overlong-prompt 4xx diagnostic.

Bump version 0.1.5 → 0.1.6 — the wire format change is a break against
v0.1.5 (which targets the legacy /generate route). Tag renderers-v0.1.6
to publish.

Lifted from packages/renderers/ in PrimeIntellect-ai/verifiers#1282 now
that this package lives in its own repo.
renderers#1 squash-merged to PrimeIntellect-ai/renderers main as
9acdc60. Repoint [tool.uv.sources] from the now-deleted PR branch SHA
(40bc2a6) to the squash-merge commit so the pin tracks main rather
than a side-history commit.
eligotts and others added 2 commits May 5, 2026 12:48
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ce-v1-generate

# Conflicts:
#	packages/renderers/pyproject.toml
#	packages/renderers/uv.lock
#	uv.lock
@hallerite hallerite merged commit 7bdc769 into main May 5, 2026
8 checks passed
@hallerite hallerite deleted the feat/renderer-inference-v1-generate branch May 5, 2026 22:17