Reject full-context chat prompts before max_tokens underflows #2402

Open

rasdani wants to merge 1 commit into main from fix/chat-max-tokens-zero

Conversation

@rasdani (Contributor) commented May 3, 2026

Summary

  • reject rendered standard chat prompts that already fill the model context before vLLM derives max_tokens=0
  • preserve /chat/completions/tokens behavior by validating only after the stitched request.tokens are installed, so the token-in/token-out (TITO) path checks the actual request tokens
  • add focused coverage for the context-limit guard and token-endpoint deferral

Why This Fixes The Bug

Before this change, the standard /v1/chat/completions serving path rendered the prompt and then called vLLM's get_max_tokens(...). When prompt_len == max_model_len, that helper can return 0; request.to_sampling_params(max_tokens=0) then surfaces as:

BadRequestError: max_tokens must be at least 1, got 0
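
For illustration, the underflow reduces to simple arithmetic (a minimal sketch; it assumes the helper effectively caps the generation budget at the remaining context, matching the behavior described above rather than quoting vLLM's implementation):

```python
# Sketch of the before-state, not vLLM's actual get_max_tokens implementation.
max_model_len = 4096
prompt_len = 4096                          # rendered prompt already fills the context

max_tokens = max_model_len - prompt_len    # remaining generation room -> 0

# request.to_sampling_params(max_tokens=0) then fails downstream with
# "max_tokens must be at least 1, got 0".
```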

The rollout env sees that error as a generic model failure, so group-scored rollouts can end up rescheduling an otherwise nearly complete group.

After this change, OpenAIServingChatWithTokens.render_chat_request() validates rendered standard chat prompts before vLLM reaches get_max_tokens. If the prompt has no generation room, it raises VLLMValidationError(parameter="input_tokens"), which vLLM serializes as a 400 BadRequestError with a context-length message:

This model's maximum context length is ... However, your request has ... input tokens ... (parameter=input_tokens, value=...)
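
A minimal sketch of that guard (the helper and error names follow this description; the stand-in error class and exact signature are assumptions, not the repo code):

```python
# Sketch only; the real guard lives in OpenAIServingChatWithTokens and raises
# vLLM's validation error type rather than this stand-in.
class VLLMValidationError(ValueError):        # stand-in for the actual error class
    def __init__(self, message: str, parameter: str, value: int) -> None:
        super().__init__(message)
        self.parameter = parameter
        self.value = value


def _validate_prompt_has_generation_room(prompt_len: int, max_model_len: int) -> None:
    """Reject prompts that leave no room for generation (illustrative)."""
    if prompt_len >= max_model_len:
        raise VLLMValidationError(
            f"This model's maximum context length is {max_model_len} tokens. "
            f"However, your request has {prompt_len} input tokens.",
            parameter="input_tokens",
            value=prompt_len,
        )
```

In render_chat_request() the check runs on the rendered engine prompts before vLLM ever derives max_tokens.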

The verifiers OpenAI client already classifies error messages containing "maximum context length" / "context length" as OverlongPromptError, and the env handles that as prompt_too_long instead of a generic model failure.
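
A rough sketch of that client-side check (the substring match shown is an assumption about the verifiers client's behavior, not its source):

```python
def is_overlong_prompt_error(message: str) -> bool:
    """Sketch: classify a 400 message as an overlong-prompt error by substring match."""
    return "maximum context length" in message or "context length" in message
```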

Impact

Long multi-turn rollouts that reach max_model_len now stop through the existing overlong-prompt path instead of sending max_tokens=0 to vLLM. In the observed SLURM run this was not the dominant throughput bottleneck, but each occurrence can still be expensive because group scoring discards partial group progress and reschedules.

Verification

  • uv run pytest tests/unit/inference/test_serving_chat_with_tokens.py tests/unit/inference/test_serving_generate.py
  • uv run ruff format --check src/prime_rl/inference/vllm/serving_chat_with_tokens.py tests/unit/inference/test_serving_chat_with_tokens.py
  • uv run ruff check src/prime_rl/inference/vllm/serving_chat_with_tokens.py tests/unit/inference/test_serving_chat_with_tokens.py

Note

Medium Risk
Changes request validation behavior in the OpenAI chat serving path; could alter which requests are rejected and the error shape returned, but is localized and covered by unit tests.

Overview
Adds an explicit context-length guard for standard /v1/chat/completions requests by validating rendered engine_prompts in render_chat_request and returning a VLLMValidationError(parameter="input_tokens") when the prompt already fills the model context (avoiding max_tokens=0 underflow errors).

Refactors the same check into _validate_prompt_has_generation_room() and reuses it in the token-in (/chat/completions/tokens) path, while deferring validation there until after request.tokens are stitched in, so the check reflects the actual prompt tokens. Includes new unit tests covering both the guard behavior and the token-endpoint deferral.
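
A sketch of that ordering in the token-in path, reusing the guard sketched above (function and field names are illustrative, not the module's actual signatures):

```python
# Sketch of the /chat/completions/tokens ordering described above; illustrative only.
def render_tokens_request(
    engine_prompt: dict, request_tokens: list[int], max_model_len: int
) -> dict:
    # 1. The context-length guard is NOT run on the rendered chat prompt here.
    # 2. Stitch the caller-provided request.tokens into the engine prompt first.
    engine_prompt["prompt_token_ids"] = request_tokens
    # 3. Only then validate, so TITO checks the actual request tokens.
    _validate_prompt_has_generation_room(len(request_tokens), max_model_len)
    return engine_prompt
```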

Reviewed by Cursor Bugbot for commit 82d8849.

@rasdani rasdani requested review from mikasenghaas and samsja May 3, 2026 16:20
@rasdani rasdani marked this pull request as ready for review May 3, 2026 16:20