individual rollouts by faresobeid · Pull Request #1865 · PrimeIntellect-ai/prime-rl

faresobeid · 2026-02-24T05:20:57Z

Rewrite the scheduler to run rollouts at the rollout level instead of the group level. This handles long tails within groups: when a rollout in a group finishes, its slot can move on to another group instead of waiting for the rest of its group.
Currently, if any env needs group scoring, we fall back to the previous behaviour of group rollouts. Ideally, verifiers would be brought closer to prime-rl here so that we can do the scoring within prime-rl.
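The core idea can be sketched with plain `asyncio`: every (example, sample) pair is an independent task, at most `max_inflight` tasks run at once, and a freed slot is refilled immediately rather than waiting for the rest of its group. This is a minimal sketch, not the prime-rl scheduler; `fake_rollout`, `schedule_rollouts` and the bookkeeping are hypothetical, and a real rollout would call the environment instead of sleeping.

```python
import asyncio
from collections import defaultdict


async def fake_rollout(example_id: int, sample_idx: int) -> float:
    # Hypothetical stand-in for one rollout; uneven sleeps simulate long tails.
    await asyncio.sleep(0.001 * ((example_id + sample_idx) % 3))
    return float(sample_idx)


async def schedule_rollouts(
    num_examples: int, group_size: int, max_inflight: int
) -> tuple[dict[int, list[float]], list[int], int]:
    # Every (example, sample) pair is its own unit of work.
    pending = [(ex, i) for ex in range(num_examples) for i in range(group_size)]
    pending.reverse()  # pop() from the end hands out example 0 first
    inflight: dict[asyncio.Task, int] = {}  # task -> example id
    results: dict[int, list[float]] = defaultdict(list)
    completed_order: list[int] = []  # examples whose whole group has finished
    peak = 0

    def fill() -> None:
        # Top up free slots with single rollouts, whichever group they belong to.
        nonlocal peak
        while pending and len(inflight) < max_inflight:
            ex, i = pending.pop()
            inflight[asyncio.create_task(fake_rollout(ex, i))] = ex
        peak = max(peak, len(inflight))

    fill()
    while inflight:
        done, _ = await asyncio.wait(set(inflight), return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            ex = inflight.pop(task)
            results[ex].append(task.result())
            if len(results[ex]) == group_size:
                completed_order.append(ex)
        fill()  # a freed slot moves on immediately, even if its own group is still running
    return dict(results), completed_order, peak
```

Because `max_inflight=5` is not a multiple of `group_size=3` in a call like `asyncio.run(schedule_rollouts(4, 3, 5))`, slots are shared across groups as soon as they free up; under group-level scheduling a slot would stay tied to its group until the slowest rollout in that group returned.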

Note

Medium Risk
Rewrites the training rollout scheduler and changes verification/scoring configuration, which can affect throughput, reward computation, and cancellation behavior during training.

Overview
Refactors rollout scheduling from group-level to individual rollout tasks to reduce long-tail stalls: the scheduler now tracks per-example “groups” internally while keeping max_inflight_rollouts as the primary concurrency limit, and updates batch inflight/off-policy metrics accordingly.

Adds deferred group scoring support for tasks whose rubrics require group-level reward functions: training envs run with score_rollouts=False and the scheduler calls rubric.score_group() once all rollouts for an example complete.
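Deferred group scoring can be sketched as per-example bookkeeping: rollouts arrive unscored, are collected until the group is complete, and only then is the group-level reward computed, exactly once. This is an illustration under assumptions: `GroupState` echoes the name used in the branch, but its fields, `Rollout`, `on_rollout_done` and the toy `ShortestAnswerRubric` are hypothetical, and the `score_group` signature here is not the verifiers API.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field


@dataclass
class Rollout:
    example_id: int
    completion: str
    reward: float | None = None  # unset: the env ran with rollout scoring disabled


@dataclass
class GroupState:
    # Hypothetical per-example bookkeeping; field names are illustrative only.
    expected: int
    rollouts: list[Rollout] = field(default_factory=list)

    def is_complete(self) -> bool:
        return len(self.rollouts) == self.expected


class ShortestAnswerRubric:
    """Toy group-level reward: 1.0 for the shortest completion(s) in the group, else 0.0."""

    async def score_group(self, rollouts: list[Rollout]) -> None:
        shortest = min(len(r.completion) for r in rollouts)
        for r in rollouts:
            r.reward = 1.0 if len(r.completion) == shortest else 0.0


async def on_rollout_done(
    groups: dict[int, GroupState], rollout: Rollout, rubric: ShortestAnswerRubric
) -> list[Rollout] | None:
    state = groups[rollout.example_id]
    state.rollouts.append(rollout)
    if not state.is_complete():
        return None  # siblings still running: no group reward can be computed yet
    del groups[rollout.example_id]  # group leaves the scheduler's bookkeeping
    await rubric.score_group(state.rollouts)  # score exactly once, with the full group
    return state.rollouts


if __name__ == "__main__":
    groups = {0: GroupState(expected=3)}
    for text in ["abcd", "ab", "abc"]:
        scored = asyncio.run(on_rollout_done(groups, Rollout(0, text), ShortestAnswerRubric()))
    print([r.reward for r in scored])  # [0.0, 1.0, 0.0]
```

The reward here depends on the other members of the group, which is exactly why it cannot be computed per rollout and has to wait for the last sibling.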

Replaces orchestrator.buffer.skip_verification with a new top-level orchestrator.verification.enabled switch (with validation preventing reward-dependent buffer features when disabled), updates docs/changelog, and adds a unit test covering off-policy update behavior during interleaved group cancellation.

Written by Cursor Bugbot for commit 72d5b87.
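The new `orchestrator.verification.enabled` switch (replacing `orchestrator.buffer.skip_verification`) and its validation amount to a cross-field check at config load time. The sketch below uses plain dataclasses rather than the project's actual config classes; only `verification.enabled` comes from the description, and the two buffer flags are hypothetical placeholders for whatever reward-dependent features the buffer really exposes.

```python
from __future__ import annotations

from dataclasses import dataclass, field, fields


@dataclass
class VerificationConfig:
    enabled: bool = True  # mirrors orchestrator.verification.enabled


@dataclass
class BufferConfig:
    # Hypothetical reward-dependent switches; the real buffer options may differ.
    filter_easy_examples: bool = False
    filter_hard_examples: bool = False


@dataclass
class OrchestratorConfig:
    verification: VerificationConfig = field(default_factory=VerificationConfig)
    buffer: BufferConfig = field(default_factory=BufferConfig)

    def __post_init__(self) -> None:
        if self.verification.enabled:
            return
        # Without verification there are no rewards, so reward-based buffer logic cannot work.
        active = [f.name for f in fields(self.buffer) if getattr(self.buffer, f.name)]
        if active:
            raise ValueError(
                f"verification.enabled=False cannot be combined with reward-dependent buffer options: {active}"
            )
```

Failing at construction time means a misconfigured run is rejected before any rollouts are generated.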

src/prime_rl/orchestrator/scheduler.py

mikasenghaas

nice, yeah, let's do some testing on this to verify that we don't have any async race conditions anywhere, but directionally it looks good.

i think mid-term we want to move away from the verifiers env group for training envs and make our abstractions "multi-env" aware by default, e.g. something like a buffer and scheduler per env, governed by a "scheduling" component on top. i think we will want more and more fine-grained control over how each env behaves (e.g. here, whether or not to use group scheduling), and it's always awkward to code this in an abstraction that handles multiple cases, where you need conditionals everywhere.

src/prime_rl/orchestrator/scheduler.py

samsja · 2026-02-25T22:03:11Z

let's wait for @mikasenghaas's review before merging

Co-authored-by: faresobeid <faresobeid@users.noreply.github.com>

mikasenghaas · 2026-02-27T09:55:20Z

src/prime_rl/configs/orchestrator.py

-                    raise ValueError(
-                        "max_inflight_rollouts conflicts with oversampling_factor * batch_size"
-                    )
+                    raise ValueError("max_inflight_rollouts conflicts with oversampling_factor * batch_size")


imo, we could just deprecate oversampling_factor, no?

i agree, but @samsja mentioned keeping it for backwards compatibility (although i'm hating this lol)
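The conflict check in the diff above suggests how the legacy `oversampling_factor` can coexist with `max_inflight_rollouts` for backwards compatibility. A minimal sketch, assuming the two conflict only when both are set and `oversampling_factor * batch_size` differs from `max_inflight_rollouts`; the function name `resolve_max_inflight` and the fallback to `batch_size` are hypothetical.

```python
from __future__ import annotations


def resolve_max_inflight(
    batch_size: int,
    max_inflight_rollouts: int | None = None,
    oversampling_factor: float | None = None,
) -> int:
    """Hypothetical helper: derive the rollout concurrency limit from either knob."""
    if oversampling_factor is not None:
        derived = int(oversampling_factor * batch_size)
        if max_inflight_rollouts is not None and max_inflight_rollouts != derived:
            raise ValueError("max_inflight_rollouts conflicts with oversampling_factor * batch_size")
        return derived  # legacy knob, kept only for backwards compatibility
    if max_inflight_rollouts is not None:
        return max_inflight_rollouts
    return batch_size  # assumed default: no oversampling
```

Deprecating `oversampling_factor` would reduce this to the last two branches.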

src/prime_rl/orchestrator/orchestrator.py

src/prime_rl/orchestrator/scheduler.py

tests/unit/orchestrator/test_scheduler.py

Signed-off-by: faresobeid <111092724+faresobeid@users.noreply.github.com>

Resolve conflicts in scheduler.py.

Adopt from main:
- safe_cancel/safe_cancel_all for proper async task cleanup
- update_policy_task management inside generate_batch()
- stop() method for clean shutdown
- maybe_update_policy rename
- compute_eval_ckpt_step, prev_ckpt_step tracking in orchestrator

Preserve branch behavior:
- Individual rollouts via run_rollout (not run_group)
- GroupState, groups dict, drop_group for deferred group scoring
- _fill_inflight_requests / _schedule_next_request loop

Made-with: Cursor
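The merge notes mention `safe_cancel`/`safe_cancel_all` for proper async task cleanup. One common way to write such helpers in `asyncio` is to cancel the task and then await it, so its cleanup finishes before shutdown continues. The sketch below assumes that pattern; it is not the prime-rl implementation, and its signatures are hypothetical.

```python
from __future__ import annotations

import asyncio
import contextlib
from collections.abc import Iterable


async def safe_cancel(task: asyncio.Task | None) -> None:
    """Cancel a task and wait until it has actually finished unwinding."""
    if task is None or task.done():
        return
    task.cancel()
    # Awaiting lets the task run its cleanup (finally blocks, etc.) before we move on.
    # Caveat: this also swallows a CancelledError aimed at the caller while it waits.
    with contextlib.suppress(asyncio.CancelledError):
        await task


async def safe_cancel_all(tasks: Iterable[asyncio.Task]) -> None:
    # Cancel everything concurrently; each helper call waits for its own task.
    await asyncio.gather(*(safe_cancel(t) for t in tasks))
```

A `stop()` method could then call `safe_cancel_all` on every in-flight rollout task plus a background policy-update task, so no coroutine outlives the scheduler.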

Made-with: Cursor

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

src/prime_rl/orchestrator/scheduler.py

Co-authored-by: faresobeid <faresobeid@users.noreply.github.com>

mikasenghaas

getting there:)

src/prime_rl/orchestrator/orchestrator.py

src/prime_rl/orchestrator/scheduler.py

cursor bot reviewed Feb 24, 2026

src/prime_rl/orchestrator/scheduler.py (resolved)

mikasenghaas reviewed Feb 24, 2026

src/prime_rl/orchestrator/scheduler.py (outdated, resolved)

src/prime_rl/orchestrator/scheduler.py (outdated, resolved)

faresobeid added 2 commits February 25, 2026 14:41

individual-rollouts

a533977

fixes

f788c40

faresobeid force-pushed the individual-rollouts branch from b33268a to f788c40 February 25, 2026 14:43

faresobeid changed the base branch from tokne-batch to main February 25, 2026 14:43

faresobeid added 3 commits February 25, 2026 19:49

fix group based scoring envs

574f119

ruff

98f61cf

ruff pls

530d627

cursor bot reviewed Feb 25, 2026

src/prime_rl/orchestrator/scheduler.py (outdated, resolved)

samsja approved these changes Feb 25, 2026

Fix inflight sample metric semantics

e9ad626

Co-authored-by: faresobeid <faresobeid@users.noreply.github.com>

mikasenghaas reviewed Feb 27, 2026

faresobeid and others added 4 commits March 2, 2026 01:05

fixes

9c7b5ed

Delete tests/unit/orchestrator/test_scheduler.py

6b6325f

Signed-off-by: faresobeid <111092724+faresobeid@users.noreply.github.com>

fix: off-policy off-by-one - cancel when off_policy_steps >= max

145e58e

Made-with: Cursor
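The off-by-one fix is easiest to see in a toy version of the staleness check: each in-flight rollout counts the policy updates that happened since it started, and the commit title says to cancel once that count is `>=` the maximum. A minimal sketch; `InflightRollout`, `on_policy_update`, and the assumption that the counter is incremented right before the check are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class InflightRollout:
    rollout_id: int
    off_policy_steps: int = 0  # policy updates seen since this rollout started


def on_policy_update(
    inflight: list[InflightRollout], max_off_policy_steps: int
) -> tuple[list[InflightRollout], list[InflightRollout]]:
    """Age every in-flight rollout by one step and split off the ones to cancel."""
    keep: list[InflightRollout] = []
    cancel: list[InflightRollout] = []
    for rollout in inflight:
        rollout.off_policy_steps += 1
        # `>=` (not `>`): a rollout that has reached the limit is cancelled now,
        # instead of surviving one extra policy update.
        if rollout.off_policy_steps >= max_off_policy_steps:
            cancel.append(rollout)
        else:
            keep.append(rollout)
    return keep, cancel
```

With `>` instead of `>=`, the rollout with id 1 in a call like `on_policy_update([InflightRollout(0), InflightRollout(1, 1)], 2)` would be kept despite having reached the limit.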

cursor bot reviewed Mar 2, 2026

src/prime_rl/orchestrator/scheduler.py (resolved)

Fix off-policy increment race in scheduler

40f63aa

Co-authored-by: faresobeid <faresobeid@users.noreply.github.com>

mikasenghaas reviewed Mar 2, 2026

src/prime_rl/orchestrator/orchestrator.py (outdated, resolved)

src/prime_rl/orchestrator/scheduler.py (resolved)

src/prime_rl/orchestrator/scheduler.py (outdated, resolved)

faresobeid added 2 commits March 3, 2026 02:12

adress comments

04b1818

ruff

72d5b87

samsja approved these changes Mar 3, 2026

mikasenghaas approved these changes Mar 3, 2026

mikasenghaas merged commit bf71516 into main Mar 3, 2026
9 checks passed

hallerite mentioned this pull request Mar 3, 2026

fix: pass rubric to super().__init__() to avoid empty RubricGroup wrapper PrimeIntellect-ai/research-environments#194

Merged

faresobeid commented Feb 24, 2026 (edited by cursor bot)

4 participants
