Skip to content

individual rollouts #1865

Merged
mikasenghaas merged 13 commits into main from individual-rollouts on Mar 3, 2026

Contributor

@faresobeid faresobeid commented Feb 24, 2026

Rewrite the scheduler to schedule rollouts individually instead of per group. This handles long tails within groups: when one rollout in a group finishes, its slot can immediately be used for a rollout from another group.
Currently, if any env needs group scoring, we fall back to the previous group-rollout behaviour. Ideally verifiers would move closer to this model so that scoring can be done within prime-rl.
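
To illustrate the idea, here is a minimal sketch of rollout-level scheduling, not the scheduler from this PR. `run_rollout` here is a hypothetical stand-in that sleeps instead of calling an env, and `schedule`, `requests`, and `max_inflight` are names assumed for the example. A slot is refilled as soon as any single rollout finishes, regardless of which group it belongs to.

```python
import asyncio
from collections import deque


async def run_rollout(example_id: int, duration: float) -> tuple[int, float]:
    """Hypothetical stand-in for one rollout; sleeps instead of calling an env."""
    await asyncio.sleep(duration)
    return example_id, duration


async def schedule(requests, max_inflight: int) -> dict[int, list[float]]:
    """Keep up to max_inflight rollouts running; refill a slot when any one finishes."""
    pending = deque(requests)
    inflight: set[asyncio.Task] = set()
    results: dict[int, list[float]] = {}
    while pending or inflight:
        # Fill free slots with individual rollouts, ignoring group boundaries.
        while pending and len(inflight) < max_inflight:
            example_id, duration = pending.popleft()
            inflight.add(asyncio.create_task(run_rollout(example_id, duration)))
        # Wake up as soon as a single rollout completes, not a whole group.
        done, inflight = await asyncio.wait(inflight, return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            example_id, duration = task.result()
            results.setdefault(example_id, []).append(duration)
    return results
```

With requests such as `[(0, 0.01), (0, 0.03), (1, 0.01), (1, 0.01)]` and `max_inflight=2`, the slow second rollout of example 0 no longer blocks example 1 from starting.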


Note

Medium Risk
Rewrites the training rollout scheduler and changes verification/scoring configuration, which can affect throughput, reward computation, and cancellation behavior during training.

Overview
Refactors rollout scheduling from group-level to individual rollout tasks to reduce long-tail stalls: the scheduler now tracks per-example “groups” internally while keeping max_inflight_rollouts as the primary concurrency limit, and updates batch inflight/off-policy metrics accordingly.

Adds deferred group scoring support for tasks whose rubrics require group-level reward functions: training envs run with score_rollouts=False and the scheduler calls rubric.score_group() once all rollouts for an example complete.
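
A rough sketch of what deferred group scoring could look like, under stated assumptions: the `GroupState` fields, the `on_rollout_done` helper, and the toy `score_group` function below are hypothetical and are not the verifiers `rubric.score_group()` API. The point is only that rollouts are collected per example and scored once the last one arrives.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GroupState:
    """Hypothetical per-example bookkeeping; field names are assumptions."""
    expected: int
    rollouts: list[dict] = field(default_factory=list)


def score_group(rollouts: list[dict]) -> None:
    """Toy group-level reward: completion length relative to the group mean."""
    mean_len = sum(len(r["completion"]) for r in rollouts) / len(rollouts)
    for r in rollouts:
        r["reward"] = len(r["completion"]) - mean_len


def on_rollout_done(
    groups: dict[int, GroupState], example_id: int, rollout: dict
) -> Optional[list[dict]]:
    """Record one finished rollout; score and release the group once it is complete."""
    state = groups[example_id]
    state.rollouts.append(rollout)
    if len(state.rollouts) < state.expected:
        return None  # still waiting on other rollouts for this example
    score_group(state.rollouts)
    return groups.pop(example_id).rollouts
```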

Replaces orchestrator.buffer.skip_verification with a new top-level orchestrator.verification.enabled switch (with validation preventing reward-dependent buffer features when disabled), updates docs/changelog, and adds a unit test covering off-policy update behavior during interleaved group cancellation.
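
The validation described above might be sketched as follows. This uses plain dataclasses rather than the project's actual config classes, and `reward_threshold` is a hypothetical placeholder for any reward-dependent buffer option; only the `verification.enabled` name comes from the summary.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VerificationConfig:
    enabled: bool = True


@dataclass
class BufferConfig:
    # Hypothetical reward-dependent option, used only for illustration.
    reward_threshold: Optional[float] = None


@dataclass
class OrchestratorConfig:
    verification: VerificationConfig = field(default_factory=VerificationConfig)
    buffer: BufferConfig = field(default_factory=BufferConfig)

    def __post_init__(self) -> None:
        # Without verification there are no rewards, so reward-based filtering cannot work.
        if not self.verification.enabled and self.buffer.reward_threshold is not None:
            raise ValueError("buffer.reward_threshold requires verification.enabled=True")
```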

Written by Cursor Bugbot for commit 72d5b87.

Member

@mikasenghaas mikasenghaas left a comment

nice, yea let's do some testing on this to verify that we don't have any async race conditions anywhere, but directionally looks good.

i think mid-term we want to move away from the verifiers env group for training envs and make our abstractions "multi-env" aware by default, e.g. something like having a buffer and scheduler per env, governed by a "scheduling" component on top, because i think we will want more and more fine-grained control over how each env behaves (e.g. here, whether or not to use group scheduling), and it's always awkward to code this in an abstraction that handles multiple cases where you need conditionals everywhere

@faresobeid faresobeid changed the base branch from tokne-batch to main February 25, 2026 14:43
Member

samsja commented Feb 25, 2026

lets wait for @mikasenghaas review before merging

Co-authored-by: faresobeid <faresobeid@users.noreply.github.com>
- raise ValueError(
-     "max_inflight_rollouts conflicts with oversampling_factor * batch_size"
- )
+ raise ValueError("max_inflight_rollouts conflicts with oversampling_factor * batch_size")
Member

imo, could just deprecate oversampling_factor, no?

Contributor Author

i agree but @samsja mentioned to keep for backwards compatibility (although im hating this lol)
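
For context on the check discussed in this thread, one way the two settings could be reconciled is sketched below. The exact rule in the PR is not shown in the conversation, so the logic here is an assumption: `oversampling_factor` is treated as a legacy way of deriving the inflight limit, and an error is raised only when both are set and disagree. `resolve_max_inflight` is a hypothetical helper name.

```python
from typing import Optional


def resolve_max_inflight(
    batch_size: int,
    max_inflight_rollouts: Optional[int] = None,
    oversampling_factor: Optional[float] = None,
) -> int:
    """Assumed rule: derive the limit from the legacy factor unless set explicitly."""
    derived = None if oversampling_factor is None else int(oversampling_factor * batch_size)
    if max_inflight_rollouts is None:
        # Fall back to the legacy setting, or to batch_size if neither is given.
        return derived if derived is not None else batch_size
    if derived is not None and derived != max_inflight_rollouts:
        raise ValueError("max_inflight_rollouts conflicts with oversampling_factor * batch_size")
    return max_inflight_rollouts
```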

faresobeid and others added 4 commits March 2, 2026 01:05
Signed-off-by: faresobeid <111092724+faresobeid@users.noreply.github.com>
Resolve conflicts in scheduler.py. Adopt from main:
- safe_cancel/safe_cancel_all for proper async task cleanup
- update_policy_task management inside generate_batch()
- stop() method for clean shutdown
- maybe_update_policy rename
- compute_eval_ckpt_step, prev_ckpt_step tracking in orchestrator

Preserve branch behavior:
- Individual rollouts via run_rollout (not run_group)
- GroupState, groups dict, drop_group for deferred group scoring
- _fill_inflight_requests / _schedule_next_request loop

Made-with: Cursor
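
For readers unfamiliar with the cleanup pattern named in the commit message, here is a minimal sketch of what helpers like `safe_cancel` and `safe_cancel_all` commonly do. The bodies below are an assumption based on the names only, not the implementation adopted from main.

```python
import asyncio
from typing import Iterable


async def safe_cancel(task: asyncio.Task) -> None:
    """Request cancellation, then wait until the task has actually finished unwinding."""
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        # Expected: the awaited task was cancelled on request.
        pass


async def safe_cancel_all(tasks: Iterable[asyncio.Task]) -> None:
    """Cancel every task first, then wait for all of them together."""
    tasks = list(tasks)
    for task in tasks:
        task.cancel()
    # return_exceptions=True collects CancelledError results instead of raising them.
    await asyncio.gather(*tasks, return_exceptions=True)
```

Awaiting the cancelled task matters because `Task.cancel()` only requests cancellation; the task is not done until it has run its cleanup.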

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Co-authored-by: faresobeid <faresobeid@users.noreply.github.com>
Member

@mikasenghaas mikasenghaas left a comment

getting there:)

@mikasenghaas mikasenghaas merged commit bf71516 into main Mar 3, 2026
9 checks passed
4 participants