
Add DP-master node affinity for Ray strict/fill pack strategy.#916

Open
ffrujeri wants to merge 1 commit into main from ffrujeri/local-vllm-dp-patch

Conversation


@ffrujeri ffrujeri commented Mar 19, 2026

What does this PR do?

Pins the extra Ray placement groups that local_vllm_model creates for data-parallel ranks to the DP master node (via node:<dp_master_ip> bundle hints) when VLLM_RAY_DP_PACK_STRATEGY is strict or fill and the master has enough available GPUs. This matches upstream vLLM behavior, so multi-DP deployments colocate on one node when capacity allows.

Issues

Usage

  • strict / fill: No new user-facing API. Keep using local_vllm_model with vllm_serve_env_vars and VLLM_RAY_DP_PACK_STRATEGY: fill or strict. When the DP master node has enough free GPUs for all non-rank-0 placement groups (world_size * (data_parallel_size - 1) GPUs beyond rank 0's), the extra PGs are scheduled with the same node-affinity hint that upstream vLLM uses.
# Example: safety judge (or any local_vllm_model) with multi-DP on one node when it fits
safety_judge_model:
  _target_: responses_api_models.local_vllm_model.app.LocalVLLMModel
  # ...
  vllm_serve_kwargs:
    tensor_parallel_size: 1
    pipeline_parallel_size: 1
    data_parallel_size: 4
    data_parallel_size_local: 1
    # ... other serve args
  vllm_serve_env_vars:
    VLLM_RAY_DP_PACK_STRATEGY: fill   # or strict

Verify colocation with your usual tooling (e.g. Ray dashboard or python scripts/visualize_ray_placement_groups.py).
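For the example config above, the capacity threshold works out as follows (a quick arithmetic sketch; `world_size` here means `tensor_parallel_size * pipeline_parallel_size`, and the variable names are illustrative):

```python
# Capacity needed on the DP master for the non-rank-0 placement groups,
# per the rule described in this PR (illustrative arithmetic only).
tensor_parallel_size = 1
pipeline_parallel_size = 1
data_parallel_size = 4

world_size = tensor_parallel_size * pipeline_parallel_size
extra_gpus_needed = world_size * (data_parallel_size - 1)
print(extra_gpus_needed)  # 3: GPUs that must be free on the master beyond rank 0's
```

If the master has at least this many GPUs available after rank 0 is placed, the extra PGs get pinned; otherwise they are left unpinned.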

Additional Information

  • Problem: Ray STRICT_PACK / PACK only packs within a single placement group. NeMo Gym’s patch creates one PG per DP rank; without cross-PG hints, ranks can spread across nodes even with fill/strict, diverging from the expectations in issue #914 and from upstream vLLM’s node:<ip> bundle hints.

Before this PR we would see:

safety_judge_model  (4 unique PGs, 4 total entries)
    - safety_judge_model_dp_rank_0  state=CREATED  GPU=1  id=9401191b...  nodes 1 (1 GPU)
    - safety_judge_model_dp_rank_3  state=CREATED  GPU=1  id=ba090962...  nodes 2 (1 GPU)
    - safety_judge_model_dp_rank_2  state=CREATED  GPU=1  id=bfdd245c...  nodes 2 (1 GPU)
    - safety_judge_model_dp_rank_1  state=CREATED  GPU=1  id=d04e063f...  nodes 1 (1 GPU)
    -> total GPU (sum over entries): 4

After:

safety_judge_model  (4 unique PGs, 4 total entries)
    - safety_judge_model_dp_rank_0  state=CREATED  GPU=1  id=9401191b...  nodes 1 (1 GPU)
    - safety_judge_model_dp_rank_3  state=CREATED  GPU=1  id=ba090962...  nodes 1 (1 GPU)
    - safety_judge_model_dp_rank_2  state=CREATED  GPU=1  id=bfdd245c...  nodes 1 (1 GPU)
    - safety_judge_model_dp_rank_1  state=CREATED  GPU=1  id=d04e063f...  nodes 1 (1 GPU)
    -> total GPU (sum over entries): 4
  • Change: For strict/fill and dp_size > 1, if the DP master’s available GPU count ≥ world_size * (dp_size - 1), extra PGs use {device_str: 1.0, "node:" + dp_master_ip: 0.001} on each GPU bundle; otherwise affinity is left unset and a log line explains why pinning was skipped.
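The scheduling rule in this bullet can be sketched as plain Python. This is a minimal illustration of the described logic, not the actual code in app.py; the function and parameter names are made up for the sketch:

```python
def build_extra_pg_bundles(world_size, dp_size, device_str,
                           dp_master_ip, available_gpus_on_master):
    """Return per-rank bundle lists for DP ranks 1..dp_size-1.

    Mirrors the capacity check described above: pin the extra placement
    groups to the DP master only when it can host all non-rank-0 ranks.
    """
    needed = world_size * (dp_size - 1)
    pin = dp_size > 1 and available_gpus_on_master >= needed
    bundles_per_rank = []
    for _rank in range(1, dp_size):
        bundle = {device_str: 1.0}
        if pin:
            # A tiny fractional claim on Ray's custom "node:<ip>" resource
            # acts as a node-affinity hint, as upstream vLLM does.
            bundle["node:" + dp_master_ip] = 0.001
        bundles_per_rank.append([dict(bundle) for _ in range(world_size)])
    return bundles_per_rank
```

With world_size=1, dp_size=4, and 3 free GPUs on the master, every extra bundle carries the node hint; with only 2 free GPUs, the hint is omitted and scheduling falls back to Ray's default behavior.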

  • Scope: responses_api_models/local_vllm_model/app.py (_patch_create_dp_placement_groups); head/rank-0 PG and existing colocated-PG resource filtering are unchanged.

  • Docs: If local_vllm_model/README.md already describes strict/fill and colocation, consider a one-line note that pinning now matches upstream when the master has capacity (optional follow-up).

Signed-off-by: Felipe Vieira Frujeri <ffrujeri@nvidia.com>

copy-pr-bot bot commented Mar 19, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.




Development

Successfully merging this pull request may close these issues.

Ray fill/strict does not colocate multi-DP local_vllm_model ranks (missing node-affinity on extra PGs)
