
[DYNAMO] smoke runner follow-up from tested branch #2445

Draft

AmeenP wants to merge 2 commits into feat/dynamo-deployment-example from codex/dynamo-smoke-runner-followup

Conversation


@AmeenP commented May 8, 2026 (Contributor)

Summary

Ports the missing smoke-test runner fixes from Biswa's tested branch (biswapanda/prime-rl@bis/prime-rl-merged) on top of #2394 (feat/dynamo-deployment-example).

  • adds tools/dynamo/run_full_smoke.sh for the orchestrator + trainer smoke flow
  • updates the smoke runner for a single-GPU colocated setup
  • caps the run_dynamo.sh vLLM worker's GPU memory via a GPU_MEM_UTIL override, defaulting to 0.45
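
The memory cap above could be wired roughly like this (GPU_MEM_UTIL, the 0.45 default, and the --gpu-memory-utilization flag come from this PR's description; the surrounding launcher lines are illustrative, not the actual run_dynamo.sh):

```shell
#!/usr/bin/env bash
# Hypothetical excerpt in the spirit of run_dynamo.sh: let an environment
# variable override the vLLM worker's GPU memory fraction, defaulting to 0.45
# so the colocated trainer keeps roughly half of the card's VRAM.
GPU_MEM_UTIL="${GPU_MEM_UTIL:-0.45}"

echo "vLLM worker memory fraction: ${GPU_MEM_UTIL}"
# exec python -m vllm.entrypoints.openai.api_server \
#   --gpu-memory-utilization "${GPU_MEM_UTIL}" ...   # illustrative launch line
```

Running with `GPU_MEM_UTIL=0.30 bash run_dynamo.sh` would then shrink the worker's share without editing the script.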

Context

This keeps the tested local Dynamo smoke fixes reviewable separately from the original deployment example PR. The branch has been rebased onto latest prime-rl/main via its parent #2394; the admin-stub formatting fix now lives in #2394 itself.

Validation

  • uvx ruff==0.13.0 check tools/dynamo/admin_stub.py
  • uvx ruff==0.13.0 format --check tools/dynamo/admin_stub.py
  • python -m py_compile tools/dynamo/admin_stub.py
  • bash -n tools/dynamo/run_dynamo.sh tools/dynamo/run_smoke_test.sh tools/dynamo/run_full_smoke.sh

Full pytest is left to Linux CI for this stack because the local checkout is macOS while the lockfile targets Linux environments.

@AmeenP force-pushed the codex/dynamo-smoke-runner-followup branch from 0b4323d to 6e78613 on May 8, 2026 at 10:03
@AmeenP force-pushed the feat/dynamo-deployment-example branch from 06b03e3 to 8f1aafe on May 8, 2026 at 10:03
biswapanda added 2 commits May 8, 2026 03:05
Combines the existing run_dynamo.sh (GPU 0) and run_smoke_test.sh (GPU 1)
into a single one-shot launcher. Recovered from the bis/dynamo-integration
branch as a convenience wrapper for the local smoke flow documented in
current-plan.md §8.1.

Signed-off-by: Biswa Panda <biswa.panda@gmail.com>
This machine has 1× NVIDIA RTX PRO 6000 Blackwell (96 GB), not 2 GPUs.
Both Dynamo inference and the prime-rl trainer must share GPU 0.

Changes:
* run_full_smoke.sh: trainer CUDA_VISIBLE_DEVICES 1 -> 0; add nvidia-smi
  and Dynamo /health preflight checks; tighten with set -euo pipefail.
* run_dynamo.sh: pass --gpu-memory-utilization 0.45 to the vLLM worker
  by default so the trainer has ~50 GB to load FSDP-sharded weights +
  optimizer state. Override with GPU_MEM_UTIL env var if needed.

Signed-off-by: Biswa Panda <biswa.panda@gmail.com>
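
The preflight checks and single-GPU colocation described in the commit message might look roughly like this sketch (the /health URL, port, and retry policy are assumptions, not the actual run_full_smoke.sh contents):

```shell
#!/usr/bin/env bash
set -euo pipefail  # fail fast on errors, unset variables, and pipeline failures

# Hypothetical preflight helper in the spirit of run_full_smoke.sh.
preflight() {
  # 1) Confirm a GPU is actually visible before launching anything.
  if ! nvidia-smi >/dev/null 2>&1; then
    echo "ERROR: nvidia-smi failed; no usable GPU detected" >&2
    return 1
  fi
  # 2) Wait (up to ~60 s) for the Dynamo frontend to report healthy.
  #    The URL is an assumption; the real script may use a different port/path.
  local url="${DYNAMO_HEALTH_URL:-http://localhost:8000/health}"
  for _ in $(seq 1 30); do
    curl -fsS "$url" >/dev/null 2>&1 && return 0
    sleep 2
  done
  echo "ERROR: Dynamo health check never passed at $url" >&2
  return 1
}

# Colocated single-GPU setup: inference and trainer both share GPU 0.
export CUDA_VISIBLE_DEVICES=0
# preflight   # would run here before starting the orchestrator + trainer
```

With `set -euo pipefail`, a failed preflight aborts the whole smoke run instead of letting the trainer start against a dead inference endpoint.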
@AmeenP force-pushed the codex/dynamo-smoke-runner-followup branch from 6e78613 to 500a5b1 on May 8, 2026 at 10:06
