
[DYNAMO] feat: example k8s manifests + local smoke tests #2394

Draft
AmeenP wants to merge 2 commits into main from feat/dynamo-deployment-example



AmeenP commented May 2, 2026

Summary

Adds a worked example for running prime-rl with NVIDIA Dynamo as the inference backend instead of prime-rl's bundled vLLM frontend. The PR is self-contained: it only adds files under k8s/dynamo-deploy/ and tools/dynamo/, plus a formatting fix for the new local admin stub.

k8s/dynamo-deploy/ cluster deployment example

| File | Purpose |
| --- | --- |
| dynamo-dgd.yaml | Example DynamoGraphDeployment with frontend + vLLM worker. Requires DYN_ENABLE_RL=true so /v1/rl/* admin endpoints are served natively. |
| prime-rl-values.yaml | Helm values overlay that disables prime-rl's own inference component and points base_url / admin_base_url at Dynamo. |
| prime-rl-configs.yaml | ConfigMap mounted at /configs in orchestrator and trainer pods. |
| admin-stub.yaml | Optional admin-stub Deployment + Service for older Dynamo builds without native /v1/rl/*. |
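As a rough illustration of how the two URLs relate, the sketch below picks the admin endpoint depending on whether the Dynamo build serves /v1/rl/* natively. The names `ClientEndpoints` and `resolve_endpoints` are hypothetical and not part of prime-rl or Dynamo, and it assumes that in the stub case only admin_base_url moves to the stub while base_url stays on the Dynamo frontend.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ClientEndpoints:
    """Hypothetical container mirroring client.base_url / client.admin_base_url."""

    base_url: str
    admin_base_url: str


def resolve_endpoints(
    frontend_url: str, native_rl: bool, stub_url: str | None = None
) -> ClientEndpoints:
    """Pick the admin endpoint for a given Dynamo frontend (illustrative only).

    native_rl=True models a build started with DYN_ENABLE_RL=true, where the
    frontend itself serves /v1/rl/*. Otherwise an admin stub URL is required.
    """
    frontend_url = frontend_url.rstrip("/")
    if native_rl:
        # Inference and admin traffic both go to the Dynamo frontend.
        return ClientEndpoints(frontend_url, frontend_url)
    if stub_url is None:
        raise ValueError("an admin stub URL is required when /v1/rl/* is not served natively")
    # Assumption: inference still goes to Dynamo, admin calls go to the stub.
    return ClientEndpoints(frontend_url, stub_url.rstrip("/"))
```

The host names passed in (for example a frontend Service URL) are deployment-specific and would come from your own manifests.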

tools/dynamo/ local smoke flow

| File | Purpose |
| --- | --- |
| admin_stub.py | Local-dev fallback admin stub using aiohttp. |
| configs/smoke_rl.toml / smoke_rl_long.toml | Orchestrator configs pointed at local Dynamo on localhost:8000. |
| configs/smoke_trainer.toml / smoke_trainer_long.toml | Matching trainer configs. |
| run_dynamo.sh | Launches Dynamo on GPU 0. |
| run_smoke_test.sh | Launches orchestrator + trainer on GPU 1, supports --long. |
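The GPU split above can be pictured with a small sketch. How the scripts actually pin devices is not shown in this PR description; the code below assumes the common approach of setting `CUDA_VISIBLE_DEVICES` per process, and the helper names `env_for_gpu` and `smoke_test_command` are hypothetical.

```python
from __future__ import annotations

import os
from typing import Mapping


def env_for_gpu(gpu_index: int, base: Mapping[str, str] | None = None) -> dict[str, str]:
    """Return a copy of the environment restricted to a single GPU.

    Assumption: pinning is done via CUDA_VISIBLE_DEVICES; the real scripts
    may use a different mechanism.
    """
    env = dict(os.environ if base is None else base)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


def smoke_test_command(long_run: bool = False) -> list[str]:
    """Build the smoke-test invocation, adding --long for the long variant."""
    cmd = ["bash", "tools/dynamo/run_smoke_test.sh"]
    if long_run:
        cmd.append("--long")
    return cmd
```

Under these assumptions, a driver could pass `env_for_gpu(0)` when starting `tools/dynamo/run_dynamo.sh` and `env_for_gpu(1)` when running `smoke_test_command(long_run=True)` through `subprocess`.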

Notes

  • Depends on the Helm chart additions in [DYNAMO] feat(helm): tolerations, imagePullSecrets, configMap mount #2393: <component>.configMap, imagePullSecrets, and tolerations.
  • The orchestrator/trainer source code remains backend-agnostic here; it talks to whichever endpoint client.base_url and client.admin_base_url point at.
  • All manifests use placeholders such as <your-namespace> and <your-registry>/...; no secrets, paths, or registry coordinates are baked in.
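Because the manifests ship with placeholders, it can help to check that none are left before applying them. The following is a minimal sketch, not part of this PR: it assumes placeholders follow the `<your-...>` pattern mentioned above, and the function name `find_placeholders` is made up for illustration.

```python
import re

# Assumption: unfilled placeholders look like <your-namespace>, <your-registry>, etc.
PLACEHOLDER = re.compile(r"<your-[A-Za-z0-9_-]+>")


def find_placeholders(text: str) -> list:
    """Return (line_number, placeholder) pairs for every unfilled placeholder."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in PLACEHOLDER.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits
```

For example, running it over the text of a manifest and refusing to continue when the result is non-empty gives a cheap guard against applying an unedited example.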

What's not in this PR

Extracted from the same upstream branch as #2391 and #2393. Skipped:

Latest Validation

After rebasing onto the latest prime-rl/main, tools/dynamo/admin_stub.py was formatted on this base branch, so #2394 no longer depends on the follow-up PR for Ruff hygiene.

  • uvx ruff==0.13.0 check tools/dynamo/admin_stub.py
  • uvx ruff==0.13.0 format --check tools/dynamo/admin_stub.py
  • python -m py_compile tools/dynamo/admin_stub.py
  • bash -n tools/dynamo/run_dynamo.sh tools/dynamo/run_smoke_test.sh tools/dynamo/run_full_smoke.sh
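To rerun the same checks in one go, a small wrapper along these lines could be used. It is a sketch only: the command lists are copied from the bullets above, while `CHECKS` and `run_checks` are hypothetical names, and it assumes `uvx`, `python`, and `bash` are on the PATH and that it is run from the repository root.

```python
from __future__ import annotations

import subprocess

# Commands copied from the validation list; paths are relative to the repo root.
CHECKS: list[list[str]] = [
    ["uvx", "ruff==0.13.0", "check", "tools/dynamo/admin_stub.py"],
    ["uvx", "ruff==0.13.0", "format", "--check", "tools/dynamo/admin_stub.py"],
    ["python", "-m", "py_compile", "tools/dynamo/admin_stub.py"],
    [
        "bash", "-n",
        "tools/dynamo/run_dynamo.sh",
        "tools/dynamo/run_smoke_test.sh",
        "tools/dynamo/run_full_smoke.sh",
    ],
]


def run_checks(checks: list[list[str]], runner=subprocess.run) -> list[list[str]]:
    """Run each command and return the ones that exited with a non-zero status."""
    failed = []
    for cmd in checks:
        result = runner(cmd, check=False)
        if result.returncode != 0:
            failed.append(cmd)
    return failed
```

Calling `run_checks(CHECKS)` returns an empty list when every command succeeds; the `runner` parameter exists only so the loop can be exercised without the real tools installed.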

AmeenP changed the title from "feat(dynamo): example k8s manifests + local smoke tests for Dynamo backend" to "[DYNAMO] feat: example k8s manifests + local smoke tests" on May 2, 2026
… inference backend

Adds a worked example for running prime-rl with NVIDIA Dynamo as the
inference backend instead of prime-rl's bundled vLLM frontend.

Self-contained, additive-only — touches no existing source code, so
zero risk to existing deployments.

k8s/dynamo-deploy/:
  - dynamo-dgd.yaml: Example DynamoGraphDeployment (frontend + vLLM
    worker). Requires DYN_ENABLE_RL=true on the Dynamo runtime so
    /v1/rl/* admin endpoints are served natively.
  - prime-rl-values.yaml: Helm values overlay for k8s/prime-rl that
    disables prime-rl's own inference component and points
    base_url/admin_base_url at the Dynamo frontend.
  - prime-rl-configs.yaml: ConfigMap mounted at /configs in the
    orchestrator and trainer pods (used together with the
    `<component>.configMap` Helm value -- see #2393).
  - admin-stub.yaml: Optional admin-stub Deployment + Service for
    older Dynamo builds that don't serve /v1/rl/* natively.

tools/dynamo/:
  - admin_stub.py: Local-dev fallback admin stub (aiohttp). Mirrors
    the optional k8s admin-stub for laptop/single-node runs.
  - configs/smoke_*.toml: Short (5-step) and long (20-step) RL +
    trainer configs pointed at a local Dynamo on localhost:8000.
  - run_dynamo.sh / run_smoke_test.sh: Convenience launchers for
    a 2-GPU smoke flow (GPU 0 = Dynamo, GPU 1 = trainer).

All manifests use placeholders for namespace, image, and image-pull
secret -- no secrets, paths, or registry coordinates are baked in.
AmeenP force-pushed the feat/dynamo-deployment-example branch from 06b03e3 to 8f1aafe on May 8, 2026 10:03