[DYNAMO] refactor: extract admin endpoints into AdminAPI Protocol#2397
Draft
[DYNAMO] refactor: extract admin endpoints into AdminAPI Protocol#2397
Conversation
Pure refactor, no behavior change. Sets up the foundation for adding a DynamoAdminAPI in a follow-up without touching every call site. Adds: - AdminAPI Protocol describing 7 admin operations (health, list_models, pause, resume, update_weights, load_lora_adapter, init_broadcaster). - VLLMAdminAPI class implementing it with the *exact* paths, params, timeouts, and raise-for-status behavior the previous inline code had. - Module-level singleton _DEFAULT_ADMIN: AdminAPI = VLLMAdminAPI(). Each free admin function (update_weights, load_lora_adapter, _pause_engines, _resume_engines, init_nccl_broadcast, check_health, maybe_check_has_model) now takes a keyword-only `admin: AdminAPI = _DEFAULT_ADMIN` and delegates HTTP construction to it. Existing call sites in orchestrator.py, scheduler.py, and elastic.py are unchanged -- they pick up VLLMAdminAPI behavior via the default. The retry / health-poll / NCCL-404-tolerance logic stays in the free functions where it always lived. AdminAPI is responsible only for constructing the per-call HTTP request. Why this shape: The previous design assumed every admin endpoint shares the same path prefix (admin_base_url) and bare relative paths. That works for vLLM because vLLM aliases /load_lora_adapter and /v1/load_lora_adapter, masking inconsistencies. It does not work when a backend (e.g. Dynamo's /v1/rl/*) splits some endpoints out of the shared prefix or uses different request shapes (token_ids via nvext, native cache_salt, model listing not under the admin path). Encoding paths in TOML would force every deployment to spell out 6+ routes and wouldn't help with shape differences. A backend-typed Protocol localises both concerns. Existing test (tests/unit/utils/test_client.py) is unchanged: the same mock_client.post call is made via VLLMAdminAPI.load_lora_adapter with the same args. Local pytest was unrunnable due to an unrelated uv.lock parse issue (exclude-newer = "7 days") -- verified by tracing the call graph manually.
5b085a8 to
d21a4ad
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Pure refactor with no behavior change. Extracts admin-endpoint construction into a typed
AdminAPIProtocol so aDynamoAdminAPIfollow-up can drop in without touching every call site or scatteringif backend == "dynamo"branches acrossclient.py.What changes
AdminAPIProtocol — 7 admin operations (health,list_models,pause,resume,update_weights,load_lora_adapter,init_broadcaster).VLLMAdminAPI— single implementation, byte-identical paths/params/timeouts/raise-for-status to the previous inline code._DEFAULT_ADMIN: AdminAPI = VLLMAdminAPI()— module singleton so existing call sites need zero changes.admin: AdminAPI = _DEFAULT_ADMINand delegates HTTP construction to it.Inlined two functions while I was here:
_pause_engines/_resume_engines— were one-linegather()wrappers aroundadmin.pause/admin.resume, with a single caller. Inlined intoupdate_weights. Net: -12 lines, callsite gets clearer (the pause→try→resume pattern is visible inline).What does not change
orchestrator.py,scheduler.py,elastic.pyare untouched — they pick upVLLMAdminAPIbehavior via the default arg.tests/unit/utils/test_client.pyis untouched —mock_client.postis called with the same args, just routed throughVLLMAdminAPI.load_lora_adapter.Why this shape (and not alternatives)
if backend == "dynamo"inside each free functionadmin: AdminAPI. Implicit inheritance also creates the risk of accidentally inheriting vLLM-specific behavior.The Protocol earns its keep now, not hypothetically: there are already two backends (vLLM, Dynamo) with materially different surfaces, and the existing
use_token_client,skip_model_check, and bis-branch'suse_prefix_cache_saltflags are all the same shape ("backend has different conventions, papered over by a config flag").Follow-ups (separate PRs)
client.backend: Literal["vllm", "dynamo"] = "vllm"config field; pick the rightAdminAPIinsetup_admin_clients.DynamoAdminAPIwith the correct paths for Dynamo's admin frontend (some under/v1/rl/, some not).Each is independently shippable. This PR is the foundation.
Verification
ruff format+ruff checkclean.test_load_lora_adapter_succeeds_on_first_attempt: samemock_client.post("/load_lora_adapter", json=..., timeout=...)call viaVLLMAdminAPI.load_lora_adapter.uv run pytestis unrunnable due to an unrelateduv.lockparse issue (exclude-newer = "7 days"from chore: add 7-day supply chain cooldown via exclude-newer #2382 + an older uv binary). CI please verify.Stat
+156 / -75(one file). The ~30 lines of net growth are the Protocol contract — exactly what theDynamoAdminAPIfollow-up will need.