
[DYNAMO] feat: weight_broadcast.keep_recent retention knob for LoRA#2395

Draft
AmeenP wants to merge 1 commit into main from feat/lora-broadcast-keep-recent

Conversation


@AmeenP commented May 2, 2026

Summary

Adds an optional `weight_broadcast.keep_recent` knob that decouples broadcast-dir retention from `max_async_level`. Default behavior is unchanged: when unset, retention falls back to `max_async_level` exactly as before.

Problem

`maybe_clean` previously deleted `step_{N - max_async_level - 1}`. That's the right bound for full-weight broadcasts, but not for LoRA: vLLM lazy-loads the adapter inside `_prepare_inputs`, so a generate request admitted under an older adapter can page that adapter dir in after the orchestrator has moved on. The trainer can race ahead and `rmtree` the adapter directory just before the inference worker tries to read it.
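Concretely, a worked example of the old bound (the step values below are illustrative, not from this PR):

```python
# Worked example of the old retention rule. Values are illustrative,
# not code from this PR.
max_async_level = 2
N = 10  # current training step

# Old rule: delete step_{N - max_async_level - 1}.
deleted = N - max_async_level - 1  # -> 7
surviving = [f"step_{s}" for s in range(deleted + 1, N + 1)]

# A generate request admitted while step_7's adapter was current may
# still lazy-load the step_7 dir inside _prepare_inputs -- but step_7
# has just been rmtree'd, hence LoRAAdapterNotFoundError.
print(deleted)    # 7
print(surviving)  # ['step_8', 'step_9', 'step_10']
```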

Fix

New optional `weight_broadcast.keep_recent: int | None` (default `None`):

  • `None` → fall back to `max_async_level` (no behavior change for existing users).
  • Set to `max_async_level + 1` or `+2` for LoRA + filesystem broadcast.

A `model_validator` rejects `keep_recent < max_async_level` with a message that names the failure mode (`LoRAAdapterNotFoundError`), since a smaller-than-staleness retention window is always a misconfiguration.

`maybe_clean`'s int parameter is renamed `async_level` -> `retention` to reflect that it can now come from either knob.
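A hedged sketch of the renamed cleanup, assuming a flat `step_{N}` directory layout; the function body and signature are illustrative, not the PR's code:

```python
# Illustrative sketch only -- directory layout and signature are
# assumptions, not this PR's actual maybe_clean.
import shutil
from pathlib import Path


def maybe_clean(broadcast_dir: Path, step: int, retention: int) -> None:
    """Delete the broadcast dir that has aged out of the retention window.

    `retention` comes from weight_broadcast.keep_recent when set,
    otherwise from max_async_level (the previous behavior).
    """
    stale = broadcast_dir / f"step_{step - retention - 1}"
    if stale.is_dir():
        shutil.rmtree(stale)
```

Passing `retention = max_async_level + 1` keeps one extra step directory alive, which is exactly the slack a late `_prepare_inputs` lazy-load needs.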

Relationship to other PRs

A companion PR (#2391) addresses a different LoRA failure mode on the same code path — an NFS visibility race where the `STABLE` sentinel becomes visible before the adapter files it points to. The two are independent and can land in either order. Together they cover both classes of `LoRAAdapterNotFoundError` we've seen on shared-filesystem deployments.

@AmeenP AmeenP changed the title feat: weight_broadcast.keep_recent retention knob for LoRA [DYNAMO] feat: weight_broadcast.keep_recent retention knob for LoRA May 2, 2026
