[DYNAMO] feat: weight_broadcast.keep_recent retention knob for LoRA #2395
Conversation
Decouples broadcast-dir retention from `max_async_level`. The previous
code deleted `step_{N - max_async_level - 1}` -- correct for
full-weight broadcasts but wrong for LoRA: vLLM lazy-loads the adapter
inside `_prepare_inputs`, so a generate request admitted under an
older adapter can page that dir in *after* the orchestrator has moved
on. The trainer can race ahead and `rmtree` the adapter directory
just before the inference worker tries to read it.
Adds `weight_broadcast.keep_recent: int | None` (default `None`,
falls back to `max_async_level` so existing behavior is preserved).
For LoRA + filesystem broadcast, set to `max_async_level + 1` or `+2`.
A `model_validator` rejects `keep_recent < max_async_level` with a
message that names the failure mode.
`maybe_clean`'s int parameter is renamed `async_level` -> `retention`
to reflect that it can now come from either knob.
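For context, a minimal sketch of the retention path described above. The `clean_after_step` helper and the config access pattern are hypothetical; only `maybe_clean`, `keep_recent`, `max_async_level`, and the `step_{N}` directory naming come from this PR:

```python
import shutil
from pathlib import Path


def maybe_clean(broadcast_dir: Path, step: int, retention: int) -> None:
    """Remove the broadcast dir that has aged out of the retention window.

    `retention` replaces the old `async_level` parameter: it is either
    `keep_recent` (when set) or `max_async_level` (the previous behavior).
    """
    stale = broadcast_dir / f"step_{step - retention - 1}"
    if stale.is_dir():
        shutil.rmtree(stale, ignore_errors=True)


def clean_after_step(cfg, broadcast_dir: Path, step: int) -> None:
    # Hypothetical call site: keep_recent wins when set, else fall back
    # to max_async_level, preserving the old behavior.
    keep_recent = cfg.weight_broadcast.keep_recent
    retention = keep_recent if keep_recent is not None else cfg.max_async_level
    maybe_clean(broadcast_dir, step, retention)
```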
## Summary
Adds an optional `weight_broadcast.keep_recent` knob that decouples broadcast-dir retention from `max_async_level`. Default behavior is unchanged (when unset, retention falls back to `max_async_level` exactly as before).

## Problem
`maybe_clean` previously deleted `step_{N - max_async_level - 1}`. That's the right bound for full-weight broadcasts, but not for LoRA: vLLM lazy-loads the adapter inside `_prepare_inputs`, so a generate request admitted under an older adapter can page that adapter dir in after the orchestrator has moved on. The trainer can race ahead and `rmtree` the adapter directory just before the inference worker tries to read it.
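To make the bound concrete, here is the old deletion rule with made-up numbers (the step values are illustrative, not from the PR):

```python
# Illustrative numbers only: max_async_level = 2, current trainer step N = 10.
max_async_level = 2
N = 10

# Old rule: after step 10's broadcast, delete step_{10 - 2 - 1} = step_7.
stale_dir = f"step_{N - max_async_level - 1}"
assert stale_dir == "step_7"

# A generate request admitted while step_7's adapter was current may not
# have touched the adapter files yet, since vLLM only loads them inside
# _prepare_inputs. If the trainer removes step_7 first, that lazy load fails.
```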
## Fix

New optional `weight_broadcast.keep_recent: int | None` (default `None`):

- `None` → fall back to `max_async_level` (no behavior change for existing users).
- `max_async_level + 1` or `+2` for LoRA + filesystem broadcast.

A `model_validator` rejects `keep_recent < max_async_level` with a message that names the failure mode (`LoRAAdapterNotFoundError`), since a smaller-than-staleness retention window is always a misconfiguration.

`maybe_clean`'s int parameter is renamed `async_level` -> `retention` to reflect that it can now come from either knob.
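A minimal sketch of that validation, assuming a Pydantic v2 config model. The class names and field grouping here are hypothetical; only `keep_recent`, `max_async_level`, and the check itself come from this PR:

```python
from pydantic import BaseModel, Field, model_validator


class WeightBroadcastConfig(BaseModel):
    # None -> retention falls back to max_async_level (old behavior).
    keep_recent: int | None = None


class OrchestratorConfig(BaseModel):  # hypothetical parent model
    max_async_level: int = 1
    weight_broadcast: WeightBroadcastConfig = Field(
        default_factory=WeightBroadcastConfig
    )

    @model_validator(mode="after")
    def _check_retention_window(self) -> "OrchestratorConfig":
        keep = self.weight_broadcast.keep_recent
        if keep is not None and keep < self.max_async_level:
            raise ValueError(
                "weight_broadcast.keep_recent must be >= max_async_level; "
                "a smaller retention window lets the trainer delete adapter "
                "dirs that in-flight requests can still lazily load "
                "(LoRAAdapterNotFoundError)."
            )
        return self
```

Under these assumptions, `keep_recent: max_async_level + 1` keeps one extra `step_{N}` directory alive for requests that were admitted under the previous adapter and have not paged it in yet.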
## Relationship to other PRs

A companion PR (#2391) addresses a different LoRA failure mode on the same code path: an NFS visibility race where the `STABLE` sentinel becomes visible before the adapter files it points to. The two are independent and can land in either order. Together they cover both classes of `LoRAAdapterNotFoundError` we've seen on shared-filesystem deployments.