
Lhotse: add prefetch_factor option to LhotseDataLoadingConfig #15665

Open
XuesongYang wants to merge 1 commit into NVIDIA-NeMo:main from XuesongYang:xueyang/pr-prefetch-factor

Conversation

@XuesongYang
Collaborator

Add configurable prefetch_factor for PyTorch DataLoader, allowing users to increase the per-worker prefetch buffer depth to absorb I/O latency spikes from network filesystems. Applies to both single-config and multi-config dataloader paths.

When unset (None), PyTorch's default of 2 is used, preserving existing behavior.

Usage: model.train_ds.prefetch_factor=4
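For YAML-based configs, the equivalent setting would look like the sketch below. This is illustrative only: the surrounding keys assume the usual `model.train_ds` layout and are not taken from the PR diff.

```yaml
model:
  train_ds:
    num_workers: 4      # prefetch_factor only takes effect with worker processes
    prefetch_factor: 4  # per-worker prefetch depth; omit (None) to keep PyTorch's default of 2
```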

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 5, 2026 21:23
@copy-pr-bot

copy-pr-bot Bot commented May 5, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Contributor

Copilot AI left a comment


Pull request overview

Adds a new prefetch_factor knob to NeMo’s Lhotse-backed dataloader configuration so users can tune the PyTorch DataLoader’s per-worker prefetch depth and better tolerate I/O latency spikes (e.g., on network filesystems), while keeping existing behavior when unset.

Changes:

  • Introduced prefetch_factor: int | None = None in LhotseDataLoadingConfig.
  • Passed prefetch_factor through to torch.utils.data.DataLoader when num_workers > 0 in both single-config and multi-config dataloader creation paths.
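For context, here is a minimal standalone sketch of the gating pattern described above. The dataset class and values are illustrative, not from the PR: PyTorch's `DataLoader` raises a `ValueError` if `prefetch_factor` is specified while `num_workers == 0`, which is why the kwarg is only added in the multiprocessing case.

```python
import torch
from torch.utils.data import DataLoader, Dataset

class _ToyDataset(Dataset):  # illustrative stand-in for a Lhotse-backed dataset
    def __len__(self):
        return 16

    def __getitem__(self, idx):
        return torch.tensor(idx)

num_workers = 2
prefetch_factor = 4  # e.g. from model.train_ds.prefetch_factor=4; None keeps PyTorch's default of 2

dloader_kwargs = {"batch_size": 4, "num_workers": num_workers}
# prefetch_factor is only legal with worker processes, so gate it on num_workers,
# mirroring the checks added in the PR.
if num_workers > 0 and prefetch_factor is not None:
    dloader_kwargs["prefetch_factor"] = prefetch_factor

loader = DataLoader(_ToyDataset(), **dloader_kwargs)
```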


Comment on lines +508 to +509

Multi-config path:

```python
if shared_opts.num_workers > 0 and shared_opts.get("prefetch_factor") is not None:
    dloader_kwargs["prefetch_factor"] = shared_opts.prefetch_factor
```

Single-config path:

```python
    pin_memory=config.pin_memory,
)
if config.num_workers > 0 and config.get("prefetch_factor") is not None:
    dloader_kwargs["prefetch_factor"] = config.prefetch_factor
```
