
Remove unnecessary fixed sleep by adding predicate-based path check #700

Merged
tchaton merged 8 commits into Lightning-AI:main from Red-Eyed:subsample_streaming_dataset_speedup on Sep 4, 2025

Conversation

Contributor

@Red-Eyed Red-Eyed commented Sep 1, 2025

What does this PR do?

Replaces the fixed time.sleep(0.5) with a predicate-based check built on wait_for_predicate.
This removes the unnecessary wait when the dataset path already exists, while still handling cases where the path appears slightly later.
Because the fixed sleep was paid once per directory, this significantly speeds up cache loading when there are thousands of optimized directories.
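
A minimal sketch of the idea described above, not the code merged in this PR. The signature of `wait_for_predicate` shown here (a zero-argument callable, a `timeout` in seconds, and a `poll_interval`) is an assumption for illustration, and `dataset_path_ready` is a hypothetical wrapper name.

```python
import os
import time
from typing import Callable


def wait_for_predicate(
    predicate: Callable[[], bool],
    timeout: float,
    poll_interval: float = 0.01,
) -> bool:
    """Poll `predicate` until it returns True or `timeout` seconds elapse.

    Hypothetical signature; the helper added in this PR may differ.
    """
    deadline = time.monotonic() + timeout
    while True:
        # Check first, so an already-satisfied predicate never sleeps.
        if predicate():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        time.sleep(min(poll_interval, remaining))


def dataset_path_ready(path: str, timeout: float = 0.5) -> bool:
    """Hypothetical call site: replaces an unconditional time.sleep(0.5)."""
    return wait_for_predicate(lambda: os.path.exists(path), timeout=timeout)
```

With this shape, a path that already exists costs one `os.path.exists` call instead of half a second, and the worst case (path never appears) stays bounded by the same 0.5-second timeout.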


codecov Bot commented Sep 2, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84%. Comparing base (8ab1975) to head (ef539da).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@         Coverage Diff         @@
##           main   #700   +/-   ##
===================================
  Coverage    84%    84%           
===================================
  Files        52     52           
  Lines      7095   7103    +8     
===================================
+ Hits       5970   5986   +16     
+ Misses     1125   1117    -8     

@bhimrazy bhimrazy requested a review from Copilot September 2, 2025 07:12
Contributor

Copilot AI left a comment


Pull Request Overview

This PR improves performance when loading datasets by replacing a fixed 0.5-second sleep with a predicate-based check that only waits when necessary. This optimization significantly speeds up cache loading scenarios with thousands of optimized directories.

  • Introduces a wait_for_predicate utility function for timeout-based conditional waiting
  • Replaces the fixed sleep with a conditional check that returns immediately if the path exists
  • Maintains the same 0.5-second timeout for cases where the path doesn't exist initially
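
The three cases in the overview (path already present, path appearing slightly later, path never appearing) can be exercised with a small timing sketch. Everything here is illustrative: `wait_until_exists` is a hypothetical stand-in for the PR's predicate-based check, and the 0.1-second delay and 10 ms polling interval are arbitrary choices.

```python
import os
import tempfile
import threading
import time
from typing import Tuple


def wait_until_exists(path: str, timeout: float = 0.5, poll_interval: float = 0.01) -> bool:
    """Hypothetical stand-in: return as soon as `path` exists, or False on timeout."""
    deadline = time.monotonic() + timeout
    while not os.path.exists(path):
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
    return True


def measure(path: str) -> Tuple[bool, float]:
    """Return (found, elapsed_seconds) for one wait on `path`."""
    start = time.monotonic()
    found = wait_until_exists(path)
    return found, time.monotonic() - start


with tempfile.TemporaryDirectory() as root:
    existing = os.path.join(root, "existing")
    late = os.path.join(root, "late")
    missing = os.path.join(root, "missing")
    os.mkdir(existing)

    # Case 1: the path is already there, so no sleep happens at all.
    found_existing, t_existing = measure(existing)

    # Case 2: the path is created about 0.1 s after the wait starts.
    timer = threading.Timer(0.1, os.mkdir, args=(late,))
    timer.start()
    found_late, t_late = measure(late)
    timer.join()

    # Case 3: the path never appears, so the full timeout is spent.
    found_missing, t_missing = measure(missing)

print(f"existing: found={found_existing} after {t_existing:.3f}s")
print(f"late:     found={found_late} after {t_late:.3f}s")
print(f"missing:  found={found_missing} after {t_missing:.3f}s")
```

Only the third case pays the full timeout; with a fixed sleep, all three would.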


Comment thread src/litdata/utilities/dataset_utilities.py
Red-Eyed and others added 2 commits September 2, 2025 10:16
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Collaborator

@tchaton tchaton left a comment


LGTM !

@tchaton tchaton merged commit 76d3bee into Lightning-AI:main Sep 4, 2025
36 checks passed
4 participants