Skip to content

feat: io_uring based file reader#5777

Merged
westonpace merged 13 commits intolance-format:mainfrom
westonpace:feat/io-uring
Mar 31, 2026
Merged

feat: io_uring based file reader#5777
westonpace merged 13 commits intolance-format:mainfrom
westonpace:feat/io-uring

Conversation

@westonpace
Copy link
Copy Markdown
Member

@westonpace westonpace commented Jan 21, 2026

This is still a draft while waiting on #5755 and #5773

This PR adds a new URI scheme file+uring. The scheme uses the same local file reader as file but has two custom Reader implementations that are based on the io_uring API. One of these creates a configurable number of process-wide ring threads and the reader communicates with this thread using a queue. The second assumes that the scheduler and decoder run on the same thread and uses a thread local uring instance.

Both are able to saturate up to 1.5M IOPS/s when combined with the scheduler rework. I've tested the thread local variant up to 2M IOPS/s. These numbers are assuming the data is not in the kernel page cache. I've seen results as high as 4M IOPS/s when the data is cached.

@github-actions github-actions bot added the enhancement New feature or request label Jan 21, 2026
@github-actions
Copy link
Copy Markdown
Contributor

PR Review: feat: io_uring based file reader

Summary

This PR adds a new file+uring URI scheme implementing io_uring-based I/O for high-performance local file access on Linux. The implementation includes two variants: a process-wide thread pool approach and a thread-local variant for current-thread runtimes.


P0 Issues (Must Fix)

1. Short Read Handling Missing (thread.rs:2263-2270)

When io_uring completes a read, the result value indicates the number of bytes actually read, which can be less than requested (short read). The current code only checks if result < 0 for errors but doesn't handle the case where result < request.length. This can lead to returning partially-filled buffers with uninitialized memory.

if result < 0 {
    state.err = Some(io::Error::from_raw_os_error(-result));
} else if request.length > 0 {
    // Should check: result as usize != request.length

The same issue exists in current_thread.rs:907-909.

2. Panic in submit_read on push_request Failure (current_thread.rs:1065)

push_request(request.clone()).unwrap();

This will panic if the submission queue is full (WouldBlock error). This should return an error future instead of panicking, similar to how UringReader::submit_read handles the channel send failure.


P1 Issues (Should Fix)

3. File Handle Cache May Serve Stale Data (reader.rs:1416-1422)

The file handle cache uses a 60-second TTL with 10K max capacity. If a file is modified or replaced after being cached, readers will continue using the old file descriptor. Consider:

  • Documenting this behavior clearly
  • Or checking file inode/mtime on cache hit

4. Busy-Spin in Current-Thread Future (current_thread_future.rs:1218-1219)

cx.waker().wake_by_ref();
Poll::Pending

When the request hasn't completed, this immediately wakes itself, causing a busy spin that will consume 100% CPU until completion. Consider yielding back to the executor properly.

5. Reader Trait Lifetime Change Breaking Change (traits.rs)

The Reader trait changed from async fn to returning BoxFuture. While the PR description mentions this depends on other PRs, this is a breaking change to a public trait that may affect downstream consumers.


Minor Observations

  • Good test coverage for basic functionality
  • Environment variable configuration is well-documented in the module docs
  • The thread pool approach with round-robin selection is reasonable for load balancing
  • The 100ms logging interval in the thread loop may be too frequent for production use; consider making it configurable or debug-level only

Copy link
Copy Markdown
Contributor

@wjones127 wjones127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great. I have a couple minor non-blocking questions, but feel free to merge as is.

Comment on lines +55 to +57
/// Global cache of open file handles.
/// Entries expire after 60 seconds to ensure files are eventually closed.
pub(super) static HANDLE_CACHE: LazyLock<moka::future::Cache<CacheKey, CachedReaderData>> =
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

praise: this is going to be very useful!

In another PR, I'm separating the data from file handles in the index cache (so that entries are serializable). Having a separate cache to make re-opening file handles might be useful.

westonpace and others added 12 commits March 31, 2026 05:28
The completion handlers previously ignored the positive return value
from cqe.result(), which is the number of bytes actually read. On a
short read (e.g. signal interruption), the full pre-allocated buffer
was returned with uninitialized tail bytes. Reading past EOF returned
garbage data instead of an error.

Now partial reads accumulate bytes and resubmit for the remainder,
while EOF (result == 0) before the full read completes returns an
UnexpectedEof error.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When push_to_sq failed (submission queue full), the request was silently
dropped — no error was set, completed was never marked true, and no
waker was called. The future would hang forever waiting for a completion
that could never arrive.

Add IoRequest::fail() to centralize the "complete with error" pattern,
and call it from push_to_sq on both failure paths (SQ full and push
error). Also fix the current_thread retry resubmission to fail the
request and fix pending_count bookkeeping on retry failure.

Add unit tests that create a tiny IoUring (queue_depth=2), fill the SQ,
and verify the 3rd request's future returns an error within a 2-second
timeout instead of hanging.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The field was never read, only written. Its bookkeeping was also
incorrect on the short-read retry path. Remove it entirely rather
than fixing dead code.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
submit_read previously called push_request().unwrap(), which panics
if the submission queue is full. Return an error future instead,
matching the pattern used by the thread-based UringReader.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace disallowed snafu::location!() macro with Error::not_found() in
uring reader modules.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Use LazyCell for thread-local io_uring initialization (panics on
  failure instead of propagating errors, per reviewer suggestion)
- Cache LANCE_URING_BLOCK_SIZE env var in a LazyLock static
- Use str_is_truthy for LANCE_URING_CURRENT_THREAD parsing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Box deep async sub-futures in build() to break the type-level recursion
chain that exceeded the default recursion_limit of 128. The io_uring
wrappers added in this PR made I/O futures slightly larger, pushing the
compiler's query depth from ~128 to 130 when computing the async state
machine layout.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@westonpace westonpace merged commit 88d8faf into lance-format:main Mar 31, 2026
28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request python

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants