Skip to content

refactor: change reader's get_range result to be a static future#5755

Merged
westonpace merged 2 commits intolance-format:mainfrom
westonpace:refactor/static-get-range
Jan 22, 2026
Merged

refactor: change reader's get_range result to be a static future#5755
westonpace merged 2 commits intolance-format:mainfrom
westonpace:refactor/static-get-range

Conversation

@westonpace
Copy link
Copy Markdown
Member

This is the first in a set of PRs to add both a new scheduler implementation (a lightweight task-free scheduler) and a new reader implementation (that utilizes I/O uring).

The need for this isn't terribly obvious. Instead of...

let fut = reader.get_range(...)

it would seem we could just do...

let fut = async move { reader.get_range(...) };

...and we will have a static future. Unfortunately, the latter approach changes the semantics somewhat. Instead of calling get_range immediately, it does not call get_range until the future is first polled. This is a problem for the I/O uring reader which needs to immediately push the request into the submission queue (and not wait until polled). Perhaps there is a different solution than this PR but this PR is also not too onerous and does work.


This PR was partially created with Claude Opus 4.5. I have reviewed all suggested changes and take full responsibility for them.

@github-actions
Copy link
Copy Markdown
Contributor

Code Review: PR #5755 - Static Future for get_range

Overview

This PR refactors the Reader trait to return BoxFuture<'static, ...> instead of using async fn, enabling I/O uring implementations that need to submit requests immediately rather than on first poll.

Issues to Address

P0 - Potential Bug: SmallReader internal trait not used by outer struct

In object_reader.rs, the wait method is moved to SmallReaderInner, but SmallReader::get_range and get_all call self.inner.wait() which works. However, I noticed the StaticGetRange trait is defined but only used internally by GetRequest. This is fine but worth noting the trait is private and scoped appropriately.

P1 - Missing Clone requirement change

The do_with_retry function now requires Clone on the closure (f: impl Fn() -> BoxFuture<'a, OSResult<O>> + Clone). This is necessary because the closure is cloned in each iteration of the loop. However, this changes the API contract - callers must now pass clonable closures. Please verify all call sites satisfy this requirement.

P1 - Unused import

The async_trait import remains in traits.rs (line 6) but is no longer used for the Reader trait. Consider removing it if no other traits in the file need it (looks like Writer and WriteExt still use it, so this is fine).

Observations (non-blocking)

  1. SmallReader now derives Clone - The addition of #[derive(Clone)] on SmallReader and the restructuring to SmallReaderInner is a reasonable pattern for enabling the static future requirement.

  2. GetRequest wrapper - The GetRequest struct with StaticGetRange trait is a clean abstraction for capturing the object store and path together for static futures.

  3. Lifetime changes are correct - get_range returns BoxFuture<'static, ...> while size and get_all return BoxFuture<'_, ...>. The 'static on get_range is the key change enabling I/O uring, and it's correctly implemented by cloning/moving necessary data into the future.

Test Coverage

The existing tests should cover basic functionality, but consider adding a test that verifies the immediate submission behavior (e.g., ensuring get_range starts work before being polled) to document the intended semantics for future maintainers.


Overall, this is a well-structured refactor with clear motivation. The changes are mechanical but necessary for the I/O uring work mentioned in the PR description.

@codecov
Copy link
Copy Markdown

codecov bot commented Jan 20, 2026

Codecov Report

❌ Patch coverage is 76.08696% with 33 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
rust/lance-io/src/object_reader.rs 78.02% 18 Missing and 2 partials ⚠️
rust/lance-io/src/local.rs 72.34% 6 Missing and 7 partials ⚠️

📢 Thoughts on this report? Let us know!

@westonpace westonpace merged commit 4b7c663 into lance-format:main Jan 22, 2026
31 checks passed
majin1102 pushed a commit to majin1102/lance that referenced this pull request Jan 23, 2026
…ce-format#5755)

This is the first in a set of PRs to add both a new scheduler
implementation (a lightweight task-free scheduler) and a new reader
implementation (that utilizes I/O uring).

The need for this isn't terribly obvious.  Instead of...

```
let fut = reader.get_range(...)
```

it would seem we could just do...

```
let fut = async move { reader.get_range(...) };
```

...and we will have a static future. Unfortunately, the latter approach
changes the semantics somewhat. Instead of calling get_range
immediately, it does not call get_range until the future is first
polled. This is a problem for the I/O uring reader which needs to
immediately push the request into the submission queue (and not wait
until polled). Perhaps there is a different solution than this PR but
this PR is also not too onerous and does work.

---

**This PR was partially created with Claude Opus 4.5. I have reviewed
all suggested changes and take full responsibility for them.**
vivek-bharathan pushed a commit to vivek-bharathan/lance that referenced this pull request Feb 2, 2026
…ce-format#5755)

This is the first in a set of PRs to add both a new scheduler
implementation (a lightweight task-free scheduler) and a new reader
implementation (that utilizes I/O uring).

The need for this isn't terribly obvious.  Instead of...

```
let fut = reader.get_range(...)
```

it would seem we could just do...

```
let fut = async move { reader.get_range(...) };
```

...and we will have a static future. Unfortunately, the latter approach
changes the semantics somewhat. Instead of calling get_range
immediately, it does not call get_range until the future is first
polled. This is a problem for the I/O uring reader which needs to
immediately push the request into the submission queue (and not wait
until polled). Perhaps there is a different solution than this PR but
this PR is also not too onerous and does work.

---

**This PR was partially created with Claude Opus 4.5. I have reviewed
all suggested changes and take full responsibility for them.**
westonpace added a commit that referenced this pull request Mar 31, 2026
~~This is still a draft while waiting on
#5755 and
#5773

This PR adds a new URI scheme `file+uring`. The scheme uses the same
local file reader as `file` but has two custom `Reader` implementations
that are based on the io_uring API. One of these creates a configurable
number of process-wide ring threads and the reader communicates with
this thread using a queue. The second assumes that the scheduler and
decoder run on the same thread and uses a thread local uring instance.

Both are able to saturate up to 1.5M IOPS/s when combined with the
scheduler rework. I've tested the thread local variant up to 2M IOPS/s.
These numbers are assuming the data is not in the kernel page cache.
I've seen results as high as 4M IOPS/s when the data is cached.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants