fix: split io_uring_submit_and_wait to prevent SQ ring race (issue #1443)#4

Open
ShukantPal wants to merge 2 commits into master from fix/issue-1443

@ShukantPal

io_uring_submit_and_wait() was called without holding ring_lock in
fuse_uring_thread(), while reply threads concurrently call
io_uring_get_sqe()/io_uring_submit() under ring_lock in
fuse_uring_commit_sqe(). Both paths modify the SQ ring, causing
corruption in multithreaded operation.

Split the call into io_uring_submit() under ring_lock (serialized with
other SQ operations) and io_uring_wait_cqe() without the lock (CQ ring
is only written by the kernel and safe to read concurrently). Add a
post-CQE-processing submit to flush SQEs prepared during handling
(resubmits and synchronous replies skip submission when
cqe_processing is set).
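
The locking discipline described above can be illustrated with a small pthread-only model. This is a sketch under stated assumptions, not the libfuse code: `toy_ring`, `toy_prepare()`, `toy_submit()`, `sq_user()` and `run_model()` are hypothetical stand-ins, and no liburing calls are made. It only shows that when every thread mutates SQ-side state under `ring_lock`, and any blocking wait happens after the lock is released, no submissions are lost.

```c
#include <pthread.h>

#define WORKERS 4          /* hypothetical number of reply threads */
#define ITERATIONS 50000   /* SQ operations performed by each thread */

/* Toy stand-in for the SQ side of a ring. This is NOT liburing. */
struct toy_ring {
    pthread_mutex_t ring_lock;  /* guards sq_pending and sq_submitted */
    unsigned long sq_pending;   /* prepared but not yet submitted */
    unsigned long sq_submitted; /* handed over to the consumer */
};

/* Hypothetical analogue of preparing an SQE; caller must hold ring_lock. */
static void toy_prepare(struct toy_ring *r)
{
    r->sq_pending++;
}

/* Hypothetical analogue of a submit; caller must hold ring_lock. */
static void toy_submit(struct toy_ring *r)
{
    r->sq_submitted += r->sq_pending;
    r->sq_pending = 0;
}

/* Every thread that touches SQ state does so under ring_lock. */
static void *sq_user(void *arg)
{
    struct toy_ring *r = arg;

    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&r->ring_lock);
        toy_prepare(r);
        toy_submit(r);
        pthread_mutex_unlock(&r->ring_lock);
        /* A blocking wait for completions would go here, outside the
         * lock, so other threads can keep submitting in the meantime. */
    }
    return NULL;
}

/* Runs WORKERS + 1 threads (reply threads plus one ring thread) and
 * returns the total number of entries submitted. */
static unsigned long run_model(void)
{
    struct toy_ring r = { .sq_pending = 0, .sq_submitted = 0 };
    pthread_t threads[WORKERS + 1];

    pthread_mutex_init(&r.ring_lock, NULL);
    for (int i = 0; i < WORKERS + 1; i++)
        pthread_create(&threads[i], NULL, sq_user, &r);
    for (int i = 0; i < WORKERS + 1; i++)
        pthread_join(threads[i], NULL);
    pthread_mutex_destroy(&r.ring_lock);
    return r.sq_submitted;
}
```

With the lock in place, `run_model()` returns exactly `(WORKERS + 1) * ITERATIONS`. Removing the lock from one of the threads makes the counters race, which is the toy equivalent of the unlocked io_uring_submit_and_wait() call.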

Add source-level validation tests that verify the fix for the io_uring
SQ ring race condition in fuse_uring_thread().

The tests check that:
- io_uring_submit_and_wait() is not used (broken pattern)
- io_uring_submit() is called under ring_lock (serialized with
  fuse_uring_commit_sqe from reply threads)
- io_uring_wait_cqe() is called without holding ring_lock
  (safe since CQ ring is kernel-written)
- io_uring_submit() is called after CQE processing to flush
  deferred SQEs from reply threads

These tests reliably fail without the fix and pass with it, avoiding
the non-determinism inherent in runtime race condition testing.
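
As a rough illustration of what a source-level check of this kind might look like, here is a minimal sketch using plain substring matching. The helper names `uses_submit_and_wait()` and `occurs_in_order()` are hypothetical and are not taken from the tests in this PR. A real check would also need to read the source file and restrict the search to the body of fuse_uring_thread().

```c
#include <stdbool.h>
#include <string.h>

/* True if the source text still contains the combined call. The trailing
 * "(" keeps this from matching unrelated identifiers. */
static bool uses_submit_and_wait(const char *src)
{
    return strstr(src, "io_uring_submit_and_wait(") != NULL;
}

/* True if `first` occurs in src and `second` occurs somewhere after it.
 * Useful for ordering checks such as "submit comes before wait". */
static bool occurs_in_order(const char *src, const char *first,
                            const char *second)
{
    const char *p = strstr(src, first);

    return p != NULL && strstr(p + strlen(first), second) != NULL;
}
```

Note that searching for `io_uring_submit(` does not match `io_uring_submit_and_wait(`, because the character after `submit` differs, so the two patterns can be checked independently.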