Skip to content

fix: Handle ENOBUFS in UDP send#3482

Open
larseggert wants to merge 7 commits into
mozilla:mainfrom
larseggert:fix-enobufs
Open

fix: Handle ENOBUFS in UDP send#3482
larseggert wants to merge 7 commits into
mozilla:mainfrom
larseggert:fix-enobufs

Conversation

@larseggert
Copy link
Copy Markdown
Collaborator

Based on a discussion with @ianswett yesterday, don't treat ENOBUFS as a fatal error when sending UDP packets.

Copilot AI review requested due to automatic review settings March 17, 2026 02:57
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates neqo-udp’s UDP send error handling to avoid treating ENOBUFS as a fatal send error, aligning send behavior with transient NIC/stack buffer exhaustion on some platforms.

Changes:

  • Treat ENOBUFS from UdpSocketState::try_send() as io::ErrorKind::WouldBlock instead of a hard error.
  • Add is_enobufs() helper (unix-gated) and update unit tests to account for the new predicate.

You can also share your feedback on Copilot code review. Take the survey.

Comment thread neqo-udp/src/lib.rs Outdated
Comment thread neqo-udp/src/lib.rs Outdated
Comment thread neqo-udp/src/lib.rs
Comment thread neqo-udp/src/lib.rs
@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 17, 2026

Codecov Report

❌ Patch coverage is 91.17647% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.00%. Comparing base (70a1985) to head (39c6798).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3482      +/-   ##
==========================================
- Coverage   95.15%   95.00%   -0.16%     
==========================================
  Files         111      116       +5     
  Lines       38012    38373     +361     
  Branches    38012    38373     +361     
==========================================
+ Hits        36172    36455     +283     
- Misses       1152     1219      +67     
- Partials      688      699      +11     
Flag Coverage Δ
freebsd 94.17% <85.00%> (-0.15%) ⬇️
linux 95.14% <85.00%> (-0.01%) ⬇️
macos 95.10% <85.00%> (-0.01%) ⬇️
windows 95.15% <84.21%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
neqo-common 98.61% <ø> (ø)
neqo-http3 93.92% <ø> (ø)
neqo-qpack 95.62% <ø> (ø)
neqo-transport 95.80% <ø> (-0.01%) ⬇️
neqo-udp 84.69% <91.17%> (-0.21%) ⬇️
mtu 86.61% <ø> (ø)

@larseggert larseggert force-pushed the fix-enobufs branch 3 times, most recently from 04de82b to 345f3cf Compare March 17, 2026 09:02
Based on a discussion with @ianswett yesterday, don't treat `ENOBUFS` as a fatal error when sending UDP packets.
Copy link
Copy Markdown
Member

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer fixing this issue in quinn-udp, thus de-duplicating the logic across the various usages.

quinn-rs/quinn#2293

@larseggert
Copy link
Copy Markdown
Collaborator Author

No objections!

@larseggert
Copy link
Copy Markdown
Collaborator Author

@mxinden quinn-rs/quinn#2293 got closed - what now?

@mxinden
Copy link
Copy Markdown
Member

mxinden commented Apr 1, 2026

@larseggert let's try to get to the root of the discussion. I prepared quinn-rs/quinn#2594. Can you help out?

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 10, 2026

Claude finished @larseggert's task in 2m 3s —— View job


Review of PR #3482

  • Read CLAUDE.md and repo guidelines
  • Read the changed file and PR diff
  • Analyze existing review comments
  • Post review with inline comments

Summary

Reviewed the ENOBUFS handling changes. The approach is correct — callers already handle WouldBlock with wait-and-retry loops, making this a clean fit. Posted 2 inline suggestions:

  1. Comment/code scope mismatch — comment says "macOS/BSD" but the predicate fires on all unix + Windows
  2. Missing Windows test for WSAENOBUFS

Also addressed previous copilot review comments (original error loss is acceptable given qdebug! logging, EAGAIN coverage is present in the combined test, and ENOBUFS integration testing is impractical without socket mocking).

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Maps ENOBUFS (and WSAENOBUFS on Windows) to WouldBlock during UDP send, so the existing retry loops in both client and server correctly wait for writability and retry rather than treating it as fatal. Also cleans up is_emsgsize by switching from map_or to is_some_and, removing the clippy suppression.

The approach is sound — I verified that callers (neqo-bin/src/client/mod.rs:484, neqo-bin/src/server/mod.rs:325) handle WouldBlock with a writable().await + retry loop, which is exactly the right recovery for a transient send queue exhaustion. The try_io wrapper in neqo-bin/src/udp.rs:93 also correctly propagates the WouldBlock through tokio's readiness machinery.

Notes

  • Comment/code scope mismatch: The inline comment says "On macOS/BSD" but the predicate fires on all unix + Windows. See inline suggestion.
  • Windows test gap: is_enobufs on Windows (WSAENOBUFS) has no test coverage. See inline suggestion.
  • Upstream dedup: Agree with @mxinden that this could live in quinn-udp to benefit all consumers. If quinn-rs/quinn#2594 lands, this local workaround can be removed. Worth tracking.

On previous review comments

  • @copilot re: losing the original error (line 90): Agreed in principle, but callers only inspect e.kind() and the original is captured by qdebug!. Not blocking.
  • @copilot re: missing send_inner integration test (line 91): Triggering ENOBUFS in a test environment is impractical without mocking the socket layer. The unit tests on the predicates are sufficient.
  • @copilot re: EAGAIN coverage dropped (line 384): This is addressed — the combined is_emsgsize_and_is_enobufs_are_disjoint test at line 377 asserts both predicates return false for EAGAIN.

Comment thread neqo-udp/src/lib.rs Outdated
Comment thread neqo-udp/src/lib.rs
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 10, 2026

Reviewing PR #3482

  • Read CLAUDE.md and repo guidelines
  • Read changed files and PR diff
  • Analyze changes and post review

View job run

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Lars Eggert <lars@eggert.org>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 10, 2026

Claude finished @larseggert's task in 2m 44s —— View job


Reviewing PR #3482

  • Read CLAUDE.md and repo guidelines
  • Read changed file and PR diff
  • Check existing review comments
  • Post review with inline comments

Summary

Reviewed the ENOBUFS → WouldBlock mapping. The approach integrates correctly with tokio's readiness model via try_io, and callers already handle WouldBlock with wait-and-retry loops. Posted a formal PR review with one inline suggestion for missing Windows WSAENOBUFS test coverage.

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approach is sound. ENOBUFS → WouldBlock integrates correctly with the callers: both client/mod.rs:484 and server/mod.rs:325 retry in a writable().await loop, and critically, the try_io wrapper in neqo-bin/src/udp.rs:93 ensures tokio clears the socket's readiness flag on WouldBlock, preventing a busy-loop.

The is_emsgsize cleanup (is_some_and replacing map_or + clippy suppression) and the combined disjointness test are nice improvements.

One inline comment for missing Windows test coverage. Previous reviewer comments about upstream dedup in quinn-udp (quinn-rs/quinn#2594) are worth tracking — if that lands, this local workaround can be removed.

Comment thread neqo-udp/src/lib.rs
Comment thread neqo-udp/src/lib.rs Outdated
@larseggert
Copy link
Copy Markdown
Collaborator Author

This is on hold pending the discussion in quinn-rs/quinn#2594

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean, well-motivated fix. Treating ENOBUFS as fatal was clearly wrong for a QUIC implementation — mapping it to a retryable condition is the right call.

What's good:

  • The refactoring of is_emsgsize / is_enobufs into separate per-platform functions (addressing @martinthomson's cfg!() concern) is much cleaner than the previous #[cfg]-blocks-inside-closures approach, and drops the clippy::unnecessary_map_or suppression.
  • Test coverage is thorough: disjointness test on Unix, per-error-code tests on Windows, non-OS-error test.
  • Previous review feedback (comment wording, Windows WSAENOBUFS test gap) has all been addressed.

One note (see inline): the synthetic WouldBlock interacts with tokio's try_io readiness clearing. Unlikely to cause issues in practice, but worth being aware of.

Agree with @mxinden that the long-term home for this is quinn-udp (quinn-rs/quinn#2293). Worth tracking so this local workaround can be removed when upstream support lands.

Comment thread neqo-udp/src/lib.rs
@larseggert
Copy link
Copy Markdown
Collaborator Author

Note to self: Add a test that confirms this is basically packet loss and triggers an RTX.

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean, well-motivated fix. ENOBUFS should not be fatal for a QUIC stack — mapping it to a retryable condition is correct.

What looks good:

  • Splitting is_emsgsize/is_enobufs into separate per-platform functions (addressing @martinthomson's cfg!() readability concern) is a clear improvement over the #[cfg]-blocks-inside-closures approach, and drops the clippy::unnecessary_map_or suppression.
  • All prior review feedback has been addressed: the comment wording is now platform-neutral, the Windows WSAENOBUFS test has been added, and test coverage is thorough (disjointness on Unix, per-code on Windows, non-OS-error fallback).
  • Callers in both client/mod.rs and server/mod.rs handle WouldBlock with a writable().await + retry loop, which correctly re-sends the dropped batch.

Acknowledged concern (already discussed): The synthetic WouldBlock interacts with tokio's try_io readiness clearing. On edge-triggered platforms (Linux epoll), if ENOBUFS doesn't correlate with actual socket send buffer exhaustion, writable().await could stall until a QUIC timer fires. As previous reviews noted, this is low-risk in practice — ENOBUFS from UDP sendmsg is rare on Linux, and QUIC idle/loss-detection timers provide a safety net. Worth keeping in mind if this is ever observed to cause stalls.

Upstream tracking: Agree with @mxinden that the long-term home for this is quinn-udp (quinn-rs/quinn#2293). Once upstream support lands, this local workaround can be removed.

@github-actions
Copy link
Copy Markdown
Contributor

Benchmark results

No significant performance differences relative to 9a957f6.

All results
transfer/1-conn/1-100mb-resp (aka. Download): Change within noise threshold.
       time:   [184.08 ms 184.51 ms 184.97 ms]
       thrpt:  [540.63 MiB/s 541.97 MiB/s 543.24 MiB/s]
change:
       time:   [+0.4975% +0.7970% +1.1185] (p = 0.00 < 0.05)
       thrpt:  [-1.1062% -0.7907% -0.4950]
       Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe
transfer/1-conn/10_000-parallel-1b-resp (aka. RPS): No change in performance detected.
       time:   [283.57 ms 285.36 ms 287.16 ms]
       thrpt:  [34.824 Kelem/s 35.044 Kelem/s 35.265 Kelem/s]
change:
       time:   [-0.0354% +0.8726% +1.7553] (p = 0.06 > 0.05)
       thrpt:  [-1.7250% -0.8650% +0.0355]
       No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
transfer/1-conn/1-1b-resp (aka. HPS): Change within noise threshold.
       time:   [39.073 ms 39.224 ms 39.399 ms]
       thrpt:  [25.381   B/s 25.494   B/s 25.593   B/s]
change:
       time:   [-1.5761% -0.8648% -0.1572] (p = 0.02 < 0.05)
       thrpt:  [+0.1574% +0.8723% +1.6013]
       Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
3 (3.00%) high mild
5 (5.00%) high severe
transfer/1-conn/1-100mb-req (aka. Upload): Change within noise threshold.
       time:   [189.50 ms 190.13 ms 191.07 ms]
       thrpt:  [523.38 MiB/s 525.94 MiB/s 527.70 MiB/s]
change:
       time:   [+0.3627% +0.8334% +1.3505] (p = 0.00 < 0.05)
       thrpt:  [-1.3325% -0.8265% -0.3614]
       Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) low mild
4 (4.00%) high mild
3 (3.00%) high severe
streams/walltime/1-streams/each-1000-bytes: No change in performance detected.
       time:   [590.13 µs 592.28 µs 594.76 µs]
       change: [-0.2820% +0.2688% +0.7713] (p = 0.34 > 0.05)
       No change in performance detected.
Found 9 outliers among 100 measurements (9.00%)
1 (1.00%) high mild
8 (8.00%) high severe
streams/walltime/1000-streams/each-1-bytes: No change in performance detected.
       time:   [11.913 ms 11.952 ms 12.013 ms]
       change: [-0.2859% +0.0996% +0.6311] (p = 0.72 > 0.05)
       No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high severe
streams/walltime/1000-streams/each-1000-bytes: Change within noise threshold.
       time:   [42.434 ms 42.472 ms 42.511 ms]
       change: [+0.5524% +0.6924% +0.8256] (p = 0.00 < 0.05)
       Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild
streams-flow-controlled/walltime/1-streams/each-4194304-bytes: Change within noise threshold.
       time:   [33.503 ms 33.562 ms 33.622 ms]
       change: [+0.9917% +1.3217% +1.6221] (p = 0.00 < 0.05)
       Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
streams-flow-controlled/walltime/10-streams/each-1048576-bytes: No change in performance detected.
       time:   [94.693 ms 95.940 ms 97.226 ms]
       change: [-0.3679% +1.5294% +3.3250] (p = 0.11 > 0.05)
       No change in performance detected.
transfer/walltime/pacing-false/varying-seeds: Change within noise threshold.
       time:   [22.642 ms 22.669 ms 22.704 ms]
       change: [+1.2730% +1.4503% +1.6367] (p = 0.00 < 0.05)
       Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe
transfer/walltime/pacing-true/varying-seeds: Change within noise threshold.
       time:   [22.835 ms 22.866 ms 22.915 ms]
       change: [+0.9532% +1.1164% +1.3911] (p = 0.00 < 0.05)
       Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
5 (5.00%) low mild
2 (2.00%) high mild
2 (2.00%) high severe
transfer/walltime/pacing-false/same-seed: Change within noise threshold.
       time:   [22.610 ms 22.630 ms 22.652 ms]
       change: [+2.1062% +2.2234% +2.3427] (p = 0.00 < 0.05)
       Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) high mild
2 (2.00%) high severe
transfer/walltime/pacing-true/same-seed: Change within noise threshold.
       time:   [22.872 ms 22.901 ms 22.946 ms]
       change: [+0.1805% +0.3212% +0.5158] (p = 0.00 < 0.05)
       Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severe

Download data for profiler.firefox.com or download performance comparison data.

@github-actions
Copy link
Copy Markdown
Contributor

Failed Interop Tests

QUIC Interop Runner, client vs. server, differences relative to main at 9a957f6.

neqo-pr as clientneqo-pr as server
neqo-pr vs. go-x-net: BP BA
neqo-pr vs. haproxy: ⚠️L1 BP BA
neqo-pr vs. kwik: BP BA
neqo-pr vs. linuxquic: ⚠️BP
neqo-pr vs. lsquic: L1 C1
neqo-pr vs. msquic: A L1 C1
neqo-pr vs. mvfst: A BP BA
neqo-pr vs. neqo: A
neqo-pr vs. nginx: BP BA
neqo-pr vs. ngtcp2: ⚠️C1 CM
neqo-pr vs. picoquic: A 🚀BP BA ⚠️L1
neqo-pr vs. quic-go: A
neqo-pr vs. quic-zig: B
neqo-pr vs. quiche: BP BA
neqo-pr vs. s2n-quic: 🚀BA CM
neqo-pr vs. tquic: S BP BA
neqo-pr vs. xquic: A ⚠️L1 C1
aioquic vs. neqo-pr: L1 CM
go-x-net vs. neqo-pr: ⚠️BP CM
kwik vs. neqo-pr: BP BA CM
lsquic vs. neqo-pr: run cancelled after 20 min
msquic vs. neqo-pr: CM
mvfst vs. neqo-pr: Z A L1 C1 CM
neqo vs. neqo-pr: A
openssl vs. neqo-pr: LR M A CM
quic-go vs. neqo-pr: CM
quiche vs. neqo-pr: ⚠️L1 C1 CM
quinn vs. neqo-pr: 🚀L1 ⚠️C1 V2 CM
s2n-quic vs. neqo-pr: 🚀B BA CM
tquic vs. neqo-pr: CM
xquic vs. neqo-pr: M CM
All results

Succeeded Interop Tests

QUIC Interop Runner, client vs. server

neqo-pr as client

neqo-pr as server

Unsupported Interop Tests

QUIC Interop Runner, client vs. server

neqo-pr as client

neqo-pr as server

@github-actions
Copy link
Copy Markdown
Contributor

Client/server transfer results

Performance differences relative to f21a27d.

Transfer of 33554432 bytes over loopback, min. 100 runs. All unit-less numbers are in milliseconds.

Client vs. server (params) Mean ± σ Min Max MiB/s ± σ Δ baseline Δ baseline
google-neqo-cubic 264.8 ± 2.7 260.6 277.7 120.9 ± 11.9 💔 1.3 0.5%
neqo-quiche-cubic 190.6 ± 2.7 185.7 204.5 167.9 ± 11.9 💔 1.4 0.7%
s2n-neqo-cubic 219.0 ± 4.1 210.6 229.9 146.1 ± 7.8 💔 1.2 0.6%

Table above only shows statistically significant changes. See all results below.

All results

Transfer of 33554432 bytes over loopback, min. 100 runs. All unit-less numbers are in milliseconds.

Client vs. server (params) Mean ± σ Min Max MiB/s ± σ Δ baseline Δ baseline
google-google-nopacing 461.6 ± 2.1 457.4 471.4 69.3 ± 15.2
google-neqo-cubic 264.8 ± 2.7 260.6 277.7 120.9 ± 11.9 💔 1.3 0.5%
msquic-msquic-nopacing 126.6 ± 21.7 110.3 226.1 252.8 ± 1.5
msquic-neqo-cubic 146.4 ± 19.3 121.7 240.6 218.5 ± 1.7 -0.6 -0.4%
neqo-google-cubic 768.5 ± 2.2 763.0 774.4 41.6 ± 14.5 -0.4 -0.1%
neqo-msquic-cubic 145.5 ± 2.0 140.8 156.7 219.9 ± 16.0 0.1 0.0%
neqo-neqo-cubic 86.3 ± 2.4 82.3 91.9 370.9 ± 13.3 -0.5 -0.6%
neqo-neqo-cubic-nopacing 86.4 ± 2.8 80.1 92.6 370.6 ± 11.4 -0.1 -0.1%
neqo-neqo-newreno 86.5 ± 2.3 82.6 91.9 370.0 ± 13.9 -0.4 -0.5%
neqo-neqo-newreno-nopacing 86.4 ± 2.6 81.6 91.7 370.5 ± 12.3 0.0 0.0%
neqo-quiche-cubic 190.6 ± 2.7 185.7 204.5 167.9 ± 11.9 💔 1.4 0.7%
neqo-s2n-cubic 214.9 ± 1.7 210.1 227.0 148.9 ± 18.8 -0.1 -0.0%
quiche-neqo-cubic 176.5 ± 3.7 166.6 185.8 181.3 ± 8.6 -0.7 -0.4%
quiche-quiche-nopacing 139.2 ± 3.1 133.3 148.3 229.8 ± 10.3
s2n-neqo-cubic 219.0 ± 4.1 210.6 229.9 146.1 ± 7.8 💔 1.2 0.6%
s2n-s2n-nopacing 293.4 ± 26.9 278.6 391.1 109.1 ± 1.2

Download data for profiler.firefox.com or download performance comparison data.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants