Fix socket race condition in port tracking #40187
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR removes the background thread used to resolve port-0 (ephemeral) binds and instead resolves them inline in the main GnsPortTracker::Run() loop to avoid duplicating sockets and triggering race conditions.
Changes:
- Removed deferred port-0 resolution thread/queues and related synchronization primitives.
- Made `ResolvePortZeroBind` return `std::optional<PortAllocation>` and resolve/register port-0 binds synchronously.
- Simplified port-0 bind processing by directly calling `HandleRequest`/`TrackPort` after resolution.
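
As a rough illustration of the optional-returning shape described above, here is a minimal sketch of resolving a kernel-assigned port synchronously. It assumes the port can be read with `getsockname` on a socket that is already bound. The `PortAllocation` struct and the `ResolveBoundPort` name are hypothetical stand-ins, not the definitions in `GnsPortTracker.h`, and the real `ResolvePortZeroBind` may obtain the port differently.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cstdint>
#include <optional>

// Hypothetical stand-in for the tracker's allocation record; the real
// PortAllocation in GnsPortTracker.h may carry different fields.
struct PortAllocation
{
    std::uint16_t Port;
    int Family;
    int Protocol;
};

// Sketch only: return the port the kernel assigned to an already-bound
// socket, or std::nullopt if it cannot be determined.
std::optional<PortAllocation> ResolveBoundPort(int fd, int protocol)
{
    sockaddr_storage addr{};
    socklen_t len = sizeof(addr);
    if (getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len) != 0)
    {
        return std::nullopt; // invalid fd or other failure
    }

    std::uint16_t port = 0;
    if (addr.ss_family == AF_INET)
    {
        port = ntohs(reinterpret_cast<const sockaddr_in*>(&addr)->sin_port);
    }
    else if (addr.ss_family == AF_INET6)
    {
        port = ntohs(reinterpret_cast<const sockaddr_in6*>(&addr)->sin6_port);
    }
    else
    {
        return std::nullopt; // not an IP socket
    }

    if (port == 0)
    {
        return std::nullopt; // socket has no port assigned yet
    }

    return PortAllocation{port, addr.ss_family, protocol};
}
```

Returning `std::optional` lets the caller register the allocation immediately when a value is present and skip tracking otherwise, without a separate error channel.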
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/linux/init/GnsPortTracker.h | Removes deferred-resolution thread machinery; updates ResolvePortZeroBind signature to return an optional allocation. |
| src/linux/init/GnsPortTracker.cpp | Deletes deferred resolver thread/queues; resolves port-0 binds inline and registers allocations immediately. |

Ideally there would be a test that fails before this change and passes afterwards. Is that possible to add?
@chemwolf6922 - this change looks ok to me, but the PortZeroRebindSucceeds tests are failing. Can you investigate?
OneBlue left a comment
Change LGTM. Unfortunately I think a 250ms timeout will lead to inconsistencies if the machine is under pressure, but increasing the timeout too much will negatively affect performance.
Like you said, I think the right long-term fix is #40178. In the meantime, this will be good to fix the immediate double-bind issue.
I agree. I think this change is worth taking now, and we can hopefully spend some time switching over to eBPF soon.
Summary of the Pull Request
This PR resolves port-0 binds inline in port tracking, which avoids duplicating the socket and causing race conditions. The trade-off is even slower bind calls.
PR Checklist
Detailed Description of the Pull Request / Additional comments
#40178 would be a better but much more complex solution, which may not be worth it if the seccomp part is still required.
Validation Steps Performed
Existing tests
NetworkTests::MirroredTests::PortZeroBindIsTracked [Passed]
NetworkTests::VirtioProxyTests::PortZeroBindIsTracked [Passed]
New tests
NetworkTests::MirroredTests::PortZeroRebindSucceeds [Passed]
NetworkTests::VirtioProxyTests::PortZeroRebindSucceeds [Passed]
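
The new test names suggest a bind-to-port-0-then-rebind pattern. The sketch below shows that pattern on plain POSIX sockets: bind to port 0, read the assigned port, close the socket, then bind a new socket to the same port. This is an assumption based on the test name alone, not the actual test code; `BindLoopback` and `PortZeroRebind` are hypothetical helpers.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cstdint>

// Hypothetical helper: bind a new TCP socket to 127.0.0.1:port.
// Returns the socket fd, or -1 on failure.
int BindLoopback(std::uint16_t port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
    {
        return -1;
    }

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0)
    {
        close(fd);
        return -1;
    }
    return fd;
}

// Sketch of the assumed pattern: bind to port 0, learn the assigned port,
// release it, then bind to that exact port again.
bool PortZeroRebind()
{
    int first = BindLoopback(0);
    if (first < 0)
    {
        return false;
    }

    sockaddr_in bound{};
    socklen_t len = sizeof(bound);
    if (getsockname(first, reinterpret_cast<sockaddr*>(&bound), &len) != 0)
    {
        close(first);
        return false;
    }
    const std::uint16_t port = ntohs(bound.sin_port);
    close(first);

    // If the tracker still held a duplicate of the first socket here,
    // this second bind could fail with EADDRINUSE.
    int second = BindLoopback(port);
    if (second < 0)
    {
        return false;
    }
    close(second);
    return true;
}
```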
The problem described in #40109 is reproducible. I have validated that this PR fixes the issue.
I was able to trace the bind calls from the VS Code server issue, and the pattern matches the assumption when it fails without the fix: