Fix wait_port_available error handling on missing process info#18861
Fix wait_port_available error handling on missing process info#18861messi-lhr wants to merge 3 commits intosgl-project:mainfrom
Conversation
Avoid crashing when port ownership lookup returns no process details, and add unit tests to cover timeout behavior with and without process metadata.
Summary of ChangesHello @messi-lhr, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses critical robustness and diagnostic issues within the Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request effectively addresses two critical error handling issues in the wait_port_available function, preventing potential AttributeError and UnboundLocalError exceptions. The added unit tests are comprehensive and validate the fixes for various scenarios. While reviewing, I noticed a pre-existing logic bug in the timeout mechanism where the actual wait duration is only 10% of the specified timeout_s. I've included a suggestion to correct this behavior to match the function's contract.
python/sglang/srt/utils/common.py
Outdated
| logger.info( | ||
| f"port {port} is in use. Waiting for {i} seconds for {port_name} to be available. {error_message}" | ||
| ) | ||
| time.sleep(0.1) |
There was a problem hiding this comment.
The total wait time in this loop is timeout_s * 0.1 seconds because of the time.sleep(0.1). However, the parameter timeout_s and the error message suggest the timeout is in seconds. This discrepancy can lead to premature timeouts and is misleading. To ensure the function waits for the intended duration, the sleep interval should be 1 second.
| time.sleep(0.1) | |
| time.sleep(1) |
There was a problem hiding this comment.
Thanks for the catch. Updated wait_port_available to sleep 1 second per loop so the effective wait duration matches timeout_s.
|
Thanks for the review and suggestions. Could a maintainer please add the run-ci label to trigger CI for this PR? I do not have permission to add labels in the upstream repo. |
Use a one-second sleep interval in wait_port_available so the effective wait duration matches timeout_s and its error message.
7c32c10 to
3617479
Compare
|
/tag-and-rerun-ci |
Motivation
Fix a robustness issue in
wait_port_availablewhere error reporting could itself raise exceptions:find_process_using_portmay returnNone, but code still accessedprocess.piderror_messageThis makes startup port-conflict diagnostics more reliable.
Modifications
python/sglang/srt/utils/common.py:error_messagesafelyprocess is Nonebranch without dereferencingprocesstest/srt/test_wait_port_available.py:raise_exception=FalsereturnsFalseAccuracy Tests
Not applicable (no model/kernel logic changes).
Functional Tests
Added unit tests in:
test/srt/test_wait_port_available.py