Skip to content

Fix wait_port_available error handling on missing process info#18861

Open
messi-lhr wants to merge 3 commits intosgl-project:mainfrom
messi-lhr:fix/wait-port-available
Open

Fix wait_port_available error handling on missing process info#18861
messi-lhr wants to merge 3 commits intosgl-project:mainfrom
messi-lhr:fix/wait-port-available

Conversation

@messi-lhr
Copy link

@messi-lhr messi-lhr commented Feb 15, 2026

Motivation

Fix a robustness issue in wait_port_available where error reporting could itself raise exceptions:

  • find_process_using_port may return None, but code still accessed process.pid
  • timeout path could reference an uninitialized error_message

This makes startup port-conflict diagnostics more reliable.

Modifications

  • Updated python/sglang/srt/utils/common.py:
    • initialize error_message safely
    • handle the process is None branch without dereferencing process
    • keep timeout exception message clean when no process metadata is available
  • Added test/srt/test_wait_port_available.py:
    • timeout with missing process details
    • timeout path before process probing branch
    • raise_exception=False returns False

Accuracy Tests

Not applicable (no model/kernel logic changes).

Functional Tests

Added unit tests in:

  • test/srt/test_wait_port_available.py

Avoid crashing when port ownership lookup returns no process details, and add unit tests to cover timeout behavior with and without process metadata.
@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @messi-lhr, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses critical robustness and diagnostic issues within the wait_port_available utility function. By refining error handling for cases where process information is unavailable and ensuring proper initialization of error messages, the changes significantly improve the reliability of startup port-conflict diagnostics.

Highlights

  • Error Handling Robustness: Improved the wait_port_available function to prevent exceptions when find_process_using_port returns None and to ensure the error_message variable is always initialized.
  • Diagnostic Clarity: Enhanced the error messages for port conflicts, providing clearer information when process details are unavailable or when a timeout occurs before process probing.
  • Test Coverage: Added new unit tests to cover various scenarios for wait_port_available, including timeouts with missing process details, clean error messages, and the raise_exception=False path.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog
  • python/sglang/srt/utils/common.py
    • Initialized error_message at the beginning of the function.
    • Modified the logic to safely handle cases where find_process_using_port returns None, preventing AttributeError when accessing process.pid.
    • Adjusted the final ValueError message to conditionally include error_message only if it has content, ensuring cleaner output for simple timeouts.
  • test/srt/test_wait_port_available.py
    • Added a new test file.
    • Included test_timeout_with_missing_process_details to verify correct error handling when process details are not found.
    • Added test_timeout_without_process_probe_has_clean_error_message to ensure the error message is clean when no process probing occurs.
    • Implemented test_raise_exception_false to confirm the function returns False as expected when raise_exception is set to False.
Activity
  • The pull request was created by messi-lhr to fix robustness issues in wait_port_available error handling.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request effectively addresses two critical error handling issues in the wait_port_available function, preventing potential AttributeError and UnboundLocalError exceptions. The added unit tests are comprehensive and validate the fixes for various scenarios. While reviewing, I noticed a pre-existing logic bug in the timeout mechanism where the actual wait duration is only 10% of the specified timeout_s. I've included a suggestion to correct this behavior to match the function's contract.

logger.info(
f"port {port} is in use. Waiting for {i} seconds for {port_name} to be available. {error_message}"
)
time.sleep(0.1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The total wait time in this loop is timeout_s * 0.1 seconds because of the time.sleep(0.1). However, the parameter timeout_s and the error message suggest the timeout is in seconds. This discrepancy can lead to premature timeouts and is misleading. To ensure the function waits for the intended duration, the sleep interval should be 1 second.

Suggested change
time.sleep(0.1)
time.sleep(1)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the catch. Updated wait_port_available to sleep 1 second per loop so the effective wait duration matches timeout_s.

@messi-lhr
Copy link
Author

messi-lhr commented Feb 15, 2026

Thanks for the review and suggestions.

Could a maintainer please add the run-ci label to trigger CI for this PR? I do not have permission to add labels in the upstream repo.

Use a one-second sleep interval in wait_port_available so the effective wait duration matches timeout_s and its error message.
@messi-lhr messi-lhr force-pushed the fix/wait-port-available branch from 7c32c10 to 3617479 Compare February 16, 2026 00:44
@messi-lhr
Copy link
Author

/tag-and-rerun-ci

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant