
fix(update): detect profile dashboard processes in stale-dashboard sweep #56723

Open
liuhao1024 wants to merge 1 commit into NousResearch:main from liuhao1024:liuhao/cron-bugfix-56717-stale-profile-runtime


liuhao1024 commented Jul 2, 2026

Contributor

What does this PR do?

Fixes the stale-dashboard sweep after hermes update so that dashboard processes launched with --profile <name> (non-default profiles) are correctly detected and killed.

Root cause: _find_stale_dashboard_pids() matches fixed substrings (e.g. "hermes dashboard") against each process command line. A non-default profile places --profile <name> between the binary and the subcommand, as in hermes --profile bruce dashboard --isolated, so the contiguous pattern never matches. The process is invisible to the sweep and keeps running stale Python code after the update, which leads to ImportError when the update adds new symbols (like is_output_cap_error).
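To make the miss concrete, here is a minimal sketch of contiguous substring matching. The PATTERNS tuple and the matches_naive helper are assumptions for illustration, not the actual contents of hermes_cli/main.py.

```python
# Assumed pattern list for illustration; the real one lives in hermes_cli/main.py.
PATTERNS = ("hermes dashboard",)


def matches_naive(cmdline: str) -> bool:
    """Return True if any pattern occurs as a contiguous substring."""
    return any(pattern in cmdline for pattern in PATTERNS)


default_cmd = "/usr/local/bin/hermes dashboard --isolated"
profile_cmd = "/usr/local/bin/hermes --profile bruce dashboard --isolated"

print(matches_naive(default_cmd))  # True: "hermes dashboard" is contiguous
print(matches_naive(profile_cmd))  # False: the profile flag splits the pattern
```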

Fix: Strip --profile <name> / -p <name> tokens from the command line (and collapse resulting double-spaces) before pattern matching. Applied to both the POSIX (ps) and Windows (wmic) code paths.
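A rough sketch of the normalisation step described above. The regex and the normalise_cmdline name are hypothetical and may differ from what the PR actually ships.

```python
import re

# Hypothetical regex: "--profile <name>" or "-p <name>" starting at a token boundary.
_PROFILE_FLAG_RE = re.compile(r"(?<!\S)(?:--profile|-p)\s+\S+")


def normalise_cmdline(cmdline: str) -> str:
    """Drop space-separated profile selectors, then collapse repeated whitespace."""
    stripped = _PROFILE_FLAG_RE.sub(" ", cmdline)
    return re.sub(r"\s{2,}", " ", stripped).strip()


print(normalise_cmdline("hermes --profile bruce dashboard --isolated"))
# hermes dashboard --isolated
print(normalise_cmdline("hermes -p bruce dashboard"))
# hermes dashboard
```

Two limits of this sketch: it would also strip a -p <value> that appears after the subcommand with a different meaning, and it leaves equals-form selectors such as --profile=bruce untouched.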

Related Issue

Fixes #56717

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)

Changes Made

  • hermes_cli/main.py: Normalise process command lines in _find_stale_dashboard_pids() by stripping --profile/-p flags before pattern matching (both POSIX and Windows code paths). +21/-2 lines.
  • tests/hermes_cli/test_stale_dashboard_profile_detection.py: 3 regression tests covering --profile, -p, and default (no profile) cases.
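For readers without the branch checked out, the three cases can be pictured with a stand-in scanner run against canned ps-style output. Everything here (find_stale_pids, FAKE_PS, the pattern, the regex) is a hypothetical illustration and not the code in the test file.

```python
import re

PATTERNS = ("hermes dashboard",)  # assumed pattern
_PROFILE_FLAG_RE = re.compile(r"(?<!\S)(?:--profile|-p)\s+\S+")  # assumed regex


def find_stale_pids(ps_output: str) -> list[int]:
    """Parse 'PID COMMAND' lines and return PIDs whose command matches a pattern."""
    pids = []
    for line in ps_output.splitlines():
        pid_text, _, cmdline = line.strip().partition(" ")
        if not pid_text.isdigit():
            continue  # skip the header row and blank lines
        # Strip profile selectors, then collapse whitespace before matching.
        normalised = " ".join(_PROFILE_FLAG_RE.sub(" ", cmdline).split())
        if any(pattern in normalised for pattern in PATTERNS):
            pids.append(int(pid_text))
    return pids


FAKE_PS = """\
  PID COMMAND
  101 hermes dashboard
  102 hermes --profile bruce dashboard --isolated
  103 hermes -p bruce dashboard
  104 hermes gateway run
"""

print(find_stale_pids(FAKE_PS))  # [101, 102, 103]
```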

How to Test

  1. Run python -m pytest tests/hermes_cli/test_stale_dashboard_profile_detection.py -v — all 3 tests should pass.
  2. Run python -m pytest tests/hermes_cli/test_update_stale_dashboard.py tests/hermes_cli/test_dashboard_lifecycle_flags.py -v — all 22+10 existing tests should pass (no regressions).
  3. Manual verification: start a profile dashboard (hermes --profile test dashboard), then run hermes update — the profile dashboard should be detected and killed (previously it was invisible).

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run pytest tests/ -q and all tests pass
  • I've added tests for my changes (required for bug fixes, strongly encouraged for features)
  • I've tested on my platform: macOS 26.4.1

Documentation & Housekeeping

  • I've updated relevant documentation (README, docs/, docstrings) — or N/A
  • I've updated cli-config.yaml.example if I added/changed config keys — or N/A
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A
  • I've considered cross-platform impact (Windows, macOS) per the compatibility guide
  • I've updated tool descriptions/schemas if I changed tool behavior — or N/A

The post-update stale-dashboard sweep in `_find_stale_dashboard_pids()`
used substring matching (e.g. `"hermes dashboard"`) against the full
process command line.  When a non-default profile launches a dashboard
with `--profile <name>` between the binary and the subcommand, the
contiguous pattern never matched - the process was invisible to the
sweep and continued running stale code after the update.

Normalise the command line by stripping `--profile <name>` / `-p <name>`
tokens (and collapsing resulting double-spaces) before pattern matching.
Applied to both the POSIX (`ps`) and Windows (`wmic`) code paths.

Fixes NousResearch#56717
alt-glitch added labels on Jul 2, 2026: type/bug (Something isn't working), comp/cli (CLI entry point, hermes_cli/, setup wizard), platform/windows (Native Windows-specific behavior or breakage), P2 (Medium: degraded but workaround exists)

xiawiie commented Jul 2, 2026


Hi @liuhao1024, thanks for putting together #56723. I checked it against #56717 and the root cause you identified looks right: a profile selector between hermes and dashboard makes the stale-dashboard sweep miss non-default profile dashboard processes.

While testing that path, I found two nearby edge cases in the same stale-runtime class:

  • Hermes also accepts equals-form profile selectors (--profile=bruce, -p=bruce), which the current regex normalization does not strip.
  • The systemd gateway discovery still used the narrower hermes-gateway* glob in update-related service-PID discovery, which can miss profile-shaped units such as hermes-secondbrain-gateway.service.
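For the first edge case, one way to cover both the space-separated and equals forms is to walk whitespace-separated tokens instead of using a regex. This is only a sketch under that assumption; strip_profile_selectors is a hypothetical name and not necessarily how #56745 implements it.

```python
def strip_profile_selectors(cmdline: str) -> str:
    """Remove '--profile NAME', '-p NAME', '--profile=NAME' and '-p=NAME' tokens."""
    kept = []
    skip_next = False
    for token in cmdline.split():
        if skip_next:
            skip_next = False  # this token is the profile name; drop it
            continue
        if token in ("--profile", "-p"):
            skip_next = True  # drop the flag and the value that follows
            continue
        if token.startswith(("--profile=", "-p=")):
            continue  # equals form carries the value in the same token
        kept.append(token)
    return " ".join(kept)


print(strip_profile_selectors("hermes --profile=bruce dashboard"))
# hermes dashboard
print(strip_profile_selectors("hermes -p=bruce dashboard --isolated"))
# hermes dashboard --isolated
```

Plain whitespace splitting does not handle quoted profile names that contain spaces, so a real implementation may need a more careful tokenizer.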

I opened #56745 as a draft follow-up built on top of your branch/commit rather than reimplementing the fix from scratch, so your original work is preserved. The extra changes are intentionally small: token-aware profile-selector stripping for dashboard/serve process scans on POSIX and Windows WMIC, a shared GATEWAY_SYSTEMD_UNIT_GLOB = "hermes*gateway*", and regression tests for the added cases.
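The difference between the two globs can be checked with Python's fnmatch, which is used here only as an approximation of systemd's unit-name globbing. The unit names other than hermes-secondbrain-gateway.service are made up for the comparison.

```python
from fnmatch import fnmatchcase

NARROW_GLOB = "hermes-gateway*"
GATEWAY_SYSTEMD_UNIT_GLOB = "hermes*gateway*"  # value quoted in the comment above

units = [
    "hermes-gateway.service",              # default-profile unit (assumed name)
    "hermes-secondbrain-gateway.service",  # profile-shaped unit from the comment
    "hermes-dashboard.service",            # unrelated unit (assumed name)
]

for unit in units:
    narrow = fnmatchcase(unit, NARROW_GLOB)
    wide = fnmatchcase(unit, GATEWAY_SYSTEMD_UNIT_GLOB)
    print(f"{unit}: narrow={narrow} wide={wide}")
# hermes-gateway.service: narrow=True wide=True
# hermes-secondbrain-gateway.service: narrow=False wide=True
# hermes-dashboard.service: narrow=False wide=False
```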

Validated locally with:

scripts/run_tests.sh tests/hermes_cli/test_stale_dashboard_profile_detection.py tests/hermes_cli/test_gateway.py tests/hermes_cli/test_update_stale_dashboard.py tests/hermes_cli/test_dashboard_lifecycle_flags.py tests/test_windows_subprocess_no_window_flags.py -q
# 106 tests passed, 0 failed

uv run --frozen --extra dev ruff check hermes_cli/main.py hermes_cli/gateway.py tests/hermes_cli/test_stale_dashboard_profile_detection.py tests/hermes_cli/test_gateway.py tests/test_windows_subprocess_no_window_flags.py
# All checks passed

Happy to adjust or close #56745 if maintainers would rather fold these follow-ups directly into this PR.

@liuhao1024

Contributor Author

Thanks for the thorough review and the follow-up, @xiawiie! Glad the root cause analysis held up.

The two additional edge cases you found — equals-form profile selectors and the narrower systemd glob — are good catches. I see #56745 extends the fix with token-aware profile-selector stripping and a shared GATEWAY_SYSTEMD_UNIT_GLOB. Building on top of the existing commit rather than reimplementing is the right approach.

Happy to rebase or adjust this PR if maintainers prefer folding the follow-ups directly here. If #56745 lands first, I can close this one.


Development

Successfully merging this pull request may close these issues.

[Bug]: non-default profile can keep stale runtime after update, causing ImportError
