
refactor(openclaw): delegate gateway to OS service manager with Windows compatibility#13165

Draft
DeJeune wants to merge 31 commits into main from
DeJeune/fix-openclaw-close

Conversation

@DeJeune
Collaborator

@DeJeune DeJeune commented Mar 3, 2026

What this PR does

Refactors OpenClawService to delegate gateway lifecycle to the OS service manager (launchd/systemd) via OpenClaw CLI commands, with platform-specific handling for Windows.

Core changes:

  • Gateway lifecycle delegated to OS service manager via openclaw gateway install/start/stop/restart/status/health CLI commands
  • Removes in-process ChildProcess management, node:net Socket probing, and killProcess helper
  • Health checks use openclaw gateway health CLI command
  • getStatus is now async and probes health to detect externally-started gateways
  • Return type unified to discriminated union OperationResult = { success: true } | { success: false; message: string }
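A minimal sketch of how the discriminated union narrows at a call site. The type mirrors the one described above. `describeResult` is a hypothetical caller, not code from the PR.

```typescript
// OperationResult as described in the PR (the real definition lives in @shared/config/types).
type OperationResult = { success: true } | { success: false; message: string }

// Hypothetical caller: checking `success` narrows the union before `message` is read.
function describeResult(result: OperationResult): string {
  if (!result.success) {
    // Here TypeScript knows result is { success: false; message: string }.
    return `failed: ${result.message}`
  }
  // Here result is { success: true }; reading result.message would not compile.
  return 'ok'
}

console.log(describeResult({ success: true })) // ok
console.log(describeResult({ success: false, message: 'gateway not installed' })) // failed: gateway not installed
```

Because the success branch carries no `message`, callers cannot accidentally display an empty or stale success string.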

Windows-specific handling:

  • Skip gateway install/uninstall (scheduled task integration has upstream bugs)
  • Gateway started via openclaw gateway start --force (same as macOS/Linux, but without service registration)
  • Port conflict detection before startup with clear error messages
  • NPM_CONFIG_SCRIPT_SHELL=cmd.exe for npm install compatibility

Install improvements:

  • Auto-install Node.js 22+ if missing or older than v22 (brew/pacman/apt/dnf/winget/choco/scoop)
  • Auto-install Git if missing (best-effort, non-blocking)
  • Force HTTPS for git during npm install (GIT_CONFIG env vars) to avoid SSH key failures
  • China npm mirror acceleration for users in China
  • EACCES retry with sudo-prompt on macOS/Linux
  • Streaming output to UI throughout all steps
  • GBK decoding for Chinese Windows cmd.exe output
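The Node.js prerequisite check from the first bullet could look roughly like the sketch below. It assumes the version is read from `node --version`. `readNodeVersion`, `needsNodeInstall`, and `MIN_NODE_MAJOR` are illustrative names, and the package-manager dispatch (brew, apt, winget, and so on) is left out.

```typescript
import { execFileSync } from 'node:child_process'

// Minimum major version named in the PR.
const MIN_NODE_MAJOR = 22

// Hypothetical helper: returns the raw `node --version` output,
// or null when node is missing from PATH or the command fails.
function readNodeVersion(): string | null {
  try {
    return execFileSync('node', ['--version'], { encoding: 'utf8' })
  } catch {
    return null
  }
}

// Decide whether the installer should (re)install Node.js.
function needsNodeInstall(versionOutput: string | null): boolean {
  if (versionOutput === null) return true // not installed
  const match = /^v?(\d+)\./.exec(versionOutput.trim())
  if (!match) return true // unexpected output: treat as unusable
  return Number(match[1]) < MIN_NODE_MAJOR
}
```

A caller would run `needsNodeInstall(readNodeVersion())` and only fall through to the package-manager chain when it returns true.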

UI improvements:

  • Copy button on install/uninstall log container
  • Copy button on error alert messages
  • Selectable error text
  • Fix Start button stuck in loading state on failure

Why we need it and why it was done in this way

  • Delegating to OS service management (launchd/systemd) is more robust — the gateway survives app restarts and can be managed externally
  • Using CLI commands for health checks aligns with OpenClaw's own tooling and avoids false positives
  • Windows skips service registration because the scheduled task integration has too many upstream bugs, but still uses CLI commands for start/stop/status/health
  • Direct npm install (instead of official install scripts) gives full control and avoids platform-specific script issues (cmd.exe quote stripping, TLS, PowerShell execution)

Breaking changes

None. Internal refactoring only — no user-facing API changes.

Special notes for your reviewer

  • Windows: no service registration (gateway install/uninstall skipped via !isWin guards), gateway started with gateway start --force, port checked before startup
  • macOS/Linux: full service lifecycle (gateway install after npm install, ensureGatewayServiceInstalled on startup, uninstallGatewayService before npm uninstall)
  • Auth token synced to both openclaw.cherry.json and default openclaw.json so system services can read it
  • OperationResult is a proper discriminated union in @shared/config/types

Checklist

Release note

NONE

…e dead fields

- Extract repeated `{ success: boolean; message: string }` into
  `OperationResult = { success: true } | { success: false; message: string }`
- Success path no longer carries unused message strings
- Unify restartGateway return type (was `message?: string`, now consistent)
- Remove never-populated `uptime` and `version` from HealthInfo
- Update preload and renderer store types to match

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@DeJeune DeJeune requested a review from 0xfullex as a code owner March 3, 2026 06:03
@DeJeune DeJeune marked this pull request as draft March 3, 2026 06:06
Collaborator Author

@DeJeune DeJeune left a comment


Review Summary

The OperationResult discriminated union and dead field cleanup are clean improvements. The type change is well-applied across main, preload, and renderer store. Good work.

Issues Found

Bug (1):

  • checkHealth() has an early-return guard (gatewayStatus !== 'running') that turns the external-gateway detection in getStatus() and the health polling in startAndWaitForGateway() into dead code. Needs a private probe method without the guard.

Significant (1):

  • restartGateway ignores the command result and always returns success.

Minor (2):

  • OperationResult is redeclared in preload as OpenClawOperationResult — consider a shared type.
  • gatewayUrl visibility/path changes are unrelated to the refactor scope.

Positives

  • Discriminated union is the right pattern here — TypeScript narrows cleanly
  • All renderer callers already used the correct !result.success check pattern, so no caller changes needed
  • Dead uptime/version fields properly removed across all three layers

Comment on lines +687 to +690
}
const shellEnv = await getShellEnv()
await this.execOpenClawCommandWithResult(openclawPath, ['gateway', 'restart'], shellEnv)
return { success: true }
Collaborator Author


Significant: restartGateway fires gateway restart but ignores the result — if the command fails (non-zero exit or timeout), it still returns { success: true }. Should check the result:

const { code, stderr } = await this.execOpenClawCommandWithResult(openclawPath, ['gateway', 'restart'], shellEnv)
if (code !== 0) {
  // Surface the failure instead of reporting a successful restart
  this.gatewayStatus = 'error'
  return { success: false, message: stderr.trim() || `Restart failed with code ${code}` }
}
return { success: true }

DeJeune and others added 2 commits March 3, 2026 14:09
…, shared type

- Extract probeGatewayHealth() without status guard so getStatus() can
  detect externally-started gateways and startAndWaitForGateway() can
  poll during 'starting' state
- Check restartGateway command exit code instead of always returning success
- Move OperationResult to @shared/config/types to avoid duplicate definitions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
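A sketch of the guard split described in this commit, under stated assumptions: `HealthCommand` stands in for running `openclaw gateway health`, and `GatewayMonitor` is a reduced, hypothetical model of OpenClawService rather than its actual code.

```typescript
type GatewayStatus = 'stopped' | 'starting' | 'running' | 'error'

// Hypothetical stand-in for `openclaw gateway health`; resolves true on exit code 0.
type HealthCommand = () => Promise<boolean>

class GatewayMonitor {
  gatewayStatus: GatewayStatus = 'stopped'
  private readonly runHealthCommand: HealthCommand

  constructor(runHealthCommand: HealthCommand) {
    this.runHealthCommand = runHealthCommand
  }

  // Guarded check for callers that only care once the app believes the gateway is up.
  async checkHealth(): Promise<boolean> {
    if (this.gatewayStatus !== 'running') return false
    return this.probeGatewayHealth()
  }

  // Unguarded probe: usable while 'starting' and for detecting an externally started gateway.
  private async probeGatewayHealth(): Promise<boolean> {
    try {
      return await this.runHealthCommand()
    } catch {
      return false // a failed or timed-out CLI call counts as unhealthy
    }
  }

  // Async status that upgrades to 'running' when a gateway is already healthy.
  async getStatus(): Promise<GatewayStatus> {
    if (this.gatewayStatus !== 'running' && (await this.probeGatewayHealth())) {
      this.gatewayStatus = 'running'
    }
    return this.gatewayStatus
  }
}
```

Keeping the guard on the public `checkHealth()` preserves its old contract, while the internal callers that need to probe before the status flips use the unguarded method.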
@DeJeune DeJeune marked this pull request as ready for review March 3, 2026 06:23
DeJeune and others added 14 commits March 3, 2026 14:30
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…install

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use the official OpenClaw install scripts (install.sh / install.ps1) instead
of manual npm install. The scripts handle Node.js detection, build tools,
PATH setup, and diagnostics automatically.

- macOS/Linux: curl piped to bash, with sudo-prompt fallback on permission errors
- Windows: PowerShell inline execution
- Remove China mirror logic (official package includes Chinese support)
- Remove SHARP_IGNORE_GLOBAL_LIBVIPS workaround
- Refresh shell env in installGatewayService for PATH changes
- Drop @qingchencloud/openclaw-zh from uninstall args

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…F-8 encoding

- Add copy button to install/uninstall log container header
- Make error alerts selectable with select-text class
- Fix Windows GBK mojibake by wrapping PowerShell via cmd.exe with chcp 65001

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use buffer mode for execFileSync instead of encoding: 'utf8' to prevent
garbled Chinese Windows stderr. Log only exit code on failure instead of
the raw error object containing GBK-encoded text.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Include accumulated stderr in logger.error calls when install or uninstall
processes exit with non-zero code, enabling better debugging of failures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove setShowLogs(false) from checkInstallation to prevent logs from
  disappearing immediately after install completes
- Remove dead lastHealthCheck.version display (field removed from HealthInfo)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…e quote stripping

Replace cmd.exe /c chcp wrapper with PowerShell-native [Console]::OutputEncoding
to set UTF-8 output. cmd.exe was consuming the quotes needed by PowerShell,
causing the install script to be echoed instead of executed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ownload

Windows PowerShell 5.1 may not enable TLS 1.2 by default, causing
Invoke-WebRequest to fail on HTTPS URLs. Prepend SecurityProtocol setting.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…install

Replace the official install.sh/install.ps1 approach with direct
npm install -g openclaw, borrowing key patterns from the scripts:

- Auto-install Node.js if missing or version < 22:
  - macOS: brew install node@22 + brew link
  - Linux: pacman (Arch) or NodeSource setup_22.x + apt-get/dnf
  - Windows: winget → choco → scoop fallback chain
- Auto-install Git if missing (best-effort, non-blocking)
- npm env vars from official scripts: SHARP_IGNORE_GLOBAL_LIBVIPS=1,
  NPM_CONFIG_SCRIPT_SHELL=cmd.exe (Windows), noise suppression
- EACCES retry with sudo-prompt on macOS/Linux
- Streaming output to UI throughout all steps

This gives us full control over the install flow and avoids the
platform-specific issues with the official scripts (cmd.exe quote
stripping, TLS, PowerShell execution, mise/fnm detection).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
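The npm environment handling can be sketched as a pure function. Only the two variables this commit names are set. The noise-suppression variables are not listed, so they are omitted, and `buildNpmInstallEnv` is a hypothetical name.

```typescript
type Env = Record<string, string | undefined>

// Hypothetical helper for the `npm install -g` subprocess environment.
function buildNpmInstallEnv(baseEnv: Env, platform: string): Env {
  // Make sharp use its bundled libvips instead of a system-wide copy.
  const env: Env = { ...baseEnv, SHARP_IGNORE_GLOBAL_LIBVIPS: '1' }
  if (platform === 'win32') {
    // Ask npm to run package scripts with cmd.exe on Windows.
    env.NPM_CONFIG_SCRIPT_SHELL = 'cmd.exe'
  }
  return env
}
```

It would be passed as the `env` option when spawning npm, for example `{ env: buildNpmInstallEnv(process.env, process.platform) }`, so the parent process environment stays untouched.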
Backend install() now auto-installs Node.js and Git, so the frontend
no longer needs to block installation with warning UI. Install button
now proceeds directly to backend which handles everything.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- crossPlatformSpawn: auto-quote command path containing spaces when
  shell: true on Windows, preventing cmd.exe from splitting the path
- Add decodeBufferFromShell(): detect UTF-8 replacement chars and fall
  back to GBK decoding via iconv-lite for Chinese Windows cmd.exe output
- Apply GBK decoding to install/uninstall streaming output
- Remove redundant manual quoting in uninstall() (now handled centrally)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
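Both helpers can be sketched in a few lines. `decodeBufferFromShell` takes its name from the commit, while `quoteCommandForShell` is hypothetical. The GBK step swaps iconv-lite for Node's WHATWG `TextDecoder('gbk')`, which assumes a Node build with full ICU (the default for official binaries).

```typescript
import { TextDecoder } from 'node:util'

// Hypothetical version of the crossPlatformSpawn quoting rule: with shell: true
// on Windows, cmd.exe splits an unquoted command path at spaces.
function quoteCommandForShell(command: string, platform: string, shell: boolean): string {
  const alreadyQuoted = command.startsWith('"') && command.endsWith('"')
  if (platform === 'win32' && shell && command.includes(' ') && !alreadyQuoted) {
    return `"${command}"`
  }
  return command
}

// Non-fatal UTF-8 decoding turns invalid byte sequences into U+FFFD.
const utf8Decoder = new TextDecoder('utf-8')
// Assumption: TextDecoder('gbk') stands in for iconv-lite's GBK decoder.
const gbkDecoder = new TextDecoder('gbk')

// Try UTF-8 first; fall back to GBK when replacement characters show up.
function decodeBufferFromShell(buf: Uint8Array): string {
  const text = utf8Decoder.decode(buf)
  return text.includes('\uFFFD') ? gbkDecoder.decode(buf) : text
}
```

The U+FFFD test is a heuristic: decoding a streamed chunk that ends mid-character also yields a replacement character, so buffering incomplete sequences before decoding would make the fallback more reliable.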
Add back npmmirror registry for China users (detected via isUserInChina).
Use openclaw@latest for all users now that the official package supports
Chinese — the separate @qingchencloud/openclaw-zh package is no longer needed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
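A sketch of the resulting install arguments. The registry URL is npmmirror's public endpoint and is an assumption about what the PR uses. `inChina` stands in for the result of `isUserInChina()`, and `buildOpenClawInstallArgs` is a hypothetical name.

```typescript
// Assumption: npmmirror's public registry; the commit only says "npmmirror registry".
const NPMMIRROR_REGISTRY = 'https://registry.npmmirror.com'

// Every user installs openclaw@latest; only the registry differs.
function buildOpenClawInstallArgs(inChina: boolean): string[] {
  const args = ['install', '-g', 'openclaw@latest']
  if (inChina) args.push(`--registry=${NPMMIRROR_REGISTRY}`)
  return args
}
```

Passing the registry as a per-command flag keeps the mirror scoped to this install instead of rewriting the user's global npm config.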
@DeJeune DeJeune marked this pull request as draft March 4, 2026 05:46
DeJeune and others added 4 commits March 4, 2026 13:59
handleStartGateway had early returns for syncConfig and startGateway
failures that didn't call setIsStarting(false), leaving the button
permanently stuck in loading state.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
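Wrapping the handler body in `try/finally` is one way to guarantee every early return resets the loading flag. This is a hypothetical, framework-free shape of `handleStartGateway`: `setIsStarting`, `syncConfig`, `startGateway`, and `showError` are injected so the sketch runs outside React.

```typescript
type OperationResult = { success: true } | { success: false; message: string }

// Hypothetical dependencies; in the component these are React state setters and IPC calls.
interface StartDeps {
  setIsStarting: (value: boolean) => void
  syncConfig: () => Promise<OperationResult>
  startGateway: () => Promise<OperationResult>
  showError: (message: string) => void
}

async function handleStartGateway(deps: StartDeps): Promise<void> {
  deps.setIsStarting(true)
  try {
    const synced = await deps.syncConfig()
    if (!synced.success) {
      deps.showError(synced.message)
      return // the finally block still runs
    }
    const started = await deps.startGateway()
    if (!started.success) {
      deps.showError(started.message)
    }
  } finally {
    // Runs on success, early return, and thrown errors alike.
    deps.setIsStarting(false)
  }
}
```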
The polling loop in startAndWaitForGateway was discarding stderr from
checkGatewayStatus and probeGatewayHealth, so timeout errors only showed
"Gateway failed to start within 30000ms" instead of the real reason
(e.g., "gateway closed (1006 abnormal closure)"). Now captures the last
error from polling attempts and appends it to the timeout message.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
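A polling loop that keeps the last error could look like this sketch. `ProbeResult` and `waitForGateway` are hypothetical; in the PR the probe would wrap `checkGatewayStatus` and `probeGatewayHealth`.

```typescript
// Hypothetical probe outcome: either ready, or the stderr/error text from the attempt.
type ProbeResult = { ok: true } | { ok: false; error: string }

async function waitForGateway(
  probe: () => Promise<ProbeResult>,
  timeoutMs: number,
  intervalMs: number,
): Promise<{ success: true } | { success: false; message: string }> {
  const deadline = Date.now() + timeoutMs
  let lastError = ''
  while (Date.now() < deadline) {
    const result = await probe()
    if (result.ok) return { success: true }
    // Remember the most recent non-empty error instead of discarding it.
    lastError = result.error.trim() || lastError
    await new Promise((resolve) => setTimeout(resolve, intervalMs))
  }
  const base = `Gateway failed to start within ${timeoutMs}ms`
  return { success: false, message: lastError ? `${base}: ${lastError}` : base }
}
```

With a probe that keeps failing with "gateway closed (1006 abnormal closure)", the timeout message now ends with that reason instead of stopping at the generic text.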
…ervice

The system service/daemon registered by `openclaw gateway install` does
not inherit the OPENCLAW_CONFIG_PATH env var, so it reads the default
~/.openclaw/openclaw.json. Without the auth token there, the gateway
starts without auth and health checks fail with 1006 abnormal closure.

Now syncProviderConfig() also writes gateway settings (mode, port, auth)
to the default openclaw.json, ensuring the system service starts with
the correct auth token.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
DeJeune and others added 2 commits March 4, 2026 21:45
…aming

The transform layer was sending the full accumulated `inputBuffer` as the
`delta` field in `tool-input-delta` chunks. The renderer then concatenated
this already-accumulated string onto `streamingArgs`, causing quadratic
memory growth (~N²/2). For a 100KB file write this ballooned to ~5GB,
crashing the V8 heap.

Fix: send only the incremental `partial_json` fragment (matching how
`thinking_delta` already works) and remove the now-unused `inputBuffer`
field and `appendToolInputDelta` method from `ClaudeStreamState`.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
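A reduced sketch of both sides after the fix. The `tool-input-delta` chunk type and `delta` field follow the commit message; the interface shape and function names are assumptions.

```typescript
// Assumed chunk shape; only `type` and `delta` are named in the commit.
interface ToolInputDeltaChunk {
  type: 'tool-input-delta'
  id: string
  delta: string
}

// Transform side: forward only the new partial_json fragment. Sending the
// accumulated buffer here made the renderer append ever-growing prefixes.
function toToolInputDelta(id: string, partialJson: string): ToolInputDeltaChunk {
  return { type: 'tool-input-delta', id, delta: partialJson }
}

// Renderer side: each delta is appended exactly once, so memory grows linearly.
function applyDeltas(chunks: ToolInputDeltaChunk[]): string {
  let streamingArgs = ''
  for (const chunk of chunks) streamingArgs += chunk.delta
  return streamingArgs
}
```

This matches the `thinking_delta` convention the commit mentions: producers emit increments and the single consumer owns the accumulation.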
DeJeune and others added 5 commits March 5, 2026 11:41
Windows scheduled task integration has too many upstream bugs.
Skip installGatewayService, uninstallGatewayService, and
ensureGatewayServiceInstalled on Windows while keeping gateway
start/stop/health/status CLI commands working normally.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
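The Windows guard amounts to an early return. In this hypothetical sketch `runCli` stands in for the PR's CLI runner and `platform` for `process.platform`. Only `gateway install` is shown; the uninstall and ensure paths would follow the same pattern.

```typescript
type OperationResult = { success: true } | { success: false; message: string }

// Hypothetical CLI runner signature: resolves with the exit code and captured stderr.
type RunCli = (args: string[]) => Promise<{ code: number; stderr: string }>

async function installGatewayService(platform: string, runCli: RunCli): Promise<OperationResult> {
  // No scheduled-task registration on Windows; start/stop/status/health still use the CLI.
  if (platform === 'win32') return { success: true }
  const { code, stderr } = await runCli(['gateway', 'install'])
  return code === 0
    ? { success: true }
    : { success: false, message: stderr.trim() || `gateway install exited with code ${code}` }
}
```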
Users with git configured to use SSH (git@github.com:...) may fail
during npm install if SSH keys aren't set up. Use GIT_CONFIG env vars
to rewrite GitHub SSH URLs to HTTPS within the npm install subprocess.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
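Git 2.31 and later read extra configuration from `GIT_CONFIG_COUNT`, `GIT_CONFIG_KEY_<n>`, and `GIT_CONFIG_VALUE_<n>`, scoped to the process that receives them. The commit does not list the exact rewrite rules, so the two `insteadOf` entries below are an assumption covering both SSH URL forms, and `withHttpsGitRewrite` is a hypothetical helper.

```typescript
type Env = Record<string, string | undefined>

// Build an env that rewrites GitHub SSH URLs to HTTPS for this subprocess only.
function withHttpsGitRewrite(baseEnv: Env): Env {
  // insteadOf is multi-valued, so the same key may appear more than once.
  const rules: Array<[string, string]> = [
    ['url.https://github.com/.insteadOf', 'ssh://git@github.com/'],
    ['url.https://github.com/.insteadOf', 'git@github.com:'],
  ]
  const env: Env = { ...baseEnv, GIT_CONFIG_COUNT: String(rules.length) }
  rules.forEach(([key, value], i) => {
    env[`GIT_CONFIG_KEY_${i}`] = key
    env[`GIT_CONFIG_VALUE_${i}`] = value
  })
  return env
}
```

This sketch overwrites any `GIT_CONFIG_COUNT` the user already exports; a more careful version would append after the existing entries.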
On Windows without a registered service, `openclaw gateway start` runs
in foreground mode and never exits, causing the 20s timeout to kill it.
Instead, spawn `openclaw gateway --port <port>` as a detached background
process with windowsHide, then poll for readiness as usual.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Standardizes gateway startup across platforms by using `gateway start --force`
instead of platform-specific detached spawning on Windows. Adds proactive
port availability checking before startup to provide clear error messages
when the configured port is already in use by another application.
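The commit does not say how the port is probed. One approach, assumed here, is a short bind test with `net.createServer()`, where `EADDRINUSE` means another process holds the port. `isPortFree`, `ensurePortAvailable`, and the 127.0.0.1 listen address are illustrative assumptions.

```typescript
import { createServer } from 'node:net'

type OperationResult = { success: true } | { success: false; message: string }

// Try to bind the port briefly; release it immediately if the bind succeeds.
function isPortFree(port: number, host = '127.0.0.1'): Promise<boolean> {
  return new Promise<boolean>((resolve, reject) => {
    const server = createServer()
    server.once('error', (err: NodeJS.ErrnoException) => {
      if (err.code === 'EADDRINUSE') resolve(false)
      else reject(err) // e.g. EACCES: not a conflict, surface it
    })
    server.once('listening', () => {
      server.close(() => resolve(true))
    })
    server.listen(port, host)
  })
}

// Turn the check into a user-facing result before starting the gateway.
async function ensurePortAvailable(port: number): Promise<OperationResult> {
  if (await isPortFree(port)) return { success: true }
  return {
    success: false,
    message: `Port ${port} is already in use by another application. Stop that application or change the gateway port.`,
  }
}
```

A bind test only checks the address it binds, so a process listening on a different interface may not be detected on every platform.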
@DeJeune DeJeune changed the title refactor(openclaw): use discriminated union OperationResult and remove dead fields refactor(openclaw): delegate gateway to OS service manager with Windows compatibility Mar 8, 2026
DeJeune and others added 3 commits March 8, 2026 12:01
…D heuristic

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The gateway service install already embeds OPENCLAW_CONFIG_PATH into
the launchd/systemd service definition, so the daemon reads the
correct cherry config file directly. No need to sync back to the
default openclaw.json.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>