fix(security): harden gateway auth defaults and restrict auto-pair#1217
fix(security): harden gateway auth defaults and restrict auto-pair#1217ericksoa merged 4 commits intoNVIDIA:mainfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughBuild-time input NEMOCLAW_DISABLE_DEVICE_AUTH now controls gateway auth flags instead of hardcoded insecure defaults. The auto-pair watcher validates pending entries, enforces client/mode allowlists, deduplicates request handling, and logs request/client identifiers. Tests verify Dockerfile config generation and auto-pair defensive behavior. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Device as Device / Client
participant AutoPair as Auto-pair Watcher (Python)
participant Gateway as Gateway API
Device->>AutoPair: submit pending approval (device metadata)
AutoPair->>AutoPair: validate type (dict), extract requestId/clientId/clientMode
AutoPair->>AutoPair: check ALLOWED_CLIENTS / ALLOWED_MODES and HANDLED set
alt allowed and not handled
AutoPair->>Gateway: POST approve requestId
AutoPair->>AutoPair: HANDLED.add(requestId)
AutoPair-->>Device: log approval (requestId, clientId)
else disallowed, invalid, or already handled
AutoPair->>Gateway: POST reject requestId (if applicable)
AutoPair-->>Device: log rejection (requestId, clientId/mode)
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
test/security-c2-dockerfile-injection.test.js (1)
340-366: Consider extracting ENV-block parsing into a shared helper.The ENV block parsing logic (lines 344-358) is nearly identical to the pattern used for
CHAT_UI_URL(lines 186-210) andNEMOCLAW_MODEL(lines 246-264). Extracting this into a helper function would reduce duplication and make future additions easier:function isEnvPromotedBeforePythonRun(dockerfile, envVarName) { const lines = dockerfile.split("\n"); let promoted = false; let inEnvBlock = false; for (const line of lines) { if (/^\s*FROM\b/.test(line)) { promoted = false; inEnvBlock = false; } if (/^\s*ENV\b/.test(line)) inEnvBlock = true; if (inEnvBlock && new RegExp(`${envVarName}[=\\s]`).test(line)) promoted = true; if (inEnvBlock && !/\\\s*$/.test(line)) inEnvBlock = false; if (/^\s*RUN\b.*python3\s+-c\b/.test(line)) return promoted; } return promoted; }This is optional — the current approach is clear and self-contained.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/security-c2-dockerfile-injection.test.js` around lines 340 - 366, Extract the repeated ENV-block parsing into a shared helper (e.g. isEnvPromotedBeforePythonRun) and replace the duplicated loops in the three tests (the block using promoted/inEnvBlock and the RUN regex check) with calls to that helper; the helper should accept the Dockerfile string and env var name, scan lines resetting state on /^\\s*FROM\\b/, toggling inEnvBlock on /^\\s*ENV\\b/, setting promoted when inEnvBlock and new RegExp(`${envVarName}[=\\s]`) matches, closing the ENV block when a line does not end with /\\\s*$/, and returning promoted when it encounters the RUN python3 -c pattern (/^\\s*RUN\\b.*python3\\s+-c\\b/) — then update the three tests (the current test plus the CHAT_UI_URL and NEMOCLAW_MODEL checks) to call isEnvPromotedBeforePythonRun(DOCKERFILE_CONTENT, "VAR_NAME") and assert the returned boolean.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@test/security-c2-dockerfile-injection.test.js`:
- Around line 340-366: Extract the repeated ENV-block parsing into a shared
helper (e.g. isEnvPromotedBeforePythonRun) and replace the duplicated loops in
the three tests (the block using promoted/inEnvBlock and the RUN regex check)
with calls to that helper; the helper should accept the Dockerfile string and
env var name, scan lines resetting state on /^\\s*FROM\\b/, toggling inEnvBlock
on /^\\s*ENV\\b/, setting promoted when inEnvBlock and new
RegExp(`${envVarName}[=\\s]`) matches, closing the ENV block when a line does
not end with /\\\s*$/, and returning promoted when it encounters the RUN python3
-c pattern (/^\\s*RUN\\b.*python3\\s+-c\\b/) — then update the three tests (the
current test plus the CHAT_UI_URL and NEMOCLAW_MODEL checks) to call
isEnvPromotedBeforePythonRun(DOCKERFILE_CONTENT, "VAR_NAME") and assert the
returned boolean.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 85e21d8f-7e88-4559-9b2c-6bc34a42aecd
📒 Files selected for processing (4)
Dockerfilescripts/nemoclaw-start.shtest/nemoclaw-start.test.jstest/security-c2-dockerfile-injection.test.js
Note: Overlap with #1149 and #690Two open PRs touch the same code paths:
Neither is a blocker for this PR, but flagging for merge coordination. |
ericksoa
left a comment
There was a problem hiding this comment.
Regression Risk Analysis
The security fix is correct — flipping to secure-by-default is the right call. But the default flip is a breaking change that needs coordination before merge.
High Risk: Default onboarding flow breaks
dangerouslyDisableDeviceAuth goes from always True to False by default (NEMOCLAW_DISABLE_DEVICE_AUTH=0). Every image built without explicitly passing --build-arg NEMOCLAW_DISABLE_DEVICE_AUTH=1 will now require device pairing before the dashboard is accessible.
The current nemoclaw onboard flow assumes the dashboard is immediately accessible via SSH port-forward — that assumption breaks. Impact:
- All fresh installs that don't set the build arg hit the pairing wall
- Existing deployments that rebuild their image silently change behavior
- Brev E2E and any CI/CD that builds images will need updating
Questions:
- Does
bin/lib/onboard.jspassNEMOCLAW_DISABLE_DEVICE_AUTH=1when building images during onboard? If not, first-run breaks. - Does
scripts/brev-setup.shset this build arg? If not, E2E will fail. - Should this be announced as a breaking change in release notes?
Medium Risk: Auto-pair client allowlist is narrow
ALLOWED_CLIENTS = {'openclaw-control-ui'} and ALLOWED_MODES = {'webchat'} will reject any client that doesn't send these exact values. The PR correctly notes these are "client-supplied and spoofable" (defense-in-depth, not a trust boundary), but legitimate clients that don't set clientId/clientMode will fail silently. Future OpenClaw clients or custom dashboards would need to be added to the allowlist.
Low Risk: allowInsecureAuth scheme derivation
Default CHAT_UI_URL is http://127.0.0.1:18789 so allow_insecure stays True for default deployments. Anyone with a custom https:// CHAT_UI_URL gets allowInsecureAuth=False, which is correct behavior.
What's solid
Tests are comprehensive, HANDLED set prevents reprocessing, isinstance(device, dict) guard is strictly safer, build arg is documented in the script header.
Findings on @ericksoa's Regression QuestionsInvestigated both questions — both concerns are confirmed: 1.
|
…v flows The secure-by-default flip (NEMOCLAW_DISABLE_DEVICE_AUTH=0) broke two automated paths that expect immediate dashboard access without device pairing: - bin/lib/onboard.js: patchStagedDockerfile() now sets the ARG to 1 - scripts/setup.sh: sed-patches the staged Dockerfile before build Addresses review feedback from @ericksoa on #1217.
8409990 to
fcb0fae
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/setup.sh`:
- Around line 236-238: The sed -i call that rewrites ARG
NEMOCLAW_DISABLE_DEVICE_AUTH in the Dockerfile is not portable to macOS; update
the script to use a portable inplace-edit approach (either choose perl -pi -e or
branch on uname to call sed -i '' on Darwin and sed -i on Linux). Specifically
replace the existing sed invocation that targets "$BUILD_CTX/Dockerfile" (the
line that changes ARG NEMOCLAW_DISABLE_DEVICE_AUTH) with a cross-platform block
like: if [[ "$(uname -s)" == "Darwin" ]]; then sed -i '' 's/^ARG
NEMOCLAW_DISABLE_DEVICE_AUTH=.*/ARG NEMOCLAW_DISABLE_DEVICE_AUTH=1/'
"$BUILD_CTX/Dockerfile"; else sed -i 's/^ARG NEMOCLAW_DISABLE_DEVICE_AUTH=.*/ARG
NEMOCLAW_DISABLE_DEVICE_AUTH=1/' "$BUILD_CTX/Dockerfile"; fi so the change works
on both macOS and Linux.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 4abfa8f6-b3e5-4c32-b1d6-8388b33cd405
📒 Files selected for processing (6)
Dockerfilebin/lib/onboard.jsscripts/nemoclaw-start.shscripts/setup.shtest/nemoclaw-start.test.jstest/security-c2-dockerfile-injection.test.js
🚧 Files skipped from review as they are similar to previous changes (2)
- test/security-c2-dockerfile-injection.test.js
- test/nemoclaw-start.test.js
scripts/setup.sh
Outdated
| # Brev/setup flow expects immediate dashboard access without device pairing, | ||
| # so disable device auth for images built via setup.sh (see #1217). | ||
| sed -i 's/^ARG NEMOCLAW_DISABLE_DEVICE_AUTH=.*/ARG NEMOCLAW_DISABLE_DEVICE_AUTH=1/' "$BUILD_CTX/Dockerfile" |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check for macOS-specific logic in setup.sh
rg -n "Darwin|macOS|brew|Colima" scripts/setup.shRepository: NVIDIA/NemoClaw
Length of output: 669
🏁 Script executed:
# Check for other sed -i usages in the file
rg -n "sed\s+-i" scripts/setup.shRepository: NVIDIA/NemoClaw
Length of output: 170
🏁 Script executed:
# Get the exact content around lines 236-238 and surrounding context
sed -n '230,245p' scripts/setup.shRepository: NVIDIA/NemoClaw
Length of output: 920
🏁 Script executed:
# Check if .shellcheckrc has any sed-related rules
cat .shellcheckrc 2>/dev/null || echo "No .shellcheckrc found"Repository: NVIDIA/NemoClaw
Length of output: 478
Potential portability issue: sed -i without suffix may fail on macOS.
The sed -i command at line 238 behaves differently between GNU sed (Linux) and BSD sed (macOS):
- GNU sed:
sed -i 's/...'works - BSD sed:
sed -i 's/...'fails; requiressed -i '' 's/...'
Since this script explicitly supports macOS (Darwin check at line 166-167, brew installation, and documented Colima/Docker Desktop support), consider using a portable pattern:
🛠️ Portable sed -i replacement
-sed -i 's/^ARG NEMOCLAW_DISABLE_DEVICE_AUTH=.*/ARG NEMOCLAW_DISABLE_DEVICE_AUTH=1/' "$BUILD_CTX/Dockerfile"
+sed 's/^ARG NEMOCLAW_DISABLE_DEVICE_AUTH=.*/ARG NEMOCLAW_DISABLE_DEVICE_AUTH=1/' "$BUILD_CTX/Dockerfile" > "$BUILD_CTX/Dockerfile.tmp" && mv "$BUILD_CTX/Dockerfile.tmp" "$BUILD_CTX/Dockerfile"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Brev/setup flow expects immediate dashboard access without device pairing, | |
| # so disable device auth for images built via setup.sh (see #1217). | |
| sed -i 's/^ARG NEMOCLAW_DISABLE_DEVICE_AUTH=.*/ARG NEMOCLAW_DISABLE_DEVICE_AUTH=1/' "$BUILD_CTX/Dockerfile" | |
| # Brev/setup flow expects immediate dashboard access without device pairing, | |
| # so disable device auth for images built via setup.sh (see `#1217`). | |
| sed 's/^ARG NEMOCLAW_DISABLE_DEVICE_AUTH=.*/ARG NEMOCLAW_DISABLE_DEVICE_AUTH=1/' "$BUILD_CTX/Dockerfile" > "$BUILD_CTX/Dockerfile.tmp" && mv "$BUILD_CTX/Dockerfile.tmp" "$BUILD_CTX/Dockerfile" |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/setup.sh` around lines 236 - 238, The sed -i call that rewrites ARG
NEMOCLAW_DISABLE_DEVICE_AUTH in the Dockerfile is not portable to macOS; update
the script to use a portable inplace-edit approach (either choose perl -pi -e or
branch on uname to call sed -i '' on Darwin and sed -i on Linux). Specifically
replace the existing sed invocation that targets "$BUILD_CTX/Dockerfile" (the
line that changes ARG NEMOCLAW_DISABLE_DEVICE_AUTH) with a cross-platform block
like: if [[ "$(uname -s)" == "Darwin" ]]; then sed -i '' 's/^ARG
NEMOCLAW_DISABLE_DEVICE_AUTH=.*/ARG NEMOCLAW_DISABLE_DEVICE_AUTH=1/'
"$BUILD_CTX/Dockerfile"; else sed -i 's/^ARG NEMOCLAW_DISABLE_DEVICE_AUTH=.*/ARG
NEMOCLAW_DISABLE_DEVICE_AUTH=1/' "$BUILD_CTX/Dockerfile"; fi so the change works
on both macOS and Linux.
ericksoa
left a comment
There was a problem hiding this comment.
Changes look good — secure-by-default with the onboard/setup escape hatch is the right approach. Approving.
…VIDIA#117) Fixes NVIDIA#117 — supersedes PR NVIDIA#123 which targeted stale code paths. Dockerfile: secure-by-default controlUi config - dangerouslyDisableDeviceAuth now defaults to false (was hardcoded True) - allowInsecureAuth derived from CHAT_UI_URL scheme (false for https) - New build-arg NEMOCLAW_DISABLE_DEVICE_AUTH=0 as opt-in escape hatch - Env var promoted safely via ENV (no ARG interpolation into Python) nemoclaw-start.sh: auto-pair client whitelisting - Only approve devices with known clientId (openclaw-control-ui) or clientMode (webchat); reject and log unknown clients - Validate device is a dict before accessing fields - Log client identity on approval for audit trail Tests (13 new) - Dockerfile: no hardcoded True for auth settings, env var derivation, ARG default, ENV promotion before RUN layer - Start script: whitelist definitions, rejection logic, dict validation, client identity logging, no unconditional approval pattern
- allowInsecureAuth: use explicit http allowlist (parsed.scheme == 'http') instead of https denylist (parsed.scheme != 'https') so malformed or unknown schemes default to secure (CodeRabbit finding on NVIDIA#123) - Add HANDLED set to track rejected/approved requestIds so rejected devices are not reprocessed every poll cycle (CodeRabbit finding on NVIDIA#123) - Document that clientId/clientMode are client-supplied and spoofable; the allowlist is defense-in-depth, not a trust boundary (@cv review on NVIDIA#123) - Reference PR NVIDIA#690 for the more comprehensive auto-pair fix (one-shot exit, timeout reduction, token cleanup)
- Move ALLOWED_CLIENTS/ALLOWED_MODES above the while loop — these are constants, not per-iteration state. Avoids reconstructing sets every poll cycle. - Document NEMOCLAW_DISABLE_DEVICE_AUTH as build-time only in the script header. Setting it at runtime has no effect because openclaw.json is baked at image build and verified by hash at startup. - Add tests: constants defined before loop, header mentions build-time.
…v flows The secure-by-default flip (NEMOCLAW_DISABLE_DEVICE_AUTH=0) broke two automated paths that expect immediate dashboard access without device pairing: - bin/lib/onboard.js: patchStagedDockerfile() now sets the ARG to 1 - scripts/setup.sh: sed-patches the staged Dockerfile before build Addresses review feedback from @ericksoa on NVIDIA#1217.
966847d to
d25a13a
Compare
…1217) ## Summary Fixes #117 — Insecure authentication configuration in gateway setup. > **Supersedes PR #123**, which targeted the correct security issues but patched code paths that have since moved. Recommend closing #123 in favor of this PR. --- ## Background NemoClaw runs an AI assistant inside a locked-down sandbox. The **gateway** is the front door to that sandbox, and the **Control UI** is a web dashboard users open in their browser to interact with the assistant. There were three security problems with that front door: ### Problem 1: Device auth was permanently disabled 🔓 The config had `dangerouslyDisableDeviceAuth: True` hardcoded. This meant any device could connect to the gateway without proving its identity. It was added as a temporary workaround (commit `5fb629f`) and never reverted. While the sandbox network isolation mitigates this today, it becomes dangerous if combined with LAN-bind changes or cloudflared tunnels in remote deployments — resulting in an unauthenticated, publicly reachable dashboard. **Fix:** Default to `false` (device auth enabled). Developers who genuinely need it disabled for headless/dev environments can set `NEMOCLAW_DISABLE_DEVICE_AUTH=1` as a build arg. ### Problem 2: Insecure auth was always allowed, even over HTTPS 🌐 `allowInsecureAuth: True` was hardcoded, meaning insecure (non-HTTPS) authentication was permitted regardless of deployment context. For local development on `http://127.0.0.1` this is fine, but for production/remote deployments over `https://` it defeats the purpose of TLS. **Fix:** Derive it from the `CHAT_UI_URL` scheme. If the URL is `http://` → allow insecure auth (local dev). If `https://` → block it (production). ### Problem 3: The auto-pair bouncer let everyone in 🚪 The `start_auto_pair` watcher automatically approved *every* pending device pairing request with zero validation. A rogue or unexpected client type could pair with the gateway unchallenged. **Fix:** Only approve devices with recognized `clientId` (`openclaw-control-ui`) or `clientMode` (`webchat`). Unknown clients are rejected and logged. --- ## Why this PR exists (instead of merging #123) PR #123 (opened Mar 17) correctly identified all three issues and proposed the right fixes. However, since then, commit `0f8eedd` ("make openclaw.json immutable at runtime, move config to build time") moved the gateway `controlUi` configuration from `scripts/nemoclaw-start.sh` into the **Dockerfile**. The config is now baked at image build time and verified via SHA-256 hash at startup. PR #123's changes to the start script's Python config block would patch dead code — that block no longer exists. This PR applies the same security logic where the code actually lives now: | Issue | PR #123 patched | This PR patches | |-------|----------------|----------------| | `dangerouslyDisableDeviceAuth` | `scripts/nemoclaw-start.sh` (removed) | `Dockerfile` (build-time config) | | `allowInsecureAuth` | `scripts/nemoclaw-start.sh` (removed) | `Dockerfile` (build-time config) | | Auto-pair whitelisting | `scripts/nemoclaw-start.sh` ✅ | `scripts/nemoclaw-start.sh` ✅ | --- ## Changes ### `Dockerfile` (+8/-2) - `dangerouslyDisableDeviceAuth` → derived from `NEMOCLAW_DISABLE_DEVICE_AUTH` env var (default `0` = secure) - `allowInsecureAuth` → derived from `CHAT_UI_URL` scheme (`false` when `https://`) - New `ARG NEMOCLAW_DISABLE_DEVICE_AUTH=0` with `ENV` promotion (follows existing C-2 safe pattern — no ARG interpolation into Python source) ### `scripts/nemoclaw-start.sh` (+12/-4) - Auto-pair only approves `ALLOWED_CLIENTS = {'openclaw-control-ui'}` and `ALLOWED_MODES = {'webchat'}` - Validates `isinstance(device, dict)` before field access - Rejects unknown clients with `[auto-pair] rejected unknown client=... mode=...` log - Logs `client=` on approval for audit trail - Documents `NEMOCLAW_DISABLE_DEVICE_AUTH` in script header ### `test/security-c2-dockerfile-injection.test.js` (+6 tests) - No hardcoded `True` for `dangerouslyDisableDeviceAuth` - No hardcoded `True` for `allowInsecureAuth` - Env var derivation for both settings - `ARG` defaults to `0` - `ENV` promotion before `RUN` layer ### `test/nemoclaw-start.test.js` (+7 tests) - Whitelist definitions (`ALLOWED_CLIENTS`, `ALLOWED_MODES`) - Rejection logic for unknown clients - Dict type validation - Client identity in approval logs - No unconditional approval pattern - `NEMOCLAW_DISABLE_DEVICE_AUTH` documented --- ## Why no Brev E2E test These changes are **deterministic config values and simple conditional logic**, not runtime behavior with environmental ambiguity: 1. **Dockerfile config** — Two Python booleans evaluated at build time: `os.environ.get(...) == '1'` and `parsed.scheme != 'https'`. If the source is correct, the output is correct. 2. **Auto-pair whitelisting** — Python set membership check. No shell interpretation, no SSH pipeline, no quoting edge cases. This is fundamentally different from something like the telegram bridge injection fix, where you *must* send real payloads through a real shell on a real sandbox to prove metacharacters don't get interpreted. Our changes have no gap between "looks right in source" and "works right at runtime." The 13 source-pattern tests provide sufficient confidence. If deeper verification is desired in the future, a **build-time unit test** that runs the Dockerfile's Python snippet and asserts the JSON output would be the appropriate next step — a regular CI test, not a Brev instance. --- ## Test plan - [x] All 752 tests pass (739 existing + 13 new) - [x] `shellcheck` ✅ - [x] `hadolint` ✅ - [x] `shfmt` / `prettier` ✅ - [x] `gitleaks` ✅ - [ ] Default onboarding: gateway starts with device auth enabled, Control UI pairing works via auto-pair - [ ] `NEMOCLAW_DISABLE_DEVICE_AUTH=1` build: device auth is disabled (backward-compatible escape hatch) - [ ] `CHAT_UI_URL=https://...` build: `allowInsecureAuth` is `false` - [ ] `CHAT_UI_URL=http://127.0.0.1:18789` (default): `allowInsecureAuth` is `true` - [ ] Unknown device type connecting to gateway: auto-pair rejects and logs - [ ] `openclaw-control-ui` / `webchat` devices: auto-pair approves as before <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Security** * Device authentication can be toggled at build time; UI insecure-auth is allowed only when the control UI URL uses http. * **Bug Fixes** * Auto-pair processing now validates inputs, rejects unknown client/mode, logs identifiers, and deduplicates requests to avoid repeated approvals/rejections. * **Tests** * Added tests for auto-pair validation, logging, de-duplication, and configurable auth behavior. * **Chores** * Sandbox/setup flow now enforces the build-time device-auth setting for images. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
Cherry-picked k8s hardening from @13ernkastel's PR NVIDIA#1149, which was closed after its auth changes were superseded by NVIDIA#1217. Changes to k8s/nemoclaw-k8s.yaml: - automountServiceAccountToken: false - enableServiceLinks: false - workspace container: allowPrivilegeEscalation false, drop ALL caps, RuntimeDefault seccomp - COMPATIBLE_API_KEY from optional Secret with dummy fallback - NEMOCLAW_POLICY_MODE default changed from skip to suggested - Replace curl|bash with download-then-execute pattern Also adds k8s/README.md updates and regression test. Co-authored-by: 13ernkastel <LennonCMJ@live.com>
…VIDIA#1217) Backport from upstream NVIDIA/NemoClaw: - Dockerfile: dangerouslyDisableDeviceAuth defaults to false (secure); escape hatch via NEMOCLAW_DISABLE_DEVICE_AUTH=1 build arg - Dockerfile: allowInsecureAuth derived from CHAT_UI_URL scheme (true for http://, false for https://) - nemoclaw-start.sh: auto-pair only approves openclaw-control-ui/webchat; unknown clients rejected with HANDLED dedup set to avoid reprocessing - onboard.js: patchStagedDockerfile sets NEMOCLAW_DISABLE_DEVICE_AUTH=1 for onboard images (immediate dashboard access without device pairing) - 13 new source-pattern tests covering all three security fixes Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…VIDIA#1217) ## Summary Fixes NVIDIA#117 — Insecure authentication configuration in gateway setup. > **Supersedes PR NVIDIA#123**, which targeted the correct security issues but patched code paths that have since moved. Recommend closing NVIDIA#123 in favor of this PR. --- ## Background NemoClaw runs an AI assistant inside a locked-down sandbox. The **gateway** is the front door to that sandbox, and the **Control UI** is a web dashboard users open in their browser to interact with the assistant. There were three security problems with that front door: ### Problem 1: Device auth was permanently disabled 🔓 The config had `dangerouslyDisableDeviceAuth: True` hardcoded. This meant any device could connect to the gateway without proving its identity. It was added as a temporary workaround (commit `5fb629f`) and never reverted. While the sandbox network isolation mitigates this today, it becomes dangerous if combined with LAN-bind changes or cloudflared tunnels in remote deployments — resulting in an unauthenticated, publicly reachable dashboard. **Fix:** Default to `false` (device auth enabled). Developers who genuinely need it disabled for headless/dev environments can set `NEMOCLAW_DISABLE_DEVICE_AUTH=1` as a build arg. ### Problem 2: Insecure auth was always allowed, even over HTTPS 🌐 `allowInsecureAuth: True` was hardcoded, meaning insecure (non-HTTPS) authentication was permitted regardless of deployment context. For local development on `http://127.0.0.1` this is fine, but for production/remote deployments over `https://` it defeats the purpose of TLS. **Fix:** Derive it from the `CHAT_UI_URL` scheme. If the URL is `http://` → allow insecure auth (local dev). If `https://` → block it (production). ### Problem 3: The auto-pair bouncer let everyone in 🚪 The `start_auto_pair` watcher automatically approved *every* pending device pairing request with zero validation. A rogue or unexpected client type could pair with the gateway unchallenged. **Fix:** Only approve devices with recognized `clientId` (`openclaw-control-ui`) or `clientMode` (`webchat`). Unknown clients are rejected and logged. --- ## Why this PR exists (instead of merging NVIDIA#123) PR NVIDIA#123 (opened Mar 17) correctly identified all three issues and proposed the right fixes. However, since then, commit `0f8eedd` ("make openclaw.json immutable at runtime, move config to build time") moved the gateway `controlUi` configuration from `scripts/nemoclaw-start.sh` into the **Dockerfile**. The config is now baked at image build time and verified via SHA-256 hash at startup. PR NVIDIA#123's changes to the start script's Python config block would patch dead code — that block no longer exists. This PR applies the same security logic where the code actually lives now: | Issue | PR NVIDIA#123 patched | This PR patches | |-------|----------------|----------------| | `dangerouslyDisableDeviceAuth` | `scripts/nemoclaw-start.sh` (removed) | `Dockerfile` (build-time config) | | `allowInsecureAuth` | `scripts/nemoclaw-start.sh` (removed) | `Dockerfile` (build-time config) | | Auto-pair whitelisting | `scripts/nemoclaw-start.sh` ✅ | `scripts/nemoclaw-start.sh` ✅ | --- ## Changes ### `Dockerfile` (+8/-2) - `dangerouslyDisableDeviceAuth` → derived from `NEMOCLAW_DISABLE_DEVICE_AUTH` env var (default `0` = secure) - `allowInsecureAuth` → derived from `CHAT_UI_URL` scheme (`false` when `https://`) - New `ARG NEMOCLAW_DISABLE_DEVICE_AUTH=0` with `ENV` promotion (follows existing C-2 safe pattern — no ARG interpolation into Python source) ### `scripts/nemoclaw-start.sh` (+12/-4) - Auto-pair only approves `ALLOWED_CLIENTS = {'openclaw-control-ui'}` and `ALLOWED_MODES = {'webchat'}` - Validates `isinstance(device, dict)` before field access - Rejects unknown clients with `[auto-pair] rejected unknown client=... mode=...` log - Logs `client=` on approval for audit trail - Documents `NEMOCLAW_DISABLE_DEVICE_AUTH` in script header ### `test/security-c2-dockerfile-injection.test.js` (+6 tests) - No hardcoded `True` for `dangerouslyDisableDeviceAuth` - No hardcoded `True` for `allowInsecureAuth` - Env var derivation for both settings - `ARG` defaults to `0` - `ENV` promotion before `RUN` layer ### `test/nemoclaw-start.test.js` (+7 tests) - Whitelist definitions (`ALLOWED_CLIENTS`, `ALLOWED_MODES`) - Rejection logic for unknown clients - Dict type validation - Client identity in approval logs - No unconditional approval pattern - `NEMOCLAW_DISABLE_DEVICE_AUTH` documented --- ## Why no Brev E2E test These changes are **deterministic config values and simple conditional logic**, not runtime behavior with environmental ambiguity: 1. **Dockerfile config** — Two Python booleans evaluated at build time: `os.environ.get(...) == '1'` and `parsed.scheme != 'https'`. If the source is correct, the output is correct. 2. **Auto-pair whitelisting** — Python set membership check. No shell interpretation, no SSH pipeline, no quoting edge cases. This is fundamentally different from something like the telegram bridge injection fix, where you *must* send real payloads through a real shell on a real sandbox to prove metacharacters don't get interpreted. Our changes have no gap between "looks right in source" and "works right at runtime." The 13 source-pattern tests provide sufficient confidence. If deeper verification is desired in the future, a **build-time unit test** that runs the Dockerfile's Python snippet and asserts the JSON output would be the appropriate next step — a regular CI test, not a Brev instance. --- ## Test plan - [x] All 752 tests pass (739 existing + 13 new) - [x] `shellcheck` ✅ - [x] `hadolint` ✅ - [x] `shfmt` / `prettier` ✅ - [x] `gitleaks` ✅ - [ ] Default onboarding: gateway starts with device auth enabled, Control UI pairing works via auto-pair - [ ] `NEMOCLAW_DISABLE_DEVICE_AUTH=1` build: device auth is disabled (backward-compatible escape hatch) - [ ] `CHAT_UI_URL=https://...` build: `allowInsecureAuth` is `false` - [ ] `CHAT_UI_URL=http://127.0.0.1:18789` (default): `allowInsecureAuth` is `true` - [ ] Unknown device type connecting to gateway: auto-pair rejects and logs - [ ] `openclaw-control-ui` / `webchat` devices: auto-pair approves as before <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Security** * Device authentication can be toggled at build time; UI insecure-auth is allowed only when the control UI URL uses http. * **Bug Fixes** * Auto-pair processing now validates inputs, rejects unknown client/mode, logs identifiers, and deduplicates requests to avoid repeated approvals/rejections. * **Tests** * Added tests for auto-pair validation, logging, de-duplication, and configurable auth behavior. * **Chores** * Sandbox/setup flow now enforces the build-time device-auth setting for images. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
Summary
Fixes #117 — Insecure authentication configuration in gateway setup.
Background
NemoClaw runs an AI assistant inside a locked-down sandbox. The gateway is the front door to that sandbox, and the Control UI is a web dashboard users open in their browser to interact with the assistant.
There were three security problems with that front door:
Problem 1: Device auth was permanently disabled 🔓
The config had
dangerouslyDisableDeviceAuth: Truehardcoded. This meant any device could connect to the gateway without proving its identity. It was added as a temporary workaround (commit5fb629f) and never reverted. While the sandbox network isolation mitigates this today, it becomes dangerous if combined with LAN-bind changes or cloudflared tunnels in remote deployments — resulting in an unauthenticated, publicly reachable dashboard.Fix: Default to
false(device auth enabled). Developers who genuinely need it disabled for headless/dev environments can setNEMOCLAW_DISABLE_DEVICE_AUTH=1as a build arg.Problem 2: Insecure auth was always allowed, even over HTTPS 🌐
allowInsecureAuth: Truewas hardcoded, meaning insecure (non-HTTPS) authentication was permitted regardless of deployment context. For local development onhttp://127.0.0.1this is fine, but for production/remote deployments overhttps://it defeats the purpose of TLS.Fix: Derive it from the
CHAT_UI_URLscheme. If the URL ishttp://→ allow insecure auth (local dev). Ifhttps://→ block it (production).Problem 3: The auto-pair bouncer let everyone in 🚪
The
start_auto_pairwatcher automatically approved every pending device pairing request with zero validation. A rogue or unexpected client type could pair with the gateway unchallenged.Fix: Only approve devices with recognized
clientId(openclaw-control-ui) orclientMode(webchat). Unknown clients are rejected and logged.Why this PR exists (instead of merging #123)
PR #123 (opened Mar 17) correctly identified all three issues and proposed the right fixes. However, since then, commit
0f8eedd("make openclaw.json immutable at runtime, move config to build time") moved the gatewaycontrolUiconfiguration fromscripts/nemoclaw-start.shinto the Dockerfile. The config is now baked at image build time and verified via SHA-256 hash at startup.PR #123's changes to the start script's Python config block would patch dead code — that block no longer exists. This PR applies the same security logic where the code actually lives now:
dangerouslyDisableDeviceAuthscripts/nemoclaw-start.sh(removed)Dockerfile(build-time config)allowInsecureAuthscripts/nemoclaw-start.sh(removed)Dockerfile(build-time config)scripts/nemoclaw-start.sh✅scripts/nemoclaw-start.sh✅Changes
Dockerfile(+8/-2)dangerouslyDisableDeviceAuth→ derived fromNEMOCLAW_DISABLE_DEVICE_AUTHenv var (default0= secure)allowInsecureAuth→ derived fromCHAT_UI_URLscheme (falsewhenhttps://)ARG NEMOCLAW_DISABLE_DEVICE_AUTH=0withENVpromotion (follows existing C-2 safe pattern — no ARG interpolation into Python source)scripts/nemoclaw-start.sh(+12/-4)ALLOWED_CLIENTS = {'openclaw-control-ui'}andALLOWED_MODES = {'webchat'}isinstance(device, dict)before field access[auto-pair] rejected unknown client=... mode=...logclient=on approval for audit trailNEMOCLAW_DISABLE_DEVICE_AUTHin script headertest/security-c2-dockerfile-injection.test.js(+6 tests)TruefordangerouslyDisableDeviceAuthTrueforallowInsecureAuthARGdefaults to0ENVpromotion beforeRUNlayertest/nemoclaw-start.test.js(+7 tests)ALLOWED_CLIENTS,ALLOWED_MODES)NEMOCLAW_DISABLE_DEVICE_AUTHdocumentedWhy no Brev E2E test
These changes are deterministic config values and simple conditional logic, not runtime behavior with environmental ambiguity:
os.environ.get(...) == '1'andparsed.scheme != 'https'. If the source is correct, the output is correct.This is fundamentally different from something like the telegram bridge injection fix, where you must send real payloads through a real shell on a real sandbox to prove metacharacters don't get interpreted. Our changes have no gap between "looks right in source" and "works right at runtime." The 13 source-pattern tests provide sufficient confidence.
If deeper verification is desired in the future, a build-time unit test that runs the Dockerfile's Python snippet and asserts the JSON output would be the appropriate next step — a regular CI test, not a Brev instance.
Test plan
shellcheck✅hadolint✅shfmt/prettier✅gitleaks✅NEMOCLAW_DISABLE_DEVICE_AUTH=1build: device auth is disabled (backward-compatible escape hatch)CHAT_UI_URL=https://...build:allowInsecureAuthisfalseCHAT_UI_URL=http://127.0.0.1:18789(default):allowInsecureAuthistrueopenclaw-control-ui/webchatdevices: auto-pair approves as beforeSummary by CodeRabbit