Skip to content

fix(sandbox): export HTTP_PROXY/HTTPS_PROXY/NO_PROXY so Node.js routes through OpenShell egress proxy (fixes #626) #704

Closed
nanookclaw wants to merge 4 commits into NVIDIA:main from nanookclaw:fix/sandbox-dns-resolution-proxy-env


@nanookclaw nanookclaw commented Mar 23, 2026

Problem

Web search tools (Brave, Gemini, Perplexity) fail inside the NemoClaw sandbox with getaddrinfo EAI_AGAIN regardless of what is added to the network policy. This was reported in #626.

Root cause: Node.js (undici/node:fetch) resolves DNS locally before opening the CONNECT tunnel to the OpenShell egress proxy at 10.200.0.1:3128. The sandbox network namespace has no DNS resolver configured, so local resolution always fails.

curl does not have this problem — it sends the full hostname to the proxy and lets the proxy resolve DNS. Node.js requires the HTTP_PROXY / HTTPS_PROXY env var to do the same.

Fix

Export HTTP_PROXY, HTTPS_PROXY, and NO_PROXY in nemoclaw-start.sh before openclaw gateway is launched. This causes Node.js/undici to route all outbound HTTPS through the OpenShell proxy, which handles DNS resolution — matching the behaviour users already see with curl.

PROXY_HOST="${NEMOCLAW_PROXY_HOST:-10.200.0.1}"
PROXY_PORT="${NEMOCLAW_PROXY_PORT:-3128}"
export HTTP_PROXY="http://${PROXY_HOST}:${PROXY_PORT}"
export HTTPS_PROXY="http://${PROXY_HOST}:${PROXY_PORT}"
export NO_PROXY="localhost,127.0.0.1,::1,inference.local,10.200.0.0/16"

NO_PROXY excludes loopback, inference.local, and the 10.200.0.0/16 OpenShell virtual network so internal gateway calls stay on the fast path and are not routed through the proxy.

Overrides: NEMOCLAW_PROXY_HOST and NEMOCLAW_PROXY_PORT allow users to adjust the proxy address if the OpenShell gateway IP changes in a future release.
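
As a usage sketch, the host/port logic above can be wrapped in a function to see how the defaults and the overrides resolve. The helper name `print_proxy_env` and the override values `192.168.50.1` and `8080` are illustrative assumptions, not part of the PR. NO_PROXY is left out here to keep the focus on the two override variables.

```shell
#!/usr/bin/env bash
# Sketch only: print_proxy_env is a hypothetical helper, not part of nemoclaw-start.sh.
# The function body is a subshell, so the caller's environment is left untouched.
print_proxy_env() (
  PROXY_HOST="${NEMOCLAW_PROXY_HOST:-10.200.0.1}"
  PROXY_PORT="${NEMOCLAW_PROXY_PORT:-3128}"
  export HTTP_PROXY="http://${PROXY_HOST}:${PROXY_PORT}"
  export HTTPS_PROXY="http://${PROXY_HOST}:${PROXY_PORT}"
  printf 'HTTP_PROXY=%s\nHTTPS_PROXY=%s\n' "$HTTP_PROXY" "$HTTPS_PROXY"
)

# Defaults: no overrides set.
( unset NEMOCLAW_PROXY_HOST NEMOCLAW_PROXY_PORT; print_proxy_env )

# Overrides: the values below are made up for illustration.
NEMOCLAW_PROXY_HOST=192.168.50.1 NEMOCLAW_PROXY_PORT=8080 print_proxy_env
```

Both variables fall back independently, so overriding only the port keeps the default host and vice versa.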

Tests

Added 6 tests to test/service-env.test.js verifying:

  • Default proxy URLs point at 10.200.0.1:3128
  • NEMOCLAW_PROXY_HOST override works
  • NEMOCLAW_PROXY_PORT override works
  • NO_PROXY excludes loopback and inference.local
  • NO_PROXY excludes the OpenShell virtual network range

All 17 tests in service-env.test.js pass. The one pre-existing failure in install-preflight.test.js (can run via stdin without a sibling runtime.sh file) is present on main before this change and is unrelated.

Closes #626

Summary by CodeRabbit

  • New Features

    • Runtime proxy support with configurable host/port; default proxy is 10.200.0.1:3128. NO_PROXY bypasses localhost, 127.0.0.1, ::1, inference.local, and the gateway IP 10.200.0.1.
    • Proxy settings are exported at startup and, when possible, persisted for login shells so interactive sessions inherit them.
  • Tests

    • Added tests covering proxy defaults, host/port overrides, and creation of the login-shell proxy snippet.

…resolution

Web search tools (Brave, Gemini, Perplexity) failed inside the NemoClaw
sandbox with `getaddrinfo EAI_AGAIN` because Node.js (undici/fetch) resolves
DNS locally before opening the CONNECT tunnel to the OpenShell egress proxy
at 10.200.0.1:3128.  The sandbox network namespace has no DNS resolver, so
local resolution always fails regardless of what is added to the network policy.

curl does not have this problem because it sends the full hostname to the
proxy and lets the proxy resolve it.  Node.js's undici requires an
HTTP_PROXY / HTTPS_PROXY env var to do the same.

Fix: export HTTP_PROXY, HTTPS_PROXY, and NO_PROXY in nemoclaw-start.sh
before openclaw gateway is launched.  This causes Node.js to route all
outbound HTTPS through the OpenShell proxy (which handles DNS), matching
the behaviour users already see with curl.

NO_PROXY excludes loopback, inference.local, and the 10.200.0.0/16 OpenShell
virtual network so internal gateway calls stay on the fast path.

Both the proxy host and port can be overridden via NEMOCLAW_PROXY_HOST and
NEMOCLAW_PROXY_PORT for users running custom OpenShell gateway configurations.

Closes NVIDIA#626
@coderabbitai
Contributor

coderabbitai bot commented Mar 23, 2026

📝 Walkthrough

Walkthrough

Adds outbound proxy configuration to scripts/nemoclaw-start.sh: computes PROXY_HOST/PROXY_PORT from NEMOCLAW_PROXY_HOST/NEMOCLAW_PROXY_PORT, exports HTTP_PROXY, HTTPS_PROXY, and NO_PROXY, and conditionally writes /etc/profile.d/nemoclaw-proxy.sh. Adds tests that extract and validate the proxy snippet's produced environment variables and profile.d writer behavior.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Startup script: scripts/nemoclaw-start.sh | Add computation of PROXY_HOST/PROXY_PORT from NEMOCLAW_PROXY_HOST/NEMOCLAW_PROXY_PORT, export HTTP_PROXY/HTTPS_PROXY/NO_PROXY, and conditionally write /etc/profile.d/nemoclaw-proxy.sh to re-export proxies on login. Also minor whitespace/log-redirection tweaks and adjusted multi-line token capture. |
| Tests: test/service-env.test.js | New test suite "proxy environment variables (issue #626)" that extracts the proxy block from the start script, runs it in a temporary wrapper with controlled env, and asserts default and overridden HTTP_PROXY/HTTPS_PROXY values and NO_PROXY contents (includes localhost, 127.0.0.1, ::1, inference.local, and the gateway IP 10.200.0.1). Also tests writing of the nemoclaw-proxy.sh snippet into a profile.d-like directory. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped through shells and set the trail,
Exports in place to guide each sail,
A tiny script to show the way,
Tests to prove we won the day,
🥕 Proxy paths—now off I play!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly identifies the main change: exporting proxy environment variables so Node.js routes through OpenShell proxy, and correctly references the issue being fixed. |
| Linked Issues check | ✅ Passed | The PR implements the chosen solution from issue #626 by exporting HTTP_PROXY/HTTPS_PROXY/NO_PROXY environment variables and adding /etc/profile.d script writer to ensure Node.js routes through the OpenShell proxy. |
| Out of Scope Changes check | ✅ Passed | All changes are in-scope: proxy environment variable setup in nemoclaw-start.sh and comprehensive tests in service-env.test.js directly address the issue #626 requirements. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
test/service-env.test.js (1)

107-136: Test duplicates script logic rather than sourcing the actual implementation.

The extractProxyVars helper embeds a copy of the proxy variable logic from nemoclaw-start.sh. If the shell script is modified (e.g., different defaults, additional proxy vars), these tests will still pass but won't catch the regression.

Consider sourcing or extracting the relevant lines directly from scripts/nemoclaw-start.sh to ensure the test validates the actual implementation. Alternatively, add a comment noting this intentional duplication so future maintainers remember to update both locations.

🔧 Option: Add a warning comment about the duplication
     function extractProxyVars(env = {}) {
-      // Write the proxy-variable snippet from nemoclaw-start.sh to a temp script
-      // and execute it so that bash variable assignments and expansions work
-      // correctly without interference from JSON.stringify quote-escaping.
+      // IMPORTANT: This script mirrors the proxy-variable logic from
+      // scripts/nemoclaw-start.sh. If you update the proxy configuration
+      // there, update this test helper as well to keep them in sync.
       const script = [
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/service-env.test.js` around lines 107 - 136, The test helper
extractProxyVars duplicates the proxy logic instead of sourcing
scripts/nemoclaw-start.sh; update extractProxyVars to read and source the real
script (or extract the proxy-related lines) so tests reflect the actual
implementation—e.g., have extractProxyVars locate scripts/nemoclaw-start.sh,
write a short temp wrapper that sources that file and echoes
HTTP_PROXY/HTTPS_PROXY/NO_PROXY, then execFileSync that wrapper with the
provided env; if you prefer not to change behavior, add a clear comment above
extractProxyVars stating it intentionally duplicates nemoclaw-start.sh and must
be kept in sync with that file.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/nemoclaw-start.sh`:
- Around line 148-154: The NO_PROXY value currently contains a CIDR entry
("10.200.0.0/16") which EnvHttpProxyAgent (undici) won't match; update the
NO_PROXY assignment to remove the CIDR or replace it with explicit hosts/IPs
that should bypass the proxy (e.g., list the gateway host 10.200.0.1 and any
other specific IPs or hostnames used on the virtual network), ensuring the
environment variable set in the scripts/nemoclaw-start.sh (NO_PROXY) only
contains exact hostnames, domain suffixes, or individual IPs so
EnvHttpProxyAgent will correctly bypass the proxy for internal addresses
referenced by code that uses EnvHttpProxyAgent.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a807432e-cbfe-498e-b1d1-8f0961099749

📥 Commits

Reviewing files that changed from the base of the PR and between c55a309 and 88e573a.

📒 Files selected for processing (2)
  • scripts/nemoclaw-start.sh
  • test/service-env.test.js

…plicating

extractProxyVars now uses sed to extract the PROXY_HOST/PROXY_PORT/HTTP_PROXY/
HTTPS_PROXY/NO_PROXY block directly from scripts/nemoclaw-start.sh and runs it
in a minimal bash wrapper.

This ensures that if the proxy configuration in nemoclaw-start.sh changes
(different defaults, new variables, different NO_PROXY entries), the tests
will catch the regression rather than passing against stale duplicated logic.

All 17 tests pass.

Addresses CodeRabbit nitpick in PR NVIDIA#704.
@nanookclaw
Author

Good catch from CodeRabbit — addressed in 623d878.

extractProxyVars now sources the proxy block directly from scripts/nemoclaw-start.sh using sed extraction rather than duplicating the logic. The wrapper runs the actual shell lines and echoes the three variables. If the script's defaults or variable names change, the tests will catch the regression.

All 17 tests still pass.
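
The extract-and-run approach described above can be sketched in shell. The actual sed pattern used by the test helper is not shown in this thread, so the range `/^PROXY_HOST=/,/^export NO_PROXY=/` is an assumption, and `write_stand_in_script` creates a small stand-in file rather than the real scripts/nemoclaw-start.sh.

```shell
#!/usr/bin/env bash
# Sketch only: the sed range is an assumed pattern, and write_stand_in_script
# produces a small stand-in for scripts/nemoclaw-start.sh, not its real contents.
write_stand_in_script() {
  cat > "$1" <<'EOF'
echo "unrelated startup work"
PROXY_HOST="${NEMOCLAW_PROXY_HOST:-10.200.0.1}"
PROXY_PORT="${NEMOCLAW_PROXY_PORT:-3128}"
export HTTP_PROXY="http://${PROXY_HOST}:${PROXY_PORT}"
export HTTPS_PROXY="http://${PROXY_HOST}:${PROXY_PORT}"
export NO_PROXY="localhost,127.0.0.1,::1,inference.local,10.200.0.1"
echo "more unrelated work"
EOF
}

extract_proxy_vars() {
  local script_path="$1" block wrapper
  # Pull only the lines from the PROXY_HOST assignment through the NO_PROXY export.
  block="$(sed -n '/^PROXY_HOST=/,/^export NO_PROXY=/p' "$script_path")"
  # Run the extracted lines in a fresh bash, then echo the three variables.
  wrapper="$(printf '%s\n%s\n' "$block" \
    'printf "%s\n" "HTTP_PROXY=$HTTP_PROXY" "HTTPS_PROXY=$HTTPS_PROXY" "NO_PROXY=$NO_PROXY"')"
  bash -c "$wrapper"
}

demo_dir="$(mktemp -d)"
write_stand_in_script "$demo_dir/start.sh"
( export NEMOCLAW_PROXY_PORT=9999; extract_proxy_vars "$demo_dir/start.sh" )
rm -rf "$demo_dir"
```

Because the wrapper executes the lines taken from the script file itself, a change to the defaults in that file changes what the wrapper prints, which is the property the review asked for.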

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/service-env.test.js`:
- Around line 167-173: The test "NO_PROXY excludes loopback and inference.local"
is missing an assertion for the IPv6 loopback; update the test that calls
extractProxyVars() and checks vars.NO_PROXY to also assert the presence of "::1"
(e.g., add expect(noProxy).toContain("::1")). Keep the existing logic that
splits NO_PROXY and retains assertions for "localhost", "127.0.0.1", and
"inference.local" so the test verifies all expected loopback entries.
- Around line 116-121: The sed extraction into proxyBlock (via execFileSync with
scriptPath) may return an empty string if the pattern doesn't match; add a guard
after obtaining proxyBlock that checks proxyBlock.trim() and throws a clear
Error like "Failed to extract proxy configuration from nemoclaw-start.sh" so
tests fail fast and with a descriptive message; update the test around the
execFileSync call (proxyBlock variable) to validate extraction before using the
proxy environment in subsequent steps.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 21690e2b-003a-44d8-86c7-ed3e2563a4ca

📥 Commits

Reviewing files that changed from the base of the PR and between 88e573a and 623d878.

📒 Files selected for processing (1)
  • test/service-env.test.js

…::1 test

- Replace 10.200.0.0/16 (CIDR) with 10.200.0.1 (explicit gateway IP).
  Node.js undici EnvHttpProxyAgent matches NO_PROXY entries as exact
  hostname strings or domain suffixes, not CIDR ranges.  The CIDR entry
  was silently ignored, routing all virtual-network IPs through the
  proxy when they should be bypassed.
- Add test guard: extractProxyVars() now throws if the sed extraction
  returns an empty block, surfacing script-format regressions as clear
  failures instead of silent empty-env errors.
- Add ::1 (IPv6 loopback) assertion to the loopback exclusion test to
  match the NO_PROXY entry already present in the script.

All 17 tests pass.

Addresses CodeRabbit review comments on PR NVIDIA#704.
@nanookclaw
Author

Addressed the three CodeRabbit review comments (c81cb4f):

  1. CIDR in NO_PROXY (Major): Replaced 10.200.0.0/16 with 10.200.0.1. Node.js undici EnvHttpProxyAgent matches NO_PROXY entries as exact hostnames/domain suffixes, not CIDR ranges — the CIDR entry was silently treated as a literal hostname string and matched nothing.

  2. Empty proxyBlock guard (Minor): extractProxyVars() now throws with a clear message if the sed extraction returns empty, so script-format regressions surface immediately instead of producing silent empty-env failures.

  3. ::1 test coverage (Minor): Added assertion for IPv6 loopback ::1 in the loopback exclusion test, matching the entry already present in the script.

All 17 tests pass.
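
To illustrate the first point, here is a simplified model of exact-or-domain-suffix matching. It is not undici's implementation, and `no_proxy_matches` is a hypothetical helper. It only shows why an entry such as 10.200.0.0/16 matches nothing under that kind of comparison.

```shell
#!/usr/bin/env bash
# Sketch only: a simplified model of exact-or-domain-suffix NO_PROXY matching.
# It is NOT undici's code; it illustrates why a CIDR entry never matches an IP.
no_proxy_matches() {
  local host="$1" list="$2" entry rest
  rest="$list,"
  while [ -n "$rest" ]; do
    entry="${rest%%,*}"      # first comma-separated entry
    rest="${rest#*,}"        # remainder after that entry
    entry="${entry#.}"       # treat ".example.com" like "example.com"
    [ -n "$entry" ] || continue
    case "$host" in
      "$entry" | *".$entry") return 0 ;;   # exact match or domain suffix
    esac
  done
  return 1
}

for host in 10.200.0.1 10.200.0.5 inference.local; do
  for list in "localhost,10.200.0.0/16" "localhost,inference.local,10.200.0.1"; do
    if no_proxy_matches "$host" "$list"; then verdict="bypass"; else verdict="proxy"; fi
    printf '%-16s NO_PROXY=%-40s -> %s\n' "$host" "$list" "$verdict"
  done
done
```

Under this model the CIDR list sends every 10.200.x.x address to the proxy, while the explicit list bypasses only 10.200.0.1 and inference.local. Other addresses on the virtual network would need their own entries.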

@wscurran added the bug, enhancement, and OpenShell labels Mar 23, 2026
@kjw3 kjw3 self-assigned this Mar 23, 2026
@kjw3
Contributor

kjw3 commented Mar 24, 2026

Retested this on a scrubbed Linux VM to make sure the earlier result was not test error. The outcome is the same.

What we verified:

  • the PR shell block itself behaves correctly on the host
  • defaults resolve to:
    • HTTP_PROXY=http://10.200.0.1:3128
    • HTTPS_PROXY=http://10.200.0.1:3128
    • NO_PROXY=localhost,127.0.0.1,::1,inference.local,10.200.0.1
  • host/port overrides also work correctly for HTTP_PROXY / HTTPS_PROXY
  • after nemoclaw onboard and nemoclaw my-assistant connect, inside the live sandbox runtime we still see:
    • HTTP_PROXY=http://10.200.0.1:3128
    • HTTPS_PROXY=http://10.200.0.1:3128
    • NO_PROXY=127.0.0.1,localhost,::1
  • Node no longer fails with getaddrinfo EAI_AGAIN; it reaches the proxy and fails later with Proxy response (403) !== 200 when HTTP Tunneling, which is good evidence that the proxy routing part is working

So the remaining blocker is specifically NO_PROXY propagation:

  • the startup script exports inference.local,10.200.0.1
  • but those entries are missing in the actual sandbox runtime env

Please follow up with a fix or explanation for that mismatch. The re-review condition is that inside the running sandbox, env shows the full expected NO_PROXY value from the script, not the truncated loopback-only value.

Once that is fixed, I’m happy to re-review.
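
A minimal sketch of the re-review check, assuming it is run from a shell inside the sandbox. The helper name `missing_no_proxy_entries` is hypothetical. The expected entries are the ones listed in this comment.

```shell
#!/usr/bin/env bash
# Sketch only: missing_no_proxy_entries is a hypothetical helper for the re-review check.
# It prints each expected entry that is absent from the given NO_PROXY value and
# returns non-zero if any entry is missing.
missing_no_proxy_entries() {
  local value="$1" entry missing=0
  shift
  for entry in "$@"; do
    case ",${value}," in
      *",${entry},"*) ;;                                   # entry present
      *) printf 'missing: %s\n' "$entry"; missing=1 ;;     # entry absent
    esac
  done
  return "$missing"
}

# The truncated value observed inside the sandbox, checked against the script's list.
missing_no_proxy_entries "127.0.0.1,localhost,::1" \
  localhost 127.0.0.1 ::1 inference.local 10.200.0.1 || true

# Inside a live sandbox shell, the real value would be passed instead:
# missing_no_proxy_entries "${NO_PROXY:-}" localhost 127.0.0.1 ::1 inference.local 10.200.0.1
```

For the truncated value the helper reports inference.local and 10.200.0.1 as missing, which is the mismatch described above.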

…nherit full NO_PROXY

OpenShell injects NO_PROXY=127.0.0.1,localhost,::1 when a user connects to
the sandbox via `openshell sandbox connect`, overwriting the value set
by nemoclaw-start.sh.  The gateway process (launched by nemoclaw-start.sh)
inherits the correct NO_PROXY, but subsequent interactive shell sessions
opened via `connect` get the truncated loopback-only value — exactly the
mismatch reported by kjw3 in the re-review.

Fix: after exporting the proxy vars, write a /etc/profile.d/nemoclaw-proxy.sh
snippet that restores the full NO_PROXY on every login shell.  This snippet
is sourced by bash/sh after OpenShell's injection, so connected sessions
always see inference.local and 10.200.0.1 in NO_PROXY.

Added test: verifies that nemoclaw-start.sh writes a profile.d snippet
containing all expected NO_PROXY entries (inference.local, 10.200.0.1).

All 18 tests pass.
@nanookclaw
Author

Thanks for the detailed re-test, @kjw3 — this pinpoints exactly what's happening.

Root cause: OpenShell injects NO_PROXY=127.0.0.1,localhost,::1 when establishing the sandbox connect session. The gateway process launched by nemoclaw-start.sh keeps the value the script exported, but any shell opened via connect after that point gets the truncated loopback-only value.

Fix (b820c9c): After exporting the proxy vars in nemoclaw-start.sh, write /etc/profile.d/nemoclaw-proxy.sh with the full NO_PROXY value. /etc/profile.d/ snippets are sourced by bash after OpenShell's injection on every login shell — so connected sessions see inference.local and 10.200.0.1 in NO_PROXY.

The write is guarded by [ -d /etc/profile.d ] so it's a no-op in environments where the directory doesn't exist.
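
The guarded write can be sketched as below. The function name `write_profile_snippet` is hypothetical, the target directory is a parameter so the example runs against a temp directory instead of /etc/profile.d, and the snippet contents (all three variables) are an assumption about what the real file contains.

```shell
#!/usr/bin/env bash
# Sketch only: write_profile_snippet is a hypothetical name. The directory is a
# parameter so the example can use a temp dir rather than /etc/profile.d.
write_profile_snippet() {
  local profile_dir="$1"
  # No-op when the directory does not exist, mirroring the [ -d /etc/profile.d ] guard.
  [ -d "$profile_dir" ] || return 0
  # Unquoted heredoc delimiter: the current values are baked into the snippet.
  cat > "$profile_dir/nemoclaw-proxy.sh" <<EOF
export HTTP_PROXY="${HTTP_PROXY}"
export HTTPS_PROXY="${HTTPS_PROXY}"
export NO_PROXY="${NO_PROXY}"
EOF
}

(
  export HTTP_PROXY="http://10.200.0.1:3128"
  export HTTPS_PROXY="http://10.200.0.1:3128"
  export NO_PROXY="localhost,127.0.0.1,::1,inference.local,10.200.0.1"
  demo_dir="$(mktemp -d)"
  write_profile_snippet "$demo_dir"                  # directory exists: file is written
  cat "$demo_dir/nemoclaw-proxy.sh"
  write_profile_snippet "$demo_dir/does-not-exist"   # directory missing: nothing happens
  rm -rf "$demo_dir"
)
```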

Test added: writes proxy snippet to a profile.d directory when it exists — verifies the snippet is created with all expected entries. All 18 tests pass.

If you can re-test with the updated branch, the expected sandbox env output should now show:

HTTP_PROXY=http://10.200.0.1:3128
HTTPS_PROXY=http://10.200.0.1:3128
NO_PROXY=localhost,127.0.0.1,::1,inference.local,10.200.0.1

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/nemoclaw-start.sh`:
- Line 154: The NO_PROXY export currently hardcodes 10.200.0.1 which breaks
custom proxy host overrides; update the NO_PROXY value to include the
configurable variable (use ${PROXY_HOST} or ${NEMOCLAW_PROXY_HOST} consistent
with how HTTP_PROXY/HTTPS_PROXY are set) so the proxy host is bypassed
correctly, and make the same replacement inside the heredoc where the hardcoded
10.200.0.1 is used; ensure you reference the NO_PROXY export and the heredoc
occurrence when making the edits.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 2dbd4761-0e80-4739-abd3-3990ec96f07a

📥 Commits

Reviewing files that changed from the base of the PR and between 623d878 and b820c9c.

📒 Files selected for processing (2)
  • scripts/nemoclaw-start.sh
  • test/service-env.test.js
🚧 Files skipped from review as they are similar to previous changes (1)
  • test/service-env.test.js

export HTTPS_PROXY="http://${PROXY_HOST}:${PROXY_PORT}"
# Bypass proxy for loopback, sandbox-local, and the OpenShell virtual network
# so internal gateway calls (openclaw dashboard, inference.local) stay fast.
export NO_PROXY="localhost,127.0.0.1,::1,inference.local,10.200.0.1"
Contributor


⚠️ Potential issue | 🟡 Minor

NO_PROXY hardcodes 10.200.0.1 instead of using ${PROXY_HOST}.

Lines 154 and 166 hardcode 10.200.0.1 in NO_PROXY, while HTTP_PROXY and HTTPS_PROXY use the configurable ${PROXY_HOST}. If a user overrides NEMOCLAW_PROXY_HOST to a different IP, the proxy host won't be bypassed, potentially causing requests to the proxy to be routed through itself.

🐛 Proposed fix for consistency
-export NO_PROXY="localhost,127.0.0.1,::1,inference.local,10.200.0.1"
+export NO_PROXY="localhost,127.0.0.1,::1,inference.local,${PROXY_HOST}"

And in the heredoc (line 166):

-export NO_PROXY="localhost,127.0.0.1,::1,inference.local,10.200.0.1"
+export NO_PROXY="localhost,127.0.0.1,::1,inference.local,${PROXY_HOST}"

Also applies to: 166-166
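
One detail to check when applying the same replacement inside the heredoc: whether `${PROXY_HOST}` expands depends on how the heredoc delimiter is quoted. The heredoc from nemoclaw-start.sh is not shown in this thread, so the sketch below is a generic bash illustration with hypothetical function names, not the script's code.

```shell
#!/usr/bin/env bash
# Sketch only: generic bash heredoc behaviour, not the heredoc from nemoclaw-start.sh.
PROXY_HOST="${NEMOCLAW_PROXY_HOST:-10.200.0.1}"

render_unquoted() {
  # Unquoted delimiter: ${PROXY_HOST} is expanded at the time the snippet is written.
  cat <<EOF
export NO_PROXY="localhost,127.0.0.1,::1,inference.local,${PROXY_HOST}"
EOF
}

render_quoted() {
  # Quoted delimiter: the text is written verbatim, so a shell that later sources
  # the snippet would need PROXY_HOST to be defined at that point.
  cat <<'EOF'
export NO_PROXY="localhost,127.0.0.1,::1,inference.local,${PROXY_HOST}"
EOF
}

render_unquoted
render_quoted
```

With a quoted delimiter, a login shell that sources the snippet without PROXY_HOST set would end up with an empty last entry, so the unquoted form is the one that carries the override through.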


senthilr-nv added a commit to senthilr-nv/NemoClaw that referenced this pull request Mar 27, 2026
…ross reconnects

OpenShell injects NO_PROXY=127.0.0.1,localhost,::1 into the sandbox, missing
inference.local and the gateway IP (10.200.0.1). This causes LLM inference
requests to route through the egress proxy instead of going direct, and the
proxy gateway IP itself gets proxied.

Add proxy configuration block to nemoclaw-start.sh that:
- Exports HTTP_PROXY, HTTPS_PROXY, and NO_PROXY with inference.local and
  the gateway IP included
- Persists via /etc/profile.d/nemoclaw-proxy.sh (root) or ~/.profile
  (non-root fallback) so values survive OpenShell reconnect injection
- Supports NEMOCLAW_PROXY_HOST / NEMOCLAW_PROXY_PORT overrides

The non-root fallback ensures the fix works in environments like Brev where
containers run without root privileges.

Tested on DGX Spark (ARM64) and Brev VM (x86_64). Verified NO_PROXY contains
inference.local and 10.200.0.1 inside the live sandbox after connect.

Ref: NVIDIA#626, NVIDIA#704
Ref: NVIDIA#704 (comment)
kjw3 pushed a commit that referenced this pull request Mar 27, 2026
…ross reconnects (#1025)

* fix(sandbox): export proxy env vars with full NO_PROXY and persist across reconnects

OpenShell injects NO_PROXY=127.0.0.1,localhost,::1 into the sandbox, missing
inference.local and the gateway IP (10.200.0.1). This causes LLM inference
requests to route through the egress proxy instead of going direct, and the
proxy gateway IP itself gets proxied.

Add proxy configuration block to nemoclaw-start.sh that:
- Exports HTTP_PROXY, HTTPS_PROXY, and NO_PROXY with inference.local and
  the gateway IP included
- Persists via /etc/profile.d/nemoclaw-proxy.sh (root) or ~/.profile
  (non-root fallback) so values survive OpenShell reconnect injection
- Supports NEMOCLAW_PROXY_HOST / NEMOCLAW_PROXY_PORT overrides

The non-root fallback ensures the fix works in environments like Brev where
containers run without root privileges.

Tested on DGX Spark (ARM64) and Brev VM (x86_64). Verified NO_PROXY contains
inference.local and 10.200.0.1 inside the live sandbox after connect.

Ref: #626, #704
Ref: #704 (comment)

* fix(sandbox): write proxy config to ~/.bashrc for interactive reconnect sessions

OpenShell's `sandbox connect` spawns `/bin/bash -i` (interactive, non-login),
which sources ~/.bashrc — not ~/.profile or /etc/profile.d/*.  The previous
approach wrote to ~/.profile and /etc/profile.d/, neither of which is sourced
by `bash -i`, so the narrow OpenShell-injected NO_PROXY persisted in live
interactive sessions.

Changes:
- Write proxy snippet to ~/.bashrc (primary) and ~/.profile (login fallback)
- Export both uppercase and lowercase proxy variants (NO_PROXY + no_proxy,
  HTTP_PROXY + http_proxy, etc.) — Node.js undici prefers lowercase no_proxy
  over uppercase NO_PROXY when both are set
- Add idempotency guard to prevent duplicate blocks on container restart
- Update tests: verify .bashrc writing, idempotency, bash -i override
  behavior, and lowercase variant correctness

Tested on DGX Spark (ARM64) and Brev VM (x86_64) with full destroy +
re-onboard + live `env | grep proxy` verification inside the sandbox shell
via `openshell sandbox connect`.

Ref: #626

* fix(sandbox): replace stale proxy values on restart with begin/end markers

Use begin/end markers in .bashrc/.profile proxy snippet so
_write_proxy_snippet replaces the block when PROXY_HOST/PORT change
instead of silently keeping stale values. Adds test coverage for the
replacement path.

Addresses CodeRabbit review feedback on idempotency gap.

* fix(sandbox): resolve sandbox user home dynamically when running as root

When the entrypoint runs as root, $HOME is /root — the proxy snippet
was written to /root/.bashrc instead of the sandbox user's home.
Use getent passwd to look up the sandbox user's home when running as
UID 0; fall back to /sandbox if the user entry is missing.

Addresses CodeRabbit review feedback on _SANDBOX_HOME resolution.

---------

Co-authored-by: Carlos Villela <cvillela@nvidia.com>
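
The begin/end marker approach from the commit message above can be sketched as follows. The marker strings, the function name `write_proxy_block`, and the use of the proxy host as the last NO_PROXY entry are assumptions. The real `_write_proxy_snippet` in nemoclaw-start.sh may differ.

```shell
#!/usr/bin/env bash
# Sketch only: marker strings, function name, and file layout are assumptions;
# the real _write_proxy_snippet in nemoclaw-start.sh may differ.
BEGIN_MARK="# nemoclaw-proxy-config begin"
END_MARK="# nemoclaw-proxy-config end"

write_proxy_block() {
  local rc_file="$1" host="$2" port="$3"
  local url="http://${host}:${port}"
  local no_proxy="localhost,127.0.0.1,::1,inference.local,${host}"
  local tmp
  tmp="$(mktemp)"

  # Keep everything outside a previous begin/end block and drop the block itself.
  if [ -f "$rc_file" ]; then
    awk -v b="$BEGIN_MARK" -v e="$END_MARK" '
      $0 == b { skip = 1; next }
      $0 == e { skip = 0; next }
      !skip
    ' "$rc_file" > "$tmp"
  fi

  # Append a fresh block with both uppercase and lowercase variants.
  {
    printf '%s\n' "$BEGIN_MARK"
    printf 'export HTTP_PROXY="%s" http_proxy="%s"\n' "$url" "$url"
    printf 'export HTTPS_PROXY="%s" https_proxy="%s"\n' "$url" "$url"
    printf 'export NO_PROXY="%s" no_proxy="%s"\n' "$no_proxy" "$no_proxy"
    printf '%s\n' "$END_MARK"
  } >> "$tmp"

  # Overwrite in place so the rc file keeps its ownership and permissions.
  cat "$tmp" > "$rc_file"
  rm -f "$tmp"
}

rc="$(mktemp)"
printf 'alias ll="ls -l"\n' > "$rc"
write_proxy_block "$rc" 10.200.0.1 3128
write_proxy_block "$rc" 10.200.0.1 3128    # second run: still a single block
write_proxy_block "$rc" 192.0.2.10 8080    # changed host/port: block is replaced
cat "$rc"
rm -f "$rc"
```

Replacing the whole marked block on every run covers both cases from the commit message: repeated restarts do not duplicate the block, and a changed host or port does not leave stale values behind.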
@kjw3
Contributor

kjw3 commented Mar 27, 2026

Thanks for pushing the original direction here.

The later follow-up work on #1025 confirmed that your core diagnosis was right:

  • the proxy env needs to be exported in nemoclaw-start.sh
  • NO_PROXY needs the full bypass list, including inference.local and the gateway IP
  • the real fix has to account for what happens again on reconnect, not just initial container startup

The part that turned out to need more iteration was the persistence/reconnect mechanism, not the direction of the fix itself.

So thank you for getting the problem framed correctly and moving this line of work forward.

@kjw3
Contributor

kjw3 commented Mar 27, 2026

Closing this as superseded by the later follow-through in #1025. The original direction here was correct and materially guided the fix line, but the reconnect/persistence mechanism needed more iteration before it worked end to end.

@kjw3 kjw3 closed this Mar 27, 2026
TSavo pushed a commit to wopr-network/nemoclaw that referenced this pull request Mar 28, 2026
* fix: improve gateway lifecycle recovery (NVIDIA#953)

* fix: improve gateway lifecycle recovery

* docs: fix readme markdown list spacing

* fix: tighten gateway lifecycle review follow-ups

* fix: simplify tokenized control ui output

* fix: restore chat route in control ui urls

* refactor: simplify ansi stripping in onboard

* fix: shorten control ui url output

* fix: move control ui below cli next steps

* fix: swap hard/soft ulimit settings in start script (NVIDIA#951)

Fixes NVIDIA#949

Co-authored-by: KJ <kejones@nvidia.com>

* chore: add cyclomatic complexity lint rule (NVIDIA#875)

* chore: add cyclomatic complexity rule (ratchet from 95)

Add ESLint complexity rule to bin/ and scripts/ to prevent new
functions from accumulating excessive branching. Starting threshold
is 95 (current worst offender: setupNim in onboard.js). Ratchet
plan: 95 → 40 → 25 → 15.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: ratchet complexity to 20, suppress existing violations

Suppress 6 functions that exceed the threshold with eslint-disable
comments so we can start enforcing at 20 instead of 95:

- setupNim (95), setupPolicies (41), setupInference (22) in onboard.js
- deploy (22), main IIFE (27) in nemoclaw.js
- applyPreset (24) in policies.js

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: suppress complexity for 3 missed functions

preflight (23), getReconciledSandboxGatewayState (25), sandboxStatus (27)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add host-side config and state file locations to README (NVIDIA#903)

Signed-off-by: peteryuqin <peter.yuqin@gmail.com>

* chore: add tsconfig.cli.json, root execa, TS coverage ratchet (NVIDIA#913)

* chore: add tsconfig.cli.json, root execa, TS coverage ratchet

Foundation for the CLI TypeScript migration (PR 0 of the shell
consolidation plan). No runtime changes — config, tooling, and
dependency only.

- tsconfig.cli.json: strict TS type-checking for bin/ and scripts/
  (noEmit, module: preserve — tsx handles the runtime)
- scripts/check-coverage-ratchet.ts: pure TS replacement for the
  bash+python coverage ratchet script (same logic, same tolerance)
- execa ^9.6.1 added to root devDependencies (used by PR 1+)
- pr.yaml: coverage ratchet step now runs the TS version via tsx
- .pre-commit-config.yaml: SPDX headers cover scripts/*.ts,
  new tsc-check-cli pre-push hook
- CONTRIBUTING.md: document typecheck:cli task and CLI pre-push hook
- Delete scripts/check-coverage-ratchet.sh

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Apply suggestion from @brandonpelfrey

* chore: address PR feedback — use types_or, add tsx devDep

- Use `types_or: [ts, tsx]` instead of file glob for tsc-check-cli
  hook per @brandonpelfrey's suggestion.
- Add `tsx` to devDependencies so CI doesn't re-fetch it on every run
  per CodeRabbit's suggestion.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(ci): ignore GitHub "Apply suggestion" commits in commitlint

* fix(ci): lint only PR title since repo is squash-merge only

Reverts the commitlint ignores rule from the previous commit and
instead removes the per-commit lint step entirely.

Individual commit messages are discarded at merge time — only the
squash-merged PR title lands in main and drives changelog generation.
Drop the per-commit lint, keep the PR title check, and remove the
now-unnecessary fetch-depth: 0.

* Revert "fix(ci): lint only PR title since repo is squash-merge only"

This reverts commit 1257a47.

* Revert "fix(ci): ignore GitHub "Apply suggestion" commits in commitlint"

This reverts commit c395657.

* docs: fix markdownlint MD032 in README (blank line before list)

* refactor: make coverage ratchet script idiomatic TypeScript

- Wrap in main() with process.exitCode instead of scattered process.exit()
- Replace mutable flags with .map()/.some() over typed MetricResult[]
- Separate pure logic (checkMetrics) from formatting (formatReport)
- Throw with { cause } chaining instead of exit-in-helpers
- Derive CoverageThresholds from METRICS tuple (single source of truth)
- Exhaustive switch on CheckStatus discriminated union

* refactor: remove duplication in coverage ratchet script

- Drop STATUS_LABELS map; inline labels in exhaustive switch
- Extract common 'metric coverage is N%' preamble in formatResult
- Simplify ratchetedThresholds: use results directly (already in
  METRICS order) instead of re-scanning with .find() per metric
- Compute 'failed' once in main, pass into formatReport to avoid
  duplicate .some() scan

* refactor: simplify coverage ratchet with FP patterns

- Extract classify() as a named pure function (replaces nested ternary)
- loadJSON takes repo-relative paths, eliminating THRESHOLD_PATH and
  SUMMARY_PATH constants (DRY the join-with-REPO_ROOT pattern)
- Drop CoverageMetric/CoverageSummary interfaces (only pct is read);
  use structural type at the call site instead
- Inline ratchetedThresholds (one-liner, used once)
- formatReport derives fail/improved from results instead of taking
  a pre-computed boolean (let functions derive from data, don't
  thread derived state)
- sections.join("\n\n") replaces manual empty-string pushing
- Shorter type names (Thresholds, Status, Result) — no ambiguity
  in a single-purpose script

* refactor: strip coverage ratchet to failure-only output

prek hides output from commands that exit 0, so ok/improved
reporting was dead code. Remove Status, Result, classify,
formatResult, formatReport, and the ratcheted-thresholds
suggestion block. The script now just filters for regressions
and prints actionable errors on failure.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Brandon Pelfrey <bpelfrey@nvidia.com>

* fix: use CONNECT tunnel for WebSocket endpoints in Discord/Slack presets (NVIDIA#438)

* fix: use CONNECT tunnel for WebSocket endpoints in Discord/Slack presets

The egress proxy's HTTP idle timeout (~2 min) kills long-lived WebSocket
connections when endpoints are configured with protocol:rest + tls:terminate.
Switch WebSocket endpoints to access:full (CONNECT tunnel) which bypasses
HTTP-level timeouts entirely.

Discord:
- gateway.discord.gg → access:full (WebSocket gateway)
- Add PUT/PATCH/DELETE methods for discord.com (message editing, reactions)
- Add media.discordapp.net for attachment access

Slack:
- Add wss-primary.slack.com and wss-backup.slack.com → access:full
  (Socket Mode WebSocket endpoints)

Partially addresses NVIDIA#409 — the policy-level fix enables WebSocket
connections to survive. The hardcoded 2-min timeout in openshell-sandbox
still affects any protocol:rest endpoints with long-lived connections.

Related: NVIDIA#361 (WhatsApp Web, same root cause)

* fix: correct comment wording for media endpoint and YAML formatting

* fix: standardize Node.js minimum version to 22.16 (NVIDIA#840)

* fix: remove unused RECOMMENDED_NODE_MAJOR from scripts/install.sh

Shellcheck flagged it as unused after the min/recommended merge.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: enforce full semver >=22.16.0 in installer scripts

The runtime checks only compared the major Node.js version, allowing
22.0–22.15 to pass despite package.json requiring >=22.16.0. Use the
version_gte() helper for full semver comparison in both installers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: harden version_gte and align fallback message

Guard version_gte() against prerelease suffixes (e.g. "22.16.0-rc.1")
that would crash bash arithmetic. Also update the manual-install
fallback message to reference MIN_NODE_VERSION instead of hardcoded "22".
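
A minimal sketch of the kind of full-semver comparison described above, assuming a `version_gte A B` helper that succeeds when A >= B. The helper body and the `_ver_field` name are illustrative assumptions, not the code in the installer scripts. Prerelease suffixes are simply dropped before comparing, so `22.16.0-rc.1` is treated like `22.16.0`.

```bash
# Hypothetical sketch; the real version_gte() in the installers may differ.

# Print numeric field $2 (1-based) of dotted version $1.
# Missing or non-numeric fields become 0.
_ver_field() {
  local field
  field=$(printf '%s.0.0' "$1" | cut -d. -f"$2")
  case "$field" in
    '' | *[!0-9]*) field=0 ;;
  esac
  printf '%s' "$field"
}

# version_gte A B: return 0 when A >= B, comparing major, minor, patch.
version_gte() {
  local a b i x y
  # Strip a leading "v" and any prerelease/build suffix ("-rc.1", "+build")
  # so the numeric tests below only ever see digits.
  a="${1#v}"; a="${a%%[-+]*}"
  b="${2#v}"; b="${b%%[-+]*}"
  for i in 1 2 3; do
    x=$(_ver_field "$a" "$i")
    y=$(_ver_field "$b" "$i")
    if [ "$x" -gt "$y" ]; then return 0; fi
    if [ "$x" -lt "$y" ]; then return 1; fi
  done
  return 0
}
```

With this shape, `version_gte "v22.15.9" "22.16.0"` fails even though the major versions match, which is the case the major-only check let through.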

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: update test stubs for Node.js 22.16 minimum and add Node 20 rejection test

- Bump node stub in 'succeeds with acceptable Node.js' from v20.0.0 to v22.16.0
- Bump node stub in buildCurlPipeEnv from v22.14.0 to v22.16.0
- Add new test asserting Node.js 20 is rejected by ensure_supported_runtime

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: harden installer and onboard resiliency (NVIDIA#961)

* fix: harden installer and onboard resiliency

* fix: address installer and debug review follow-ups

* fix: harden onboard resume across later setup steps

* test: simplify payload extraction in onboard tests

* test: keep onboard payload extraction target-compatible

* chore: align onboard session lint with complexity rule

* fix: harden onboard session safety and lock handling

* fix: tighten onboard session redaction and metadata handling

* fix(security): strip credentials from migration snapshots and enforce blueprint digest (NVIDIA#769)

Reconciles NVIDIA#156 and NVIDIA#743 into a single comprehensive solution:

- Filter auth-profiles.json at copy time via cpSync filter (from NVIDIA#743)
- Recursive stripCredentials() with pattern-based field detection for
  deep config sanitization (from NVIDIA#156: CREDENTIAL_FIELDS set +
  CREDENTIAL_FIELD_PATTERN regex)
- Remove gateway config section (contains auth tokens) from sandbox
  openclaw.json
- Blueprint digest verification (SHA-256): recorded at snapshot time,
  validated on restore, empty/missing digest is a hard failure
- computeFileDigest() throws when blueprint file is missing instead of
  silently returning null
- Sanitize both snapshot-level and sandbox-bundle openclaw.json copies
- Backward compatible: old snapshots without blueprintDigest skip
  validation
- Bump SNAPSHOT_VERSION 2 → 3
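
The digest rules in the list above can be illustrated with a short shell sketch. This is not the project's TypeScript code: `compute_file_digest` and `verify_blueprint_digest` are hypothetical shell stand-ins that assume `sha256sum` (or `shasum`) is available, and the version-gated skip for old snapshots is left out.

```bash
# Hypothetical shell analogue of the digest checks; names are illustrative.

# Print the SHA-256 hex digest of a file. A missing file is an error,
# never an empty result.
compute_file_digest() {
  if [ ! -f "$1" ]; then
    echo "blueprint file not found: $1" >&2
    return 1
  fi
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | cut -d' ' -f1
  else
    shasum -a 256 "$1" | cut -d' ' -f1
  fi
}

# verify_blueprint_digest FILE RECORDED_DIGEST
# An empty recorded digest is a hard failure rather than a silent pass.
verify_blueprint_digest() {
  local actual
  if [ -z "$2" ]; then
    echo "snapshot has no blueprint digest" >&2
    return 1
  fi
  actual=$(compute_file_digest "$1") || return 1
  [ "$actual" = "$2" ]
}
```

Failing on a missing file or empty digest means a restore cannot succeed by accident when there is nothing to compare against.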

Supersedes NVIDIA#156 and NVIDIA#743.

* fix(sandbox): export proxy env vars with full NO_PROXY and persist across reconnects (NVIDIA#1025)

* fix(sandbox): export proxy env vars with full NO_PROXY and persist across reconnects

OpenShell injects NO_PROXY=127.0.0.1,localhost,::1 into the sandbox, missing
inference.local and the gateway IP (10.200.0.1). This causes LLM inference
requests to route through the egress proxy instead of going direct, and the
proxy gateway IP itself gets proxied.

Add proxy configuration block to nemoclaw-start.sh that:
- Exports HTTP_PROXY, HTTPS_PROXY, and NO_PROXY with inference.local and
  the gateway IP included
- Persists via /etc/profile.d/nemoclaw-proxy.sh (root) or ~/.profile
  (non-root fallback) so values survive OpenShell reconnect injection
- Supports NEMOCLAW_PROXY_HOST / NEMOCLAW_PROXY_PORT overrides
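
As a rough sketch of the block described above, using the defaults from the PR description (10.200.0.1:3128). The exact contents of `nemoclaw-start.sh` may differ, and the `_PROXY_URL` name is an assumption.

```bash
# Sketch only: defaults taken from the PR description, overridable via
# NEMOCLAW_PROXY_HOST / NEMOCLAW_PROXY_PORT.
PROXY_HOST="${NEMOCLAW_PROXY_HOST:-10.200.0.1}"
PROXY_PORT="${NEMOCLAW_PROXY_PORT:-3128}"
_PROXY_URL="http://${PROXY_HOST}:${PROXY_PORT}"   # hypothetical helper variable

export HTTP_PROXY="$_PROXY_URL"
export HTTPS_PROXY="$_PROXY_URL"
# OpenShell's injected loopback entries plus the two the commit says were
# missing: inference.local and the gateway IP itself.
export NO_PROXY="localhost,127.0.0.1,::1,inference.local,${PROXY_HOST}"
```

Deriving the NO_PROXY gateway entry from `PROXY_HOST` keeps the bypass list consistent when the host override is used.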

The non-root fallback ensures the fix works in environments like Brev where
containers run without root privileges.

Tested on DGX Spark (ARM64) and Brev VM (x86_64). Verified NO_PROXY contains
inference.local and 10.200.0.1 inside the live sandbox after connect.

Ref: NVIDIA#626, NVIDIA#704
Ref: NVIDIA#704 (comment)

* fix(sandbox): write proxy config to ~/.bashrc for interactive reconnect sessions

OpenShell's `sandbox connect` spawns `/bin/bash -i` (interactive, non-login),
which sources ~/.bashrc, not ~/.profile or /etc/profile.d/*. The previous
approach wrote to ~/.profile and /etc/profile.d/, neither of which is sourced
by `bash -i`, so the narrow OpenShell-injected NO_PROXY persisted in live
interactive sessions.

Changes:
- Write proxy snippet to ~/.bashrc (primary) and ~/.profile (login fallback)
- Export both uppercase and lowercase proxy variants (NO_PROXY + no_proxy,
  HTTP_PROXY + http_proxy, etc.) — Node.js undici prefers lowercase no_proxy
  over uppercase NO_PROXY when both are set
- Add idempotency guard to prevent duplicate blocks on container restart
- Update tests: verify .bashrc writing, idempotency, bash -i override
  behavior, and lowercase variant correctness
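
The changes listed above might look roughly like the following. This is a sketch under assumptions: the `append_proxy_snippet` name and the marker text are hypothetical, and the real script's snippet layout is not shown in this commit message.

```bash
# Hypothetical marker used by the idempotency guard.
MARKER='# nemoclaw-proxy-config'

# append_proxy_snippet RCFILE PROXY_URL NO_PROXY_LIST
# Appends exports for both uppercase and lowercase variants, once.
append_proxy_snippet() {
  local rc="$1" url="$2" no_proxy_list="$3"
  # Idempotency guard: skip if an earlier start already wrote the block.
  if [ -f "$rc" ] && grep -qF "$MARKER" "$rc"; then
    return 0
  fi
  {
    printf '%s\n' "$MARKER"
    printf 'export HTTP_PROXY="%s" http_proxy="%s"\n' "$url" "$url"
    printf 'export HTTPS_PROXY="%s" https_proxy="%s"\n' "$url" "$url"
    printf 'export NO_PROXY="%s" no_proxy="%s"\n' "$no_proxy_list" "$no_proxy_list"
  } >>"$rc"
}
```

A caller would run it once for `~/.bashrc` and once for `~/.profile`. Because `bash -i` sources `~/.bashrc` after the environment is injected, the values written there are the ones the interactive session ends up with.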

Tested on DGX Spark (ARM64) and Brev VM (x86_64) with full destroy +
re-onboard + live `env | grep proxy` verification inside the sandbox shell
via `openshell sandbox connect`.

Ref: NVIDIA#626

* fix(sandbox): replace stale proxy values on restart with begin/end markers

Use begin/end markers in .bashrc/.profile proxy snippet so
_write_proxy_snippet replaces the block when PROXY_HOST/PORT change
instead of silently keeping stale values. Adds test coverage for the
replacement path.
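
One way to get replace-on-rewrite behavior with begin/end markers is sketched below. The marker strings and the `write_proxy_block` name are assumptions for illustration; the actual `_write_proxy_snippet` implementation is not shown here.

```bash
# Hypothetical marker lines delimiting the managed block.
BEGIN_MARK='# >>> nemoclaw-proxy begin >>>'
END_MARK='# <<< nemoclaw-proxy end <<<'

# write_proxy_block RCFILE BLOCK_BODY
# Drops any previously written block, then appends the new one.
write_proxy_block() {
  local rc="$1" body="$2" tmp
  tmp=$(mktemp) || return 1
  if [ -f "$rc" ]; then
    # Copy every line that is outside the old begin/end block.
    awk -v b="$BEGIN_MARK" -v e="$END_MARK" '
      $0 == b { skip = 1; next }
      $0 == e { skip = 0; next }
      !skip   { print }
    ' "$rc" >"$tmp"
  fi
  printf '%s\n%s\n%s\n' "$BEGIN_MARK" "$body" "$END_MARK" >>"$tmp"
  # Write back through redirection so the rc file keeps its permissions.
  cat "$tmp" >"$rc"
  rm -f "$tmp"
}
```

Calling it a second time with a different port leaves exactly one block in the file, holding the new value, while unrelated lines in the rc file are preserved.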

Addresses CodeRabbit review feedback on idempotency gap.

* fix(sandbox): resolve sandbox user home dynamically when running as root

When the entrypoint runs as root, $HOME is /root — the proxy snippet
was written to /root/.bashrc instead of the sandbox user's home.
Use getent passwd to look up the sandbox user's home when running as
UID 0; fall back to /sandbox if the user entry is missing.
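
A sketch of that lookup, assuming the sandbox account is named `sandbox` (the user name and the `resolve_sandbox_home` function name are assumptions; only the `/sandbox` fallback comes from the text above):

```bash
# resolve_sandbox_home [USER]
# As root, look up USER's home in the passwd database and fall back to
# /sandbox when there is no entry; otherwise just use $HOME.
resolve_sandbox_home() {
  local user="${1:-sandbox}" home=""
  if [ "$(id -u)" -eq 0 ]; then
    # Field 6 of a passwd entry is the home directory.
    home=$(getent passwd "$user" 2>/dev/null | cut -d: -f6)
    printf '%s\n' "${home:-/sandbox}"
  else
    printf '%s\n' "$HOME"
  fi
}
```

The result would then be used in place of `$HOME` when choosing which `.bashrc` and `.profile` to write.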

Addresses CodeRabbit review feedback on _SANDBOX_HOME resolution.

---------

Co-authored-by: Carlos Villela <cvillela@nvidia.com>

* fix(policies): allow preset application for versionless policies (Fixes NVIDIA#35) (NVIDIA#101)

* fix(policies): allow preset application for versionless policies (Fixes NVIDIA#35)

Fixes NVIDIA#35

Signed-off-by: Deepak Jain <deepujain@gmail.com>

* fix: remove stale complexity suppression in policies

---------

Signed-off-by: Deepak Jain <deepujain@gmail.com>
Co-authored-by: Kevin Jones <kejones@nvidia.com>

* fix: restore routed inference and connect UX (NVIDIA#1037)

* fix: restore routed inference and connect UX

* fix: simplify detected local inference hint

* fix: remove stale local inference hint

* test: relax connect forward assertion

---------

Signed-off-by: peteryuqin <peter.yuqin@gmail.com>
Signed-off-by: Deepak Jain <deepujain@gmail.com>
Co-authored-by: KJ <kejones@nvidia.com>
Co-authored-by: Emily Wilkins <80470879+epwilkins@users.noreply.github.com>
Co-authored-by: Carlos Villela <cvillela@nvidia.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Peter <peter.yuqin@gmail.com>
Co-authored-by: Brandon Pelfrey <bpelfrey@nvidia.com>
Co-authored-by: Benedikt Schackenberg <6381261+BenediktSchackenberg@users.noreply.github.com>
Co-authored-by: Lucas Wang <lucas_wang@lucas-futures.com>
Co-authored-by: senthilr-nv <senthilr@nvidia.com>
Co-authored-by: Deepak Jain <deepujain@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
ksapru pushed a commit to ksapru/NemoClaw that referenced this pull request Mar 30, 2026
…ross reconnects (NVIDIA#1025)

laitingsheng pushed a commit that referenced this pull request Apr 2, 2026
…ross reconnects (#1025)

lakamsani pushed a commit to lakamsani/NemoClaw that referenced this pull request Apr 4, 2026
…ross reconnects (NVIDIA#1025)

gemini2026 pushed a commit to gemini2026/NemoClaw that referenced this pull request Apr 14, 2026
…ross reconnects (NVIDIA#1025)

Labels

bug Something isn't working enhancement New feature or request OpenShell Support for OpenShell, a safe, private runtime for autonomous AI agents

Development

Successfully merging this pull request may close these issues.

Web search tools fail in sandbox: DNS resolution blocked (getaddrinfo EAI_AGAIN)

3 participants