Skip to content

fix(sandbox): export proxy env vars with full NO_PROXY and persist across reconnects#1025

Merged
kjw3 merged 5 commits intoNVIDIA:mainfrom
senthilr-nv:fix/sandbox-proxy-no-proxy-persistence
Mar 27, 2026
Merged

fix(sandbox): export proxy env vars with full NO_PROXY and persist across reconnects#1025
kjw3 merged 5 commits intoNVIDIA:mainfrom
senthilr-nv:fix/sandbox-proxy-no-proxy-persistence

Conversation

@senthilr-nv
Copy link
Copy Markdown
Contributor

@senthilr-nv senthilr-nv commented Mar 27, 2026

Summary

OpenShell injects NO_PROXY=127.0.0.1,localhost,::1 into the sandbox, missing inference.local and the gateway IP (10.200.0.1). This causes LLM inference requests to be routed through the egress proxy instead of going direct, and the proxy gateway IP itself gets proxied.

This PR adds a proxy configuration block to nemoclaw-start.sh that:

  • Exports HTTP_PROXY, HTTPS_PROXY, NO_PROXY (and lowercase equivalents) with the full bypass list
  • Persists the config to ~/.bashrc (primary) and ~/.profile (login-shell fallback) so values survive openshell sandbox connect re-injection
  • Uses begin/end markers so the block is replaced (not duplicated) if proxy host/port change between restarts
  • Resolves _SANDBOX_HOME dynamically via getent passwd when running as root, falling back to /sandbox
  • Supports NEMOCLAW_PROXY_HOST / NEMOCLAW_PROXY_PORT overrides for environments where the gateway address differs

Why ~/.bashrc?

openshell sandbox connect spawns bash -i (interactive, non-login) via ssh.rs. The connect path calls env_clear() and then injects the narrow NO_PROXY. Since bash -i sources ~/.bashrc on startup, writing the full proxy config there restores the correct values after OpenShell's injection.

Problem

When a user asks the agent to fetch a URL (e.g. web_fetch tool), the sandbox routes the request through the egress proxy incorrectly because:

  1. OpenShell's NO_PROXY is too narrow — inference.local is missing, so LLM inference traffic is wrongly routed through the proxy
  2. The gateway IP (10.200.0.1) is missing from NO_PROXY, creating a potential proxy loop
  3. Every openshell sandbox connect re-injects the narrow NO_PROXY, overwriting any manual fix
  4. Node.js undici prefers lowercase no_proxy over uppercase — without lowercase exports, the bypass list is ignored

Changes

  • scripts/nemoclaw-start.sh: Added proxy environment block with full NO_PROXY list, ~/.bashrc + ~/.profile persistence with begin/end markers, dynamic sandbox home resolution
  • test/service-env.test.js: Added unit tests for proxy variable extraction, NO_PROXY completeness, lowercase variants, .bashrc/.profile persistence, idempotency, stale value replacement, and bash-i override simulation

Testing

  • DGX Spark (ARM64, Ubuntu 24.04): Fresh destroy → re-onboard → connect → verified full NO_PROXY and no_proxy in live sandbox shell
  • Brev VM (x86_64, Ubuntu 22.04, non-root): Same end-to-end verification; confirmed proxy bypass works
  • 22/22 unit tests green on both platforms

Related

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 27, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Entrypoint startup now derives proxy host/port from NEMOCLAW_PROXY_HOST/NEMOCLAW_PROXY_PORT (with defaults), exports uppercase and lowercase HTTP_PROXY/HTTPS_PROXY/NO_PROXY (including loopback, inference.local, and the proxy gateway IP), and idempotently persists a marked snippet into the sandbox user’s ~/.bashrc and ~/.profile when writable.

Changes

Cohort / File(s) Summary
Proxy configuration script
scripts/nemoclaw-start.sh
Adds logic to derive proxy host/port with defaults, build/export both uppercase and lowercase HTTP_PROXY/HTTPS_PROXY/NO_PROXY (NO_PROXY expanded to include localhost, loopback IPv4/IPv6, inference.local, and the gateway IP), and idempotently append a marked proxy-export snippet into the sandbox user's ~/.bashrc and ~/.profile when the home directory is writable.
Proxy tests
test/service-env.test.js
Adds Vitest suite "proxy environment variables (issue #626)" with an extractProxyVars helper that runs the extracted snippet to capture proxy envs. Tests default proxy URL, host/port overrides, NO_PROXY contents (loopback, inference.local, gateway IP), lowercase equivalents, persistence into simulated ~/.bashrc/~/.profile, idempotency of the persisted block, and sourcing behavior. Imports execFileSync, fs, os, and path utilities for simulations.

Sequence Diagram(s)

sequenceDiagram
    participant Entrypoint as Entrypoint Script
    participant Env as Process Env
    participant FS as User Home FS
    participant Shell as Interactive/Login Shell

    Entrypoint->>Env: read NEMOCLAW_PROXY_HOST/PORT (or defaults)
    Entrypoint->>Env: export HTTP_PROXY/HTTPS_PROXY/NO_PROXY and lowercase vars
    Entrypoint->>FS: if writable, write idempotent snippet to ~/.bashrc and ~/.profile
    Shell->>FS: on interactive/login start, source ~/.bashrc/~/.profile
    FS->>Shell: provide exported proxy envs to shell sessions
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐇 I hop into startup, whiskers keen,
I stitch proxy threads where networks convene,
I tuck exports in bash and profile snug,
So shells remember with a gentle hug,
🥕 sandbox warmed, connectivity seen.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: exporting proxy environment variables with a complete NO_PROXY list and persisting them across reconnects, which aligns directly with the PR's core objective.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@senthilr-nv senthilr-nv requested review from cv and kjw3 March 27, 2026 08:01
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
scripts/nemoclaw-start.sh (1)

231-243: Potential duplicate appends to ~/.profile on container restart.

Line 242 appends the proxy snippet to ~/.profile without checking if it already exists. If the container is stopped and restarted (rather than recreated), the snippet will be appended again, resulting in duplicates.

Consider making the append idempotent by checking for an existing marker:

♻️ Suggested idempotent append
 elif [ -w "${HOME:-/sandbox}" ]; then
-  printf '\n%s\n' "$_PROXY_SNIPPET" >>"${HOME:-/sandbox}/.profile"
+  if ! grep -q 'nemoclaw-start.sh.*proxy' "${HOME:-/sandbox}/.profile" 2>/dev/null; then
+    printf '\n%s\n' "$_PROXY_SNIPPET" >>"${HOME:-/sandbox}/.profile"
+  fi
 fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/nemoclaw-start.sh` around lines 231 - 243, The script currently
unconditionally appends the _PROXY_SNIPPET to "${HOME:-/sandbox}/.profile",
causing duplicate entries on container restart; change the logic to be
idempotent by detecting an existing marker or the snippet before appending: add
a unique marker string (e.g. "# nemoclaw-proxy start") inside _PROXY_SNIPPET
and, before the printf >>"${HOME:-/sandbox}/.profile" in the non-root branch,
check the file for that marker (using grep or test) and only append if the
marker is absent; keep the root branch writing to
/etc/profile.d/nemoclaw-proxy.sh as-is.
test/service-env.test.js (1)

175-208: Profile persistence test validates file writing but diverges from actual script logic.

The wrapper (lines 180-194) is a simplified reimplementation that doesn't exercise the actual script's id -u check or the root vs non-root branching. This tests the concept of writing a proxy snippet when a directory exists, but not the exact production logic.

Consider extracting and sourcing the actual persistence block from the script (similar to extractProxyVars) for higher fidelity, or document that this is an intentional simplification.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/service-env.test.js` around lines 175 - 208, The test "writes proxy
snippet to /etc/profile.d when the directory exists" reimplements the
persistence wrapper and therefore doesn't exercise the script's real branching
(root check via id -u and root vs non-root behavior); update the test to source
or invoke the actual persistence block instead of the ad-hoc wrapper—either
reuse the existing extractProxyVars helper to build the exact snippet or load
the real persistence logic (the same code that writes nemoclaw-proxy.sh and
performs the id -u check) so the test covers root/non-root branches and matches
production behavior (or add a clear comment explaining this is an intentional
simplification if you prefer not to change the test).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@scripts/nemoclaw-start.sh`:
- Around line 231-243: The script currently unconditionally appends the
_PROXY_SNIPPET to "${HOME:-/sandbox}/.profile", causing duplicate entries on
container restart; change the logic to be idempotent by detecting an existing
marker or the snippet before appending: add a unique marker string (e.g. "#
nemoclaw-proxy start") inside _PROXY_SNIPPET and, before the printf
>>"${HOME:-/sandbox}/.profile" in the non-root branch, check the file for that
marker (using grep or test) and only append if the marker is absent; keep the
root branch writing to /etc/profile.d/nemoclaw-proxy.sh as-is.

In `@test/service-env.test.js`:
- Around line 175-208: The test "writes proxy snippet to /etc/profile.d when the
directory exists" reimplements the persistence wrapper and therefore doesn't
exercise the script's real branching (root check via id -u and root vs non-root
behavior); update the test to source or invoke the actual persistence block
instead of the ad-hoc wrapper—either reuse the existing extractProxyVars helper
to build the exact snippet or load the real persistence logic (the same code
that writes nemoclaw-proxy.sh and performs the id -u check) so the test covers
root/non-root branches and matches production behavior (or add a clear comment
explaining this is an intentional simplification if you prefer not to change the
test).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 61decdb9-e740-41db-9331-48f07277bead

📥 Commits

Reviewing files that changed from the base of the PR and between 5c269c1 and cd9680d.

📒 Files selected for processing (2)
  • scripts/nemoclaw-start.sh
  • test/service-env.test.js

@senthilr-nv
Copy link
Copy Markdown
Contributor Author

This PR builds on the approach from PR #704 by @nanookclaw, which correctly identified the need to export proxy env vars in nemoclaw-start.sh and persist them via /etc/profile.d/. Our contribution fixes the runtime NO_PROXY mismatch (missing inference.local and gateway IP) and adds a non-root fallback for environments like Brev.

…ross reconnects

OpenShell injects NO_PROXY=127.0.0.1,localhost,::1 into the sandbox, missing
inference.local and the gateway IP (10.200.0.1). This causes LLM inference
requests to route through the egress proxy instead of going direct, and the
proxy gateway IP itself gets proxied.

Add proxy configuration block to nemoclaw-start.sh that:
- Exports HTTP_PROXY, HTTPS_PROXY, and NO_PROXY with inference.local and
  the gateway IP included
- Persists via /etc/profile.d/nemoclaw-proxy.sh (root) or ~/.profile
  (non-root fallback) so values survive OpenShell reconnect injection
- Supports NEMOCLAW_PROXY_HOST / NEMOCLAW_PROXY_PORT overrides

The non-root fallback ensures the fix works in environments like Brev where
containers run without root privileges.

Tested on DGX Spark (ARM64) and Brev VM (x86_64). Verified NO_PROXY contains
inference.local and 10.200.0.1 inside the live sandbox after connect.

Ref: NVIDIA#626, NVIDIA#704
Ref: NVIDIA#704 (comment)
@senthilr-nv senthilr-nv force-pushed the fix/sandbox-proxy-no-proxy-persistence branch from 6baa176 to e7fa924 Compare March 27, 2026 17:59
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
test/service-env.test.js (1)

175-208: Add coverage for non-root ~/.profile fallback and repeat-run behavior.

This test validates the /etc/profile.d path, but the PR also relies on the non-root fallback. Add one test for writing to ~/.profile and one for repeated invocation behavior to guard regressions there.

🧪 Suggested additional test shape
+    it("writes proxy snippet to ~/.profile in non-root fallback", () => {
+      // Arrange temp HOME and run wrapper with /etc/profile.d branch disabled
+      // Assert ~/.profile contains expected exports and inference.local entry
+    });
+
+    it("does not duplicate proxy snippet on repeated startup", () => {
+      // Run wrapper twice against same profile target
+      // Assert marker/export block count is 1
+    });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/service-env.test.js` around lines 175 - 208, Add two tests to
test/service-env.test.js: one that simulates a non-root user by setting HOME to
a temp directory and verifies the code writes the proxy snippet into ~/.profile
(use readFileSync on join(process.env.HOME, ".profile") and assert presence of
export HTTP_PROXY/HTTPS_PROXY/NO_PROXY and the host IP), and a second that calls
the entrypoint twice (or runs the same wrapper twice like the existing test does
with execFileSync on a temp script) to assert idempotency/no duplicate snippets
in either /etc/profile.d/nemoclaw-proxy.sh or ~/.profile (check counts or
absence of repeated blocks); locate the test scaffolding around the existing
"writes proxy snippet to /etc/profile.d when the directory exists" spec and
reuse its tmpdir/tmpFile setup and cleanup logic (profileDir, tmpFile, wrapper,
execFileSync) to implement these two new specs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/nemoclaw-start.sh`:
- Around line 239-243: The snippet writes _PROXY_SNIPPET to the non-root user's
profile using ${HOME:-/sandbox} before HOME is reset and uses >> which can
duplicate the snippet; change the logic to determine the actual non-root home
after the script resets ownership (or compute a target_home variable after
privilege drop) and then only persist the snippet idempotently by checking for
its presence (e.g. grep -Fq "$_PROXY_SNIPPET" "$target_home/.profile") and
writing it only if missing (or replace existing block), and continue to write
the root case to the profile.d file (use the same presence check for
/etc/profile.d target) so _PROXY_SNIPPET is not appended multiple times and the
path is resolved deterministically; reference the _PROXY_SNIPPET variable, the
.profile target, and the profile.d target in your changes.

In `@test/service-env.test.js`:
- Around line 161-173: The test titles are misleading: the it() descriptions say
"excludes" while the assertions check for inclusion; update the two it()
descriptions in the test file so they reflect inclusion/containment (e.g.,
change "NO_PROXY excludes loopback and inference.local" to "NO_PROXY includes
loopback and inference.local" and change "NO_PROXY excludes OpenShell gateway
IP" to "NO_PROXY includes OpenShell gateway IP") — keep the assertions
referencing extractProxyVars() and vars.NO_PROXY as-is.

---

Nitpick comments:
In `@test/service-env.test.js`:
- Around line 175-208: Add two tests to test/service-env.test.js: one that
simulates a non-root user by setting HOME to a temp directory and verifies the
code writes the proxy snippet into ~/.profile (use readFileSync on
join(process.env.HOME, ".profile") and assert presence of export
HTTP_PROXY/HTTPS_PROXY/NO_PROXY and the host IP), and a second that calls the
entrypoint twice (or runs the same wrapper twice like the existing test does
with execFileSync on a temp script) to assert idempotency/no duplicate snippets
in either /etc/profile.d/nemoclaw-proxy.sh or ~/.profile (check counts or
absence of repeated blocks); locate the test scaffolding around the existing
"writes proxy snippet to /etc/profile.d when the directory exists" spec and
reuse its tmpdir/tmpFile setup and cleanup logic (profileDir, tmpFile, wrapper,
execFileSync) to implement these two new specs.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a24817d1-2fc3-4417-820c-e14985633fcb

📥 Commits

Reviewing files that changed from the base of the PR and between cd9680d and e7fa924.

📒 Files selected for processing (2)
  • scripts/nemoclaw-start.sh
  • test/service-env.test.js

@kjw3
Copy link
Copy Markdown
Contributor

kjw3 commented Mar 27, 2026

Thanks for working on this. I reran the validation entirely on a single Linux GPU host, and I don’t think this is ready to merge yet.

What I tested on that one host:

  • checked out the latest PR tip
  • ran test/service-env.test.js
    • passed
  • ran a fresh non-interactive onboard from this PR branch with a new sandbox
  • confirmed the new sandbox came up Ready
  • connected into that fresh sandbox through a real interactive OpenShell pseudo-TTY session
  • inspected the live proxy env after connect
  • inspected the persisted proxy file(s) inside that same fresh sandbox

What I found:

  • the PR code is making it into the built sandbox
  • in the fresh sandbox, ~/.profile exists and contains the full proxy snippet from this PR, including:
    • inference.local
    • 10.200.0.1
  • but after a real interactive reconnect, the live env still comes back as:
    • HTTP_PROXY=http://10.200.0.1:3128
    • HTTPS_PROXY=http://10.200.0.1:3128
    • NO_PROXY=127.0.0.1,localhost,::1

So the main blocker is still the same:

  • the profile-based persistence strategy does not actually win on the real reconnect path
  • the actual interactive session still ends up with the narrow OpenShell-injected NO_PROXY

Because of that, I don’t think this closes the runtime mismatch yet. The next step is to fix the env injection / session startup path that OpenShell uses on connect, rather than relying on /etc/profile.d or ~/.profile to restore the value afterward.

@senthilr-nv senthilr-nv force-pushed the fix/sandbox-proxy-no-proxy-persistence branch from 3bb83cf to 6ebeea6 Compare March 27, 2026 19:33
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
test/service-env.test.js (1)

164-176: ⚠️ Potential issue | 🟡 Minor

Test names contradict the assertions.

Lines 164 and 173 describe "excludes" but the assertions verify inclusion with toContain. Rename for clarity:

-    it("NO_PROXY excludes loopback and inference.local", () => {
+    it("NO_PROXY includes loopback and inference.local", () => {
-    it("NO_PROXY excludes OpenShell gateway IP", () => {
+    it("NO_PROXY includes OpenShell gateway IP", () => {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/service-env.test.js` around lines 164 - 176, The two test descriptions
are misleading: the specs in the test block for the NO_PROXY checks say
"excludes" but the assertions on extractProxyVars().NO_PROXY use toContain, so
either the assertions or the test names must be changed; update the test titles
(the it(...) strings) to say "includes" (e.g., change "NO_PROXY excludes
loopback and inference.local" and "NO_PROXY excludes OpenShell gateway IP" to
"NO_PROXY includes ...") so they match the assertions that check inclusion,
referencing the test cases that call extractProxyVars() and assert on
vars.NO_PROXY.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@test/service-env.test.js`:
- Around line 164-176: The two test descriptions are misleading: the specs in
the test block for the NO_PROXY checks say "excludes" but the assertions on
extractProxyVars().NO_PROXY use toContain, so either the assertions or the test
names must be changed; update the test titles (the it(...) strings) to say
"includes" (e.g., change "NO_PROXY excludes loopback and inference.local" and
"NO_PROXY excludes OpenShell gateway IP" to "NO_PROXY includes ...") so they
match the assertions that check inclusion, referencing the test cases that call
extractProxyVars() and assert on vars.NO_PROXY.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 05a59140-8d37-469c-aaac-34e0e85b01a1

📥 Commits

Reviewing files that changed from the base of the PR and between e7fa924 and 6ebeea6.

📒 Files selected for processing (2)
  • scripts/nemoclaw-start.sh
  • test/service-env.test.js

@senthilr-nv senthilr-nv force-pushed the fix/sandbox-proxy-no-proxy-persistence branch from 6ebeea6 to 766294f Compare March 27, 2026 19:41
…ct sessions

OpenShell's `sandbox connect` spawns `/bin/bash -i` (interactive, non-login),
which sources ~/.bashrc — not ~/.profile or /etc/profile.d/*.  The previous
approach wrote to ~/.profile and /etc/profile.d/, neither of which is sourced
by `bash -i`, so the narrow OpenShell-injected NO_PROXY persisted in live
interactive sessions.

Changes:
- Write proxy snippet to ~/.bashrc (primary) and ~/.profile (login fallback)
- Export both uppercase and lowercase proxy variants (NO_PROXY + no_proxy,
  HTTP_PROXY + http_proxy, etc.) — Node.js undici prefers lowercase no_proxy
  over uppercase NO_PROXY when both are set
- Add idempotency guard to prevent duplicate blocks on container restart
- Update tests: verify .bashrc writing, idempotency, bash -i override
  behavior, and lowercase variant correctness

Tested on DGX Spark (ARM64) and Brev VM (x86_64) with full destroy +
re-onboard + live `env | grep proxy` verification inside the sandbox shell
via `openshell sandbox connect`.

Ref: NVIDIA#626
@senthilr-nv senthilr-nv force-pushed the fix/sandbox-proxy-no-proxy-persistence branch from 766294f to ac2224f Compare March 27, 2026 19:44
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/nemoclaw-start.sh`:
- Around line 244-260: The current _write_proxy_snippet only skips writing when
_PROXY_MARKER already exists, leaving stale values in files; change the approach
to make the persisted block replaceable: update _PROXY_SNIPPET to include a
distinct end marker (e.g., _PROXY_MARKER_END), then modify _write_proxy_snippet
to detect the start marker and, if present, remove the existing block between
the start and end markers (or the start marker through the next blank line) and
then write the new _PROXY_SNIPPET; use the existing symbols _PROXY_MARKER,
_PROXY_SNIPPET and function _write_proxy_snippet to locate and perform the
replacement so subsequent runs refresh proxy values instead of deduping.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9123b1a1-93de-4fa3-ab63-215e61de8366

📥 Commits

Reviewing files that changed from the base of the PR and between 6ebeea6 and 766294f.

📒 Files selected for processing (2)
  • scripts/nemoclaw-start.sh
  • test/service-env.test.js

…rkers

Use begin/end markers in .bashrc/.profile proxy snippet so
_write_proxy_snippet replaces the block when PROXY_HOST/PORT change
instead of silently keeping stale values. Adds test coverage for the
replacement path.

Addresses CodeRabbit review feedback on idempotency gap.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/nemoclaw-start.sh`:
- Around line 253-266: The script sets _SANDBOX_HOME="${HOME:-/sandbox}" which
picks /root when running as root and causes _write_proxy_snippet to append to
root's dotfiles; change the logic that defines _SANDBOX_HOME so that when
running as UID 0 (or when HOME equals /root) it uses /sandbox instead of $HOME,
then continue calling _write_proxy_snippet for "${_SANDBOX_HOME}/.bashrc" and
"${_SANDBOX_HOME}/.profile"; update the assignment of _SANDBOX_HOME (and any
related conditional) so _write_proxy_snippet, _PROXY_MARKER and _PROXY_SNIPPET
operate on the actual sandbox user's home rather than /root.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: bb628d55-ca34-4787-8f2f-92cbfbd93cda

📥 Commits

Reviewing files that changed from the base of the PR and between 766294f and f74f078.

📒 Files selected for processing (2)
  • scripts/nemoclaw-start.sh
  • test/service-env.test.js
✅ Files skipped from review due to trivial changes (1)
  • test/service-env.test.js

When the entrypoint runs as root, $HOME is /root — the proxy snippet
was written to /root/.bashrc instead of the sandbox user's home.
Use getent passwd to look up the sandbox user's home when running as
UID 0; fall back to /sandbox if the user entry is missing.

Addresses CodeRabbit review feedback on _SANDBOX_HOME resolution.
@senthilr-nv
Copy link
Copy Markdown
Contributor Author

Thanks for the detailed repro — you were spot on. The original ~/.profile approach doesn't work for the real reconnect path because openshell sandbox connect lands in an interactive non-login bash, so the relevant user restore point is ~/.bashrc, not ~/.profile.

I traced through the OpenShell connect path to confirm the mechanism: child_env.rs hardcodes NO_PROXY=127.0.0.1,localhost,::1, and ssh.rs calls env_clear() before setting those narrow values and spawning bash -i. So every reconnect wipes the inherited env and re-injects the narrow list, but ~/.bashrc still gives NemoClaw a restore point after shell startup.

The updated fix writes the full proxy config to ~/.bashrc (primary) and ~/.profile (login-shell fallback). Both uppercase and lowercase variants are exported since Node.js undici prefers lowercase no_proxy when both are present.

I also addressed the CodeRabbit feedback in the latest commits:

  • begin/end markers so the persisted block is replaced rather than duplicated if proxy host/port change between restarts
  • dynamic _SANDBOX_HOME resolution via getent passwd when running as root, so the snippet lands in the sandbox user's home instead of /root

Verified end-to-end on DGX Spark (ARM64) and Brev VM (x86_64) with fresh destroy -> re-onboard -> connect:

env | grep -i proxy | sort
ALL_PROXY=http://10.200.0.1:3128
HTTPS_PROXY=http://10.200.0.1:3128
HTTP_PROXY=http://10.200.0.1:3128
NODE_USE_ENV_PROXY=1
NO_PROXY=localhost,127.0.0.1,::1,inference.local,10.200.0.1
grpc_proxy=http://10.200.0.1:3128
http_proxy=http://10.200.0.1:3128
https_proxy=http://10.200.0.1:3128
no_proxy=localhost,127.0.0.1,::1,inference.local,10.200.0.1

Proxy bypass confirmed — the remaining getaddrinfo EAI_AGAIN inference.local is the DNS resolution issue tracked in #732, separate from proxy routing. 22/22 unit tests green on both platforms.

@kjw3 kjw3 self-assigned this Mar 27, 2026
@kjw3 kjw3 merged commit c051cbb into NVIDIA:main Mar 27, 2026
9 checks passed
@kjw3
Copy link
Copy Markdown
Contributor

kjw3 commented Mar 27, 2026

Reran this on the latest branch tip, and the result changed in the right way.

What I validated on Spark:

  • refreshed to the latest PR tip
  • ran test/service-env.test.js on the branch
    • 22/22 passed
  • deleted the old disposable test sandbox
  • ran a fresh non-interactive onboard from this PR branch
  • connected into the fresh sandbox through a real interactive OpenShell pseudo-TTY session
  • inspected the live env after connect
  • inspected ~/.bashrc and ~/.profile inside that same fresh sandbox

What I saw in the fresh sandbox after connect:

  • HTTP_PROXY=http://10.200.0.1:3128
  • HTTPS_PROXY=http://10.200.0.1:3128
  • NO_PROXY=localhost,127.0.0.1,::1,inference.local,10.200.0.1
  • lowercase variants were also present and correct

And inside the sandbox:

  • both ~/.bashrc and ~/.profile contained the expected marked proxy block
  • this was a fresh sandbox created from the updated branch, not stale-state reuse

So with the ~/.bashrc-based restore path, this now fixes the real interactive reconnect behavior that the earlier ~/.profile-only approach did not.

Also, credit where it’s due: PR #704 by @nanookclaw was the work that correctly identified the core direction here in the first place:

  • export the proxy env in nemoclaw-start.sh
  • include the full bypass list in NO_PROXY
  • fix the reconnect/runtime mismatch rather than treating it as a one-off manual env problem

This branch looks like the follow-through that closes that line of work.

senthilr-nv added a commit to senthilr-nv/NemoClaw that referenced this pull request Mar 28, 2026
inference.local is a virtual hostname that only the OpenShell proxy
understands via CONNECT tunnel interception.  Including it in NO_PROXY
caused Node.js to bypass the proxy and connect directly, but since
inference.local has no DNS/hosts entry, the connection hangs and the
gateway surfaces "LLM request timed out" errors.

Remove inference.local from NO_PROXY so requests route through the
proxy at 10.200.0.1:3128, which intercepts the CONNECT and forwards
to the configured inference provider.

Fixes inference timeout regression introduced in NVIDIA#1025.
TSavo pushed a commit to wopr-network/nemoclaw that referenced this pull request Mar 28, 2026
* fix: improve gateway lifecycle recovery (NVIDIA#953)

* fix: improve gateway lifecycle recovery

* docs: fix readme markdown list spacing

* fix: tighten gateway lifecycle review follow-ups

* fix: simplify tokenized control ui output

* fix: restore chat route in control ui urls

* refactor: simplify ansi stripping in onboard

* fix: shorten control ui url output

* fix: move control ui below cli next steps

* fix: swap hard/soft ulimit settings in start script (NVIDIA#951)

Fixes NVIDIA#949

Co-authored-by: KJ <kejones@nvidia.com>

* chore: add cyclomatic complexity lint rule (NVIDIA#875)

* chore: add cyclomatic complexity rule (ratchet from 95)

Add ESLint complexity rule to bin/ and scripts/ to prevent new
functions from accumulating excessive branching. Starting threshold
is 95 (current worst offender: setupNim in onboard.js). Ratchet
plan: 95 → 40 → 25 → 15.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: ratchet complexity to 20, suppress existing violations

Suppress 6 functions that exceed the threshold with eslint-disable
comments so we can start enforcing at 20 instead of 95:

- setupNim (95), setupPolicies (41), setupInference (22) in onboard.js
- deploy (22), main IIFE (27) in nemoclaw.js
- applyPreset (24) in policies.js

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: suppress complexity for 3 missed functions

preflight (23), getReconciledSandboxGatewayState (25), sandboxStatus (27)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add host-side config and state file locations to README (NVIDIA#903)

Signed-off-by: peteryuqin <peter.yuqin@gmail.com>

* chore: add tsconfig.cli.json, root execa, TS coverage ratchet (NVIDIA#913)

* chore: add tsconfig.cli.json, root execa, TS coverage ratchet

Foundation for the CLI TypeScript migration (PR 0 of the shell
consolidation plan). No runtime changes — config, tooling, and
dependency only.

- tsconfig.cli.json: strict TS type-checking for bin/ and scripts/
  (noEmit, module: preserve — tsx handles the runtime)
- scripts/check-coverage-ratchet.ts: pure TS replacement for the
  bash+python coverage ratchet script (same logic, same tolerance)
- execa ^9.6.1 added to root devDependencies (used by PR 1+)
- pr.yaml: coverage ratchet step now runs the TS version via tsx
- .pre-commit-config.yaml: SPDX headers cover scripts/*.ts,
  new tsc-check-cli pre-push hook
- CONTRIBUTING.md: document typecheck:cli task and CLI pre-push hook
- Delete scripts/check-coverage-ratchet.sh

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Apply suggestion from @brandonpelfrey

* chore: address PR feedback — use types_or, add tsx devDep

- Use `types_or: [ts, tsx]` instead of file glob for tsc-check-cli
  hook per @brandonpelfrey's suggestion.
- Add `tsx` to devDependencies so CI doesn't re-fetch it on every run
  per CodeRabbit's suggestion.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(ci): ignore GitHub "Apply suggestion" commits in commitlint

* fix(ci): lint only PR title since repo is squash-merge only

Reverts the commitlint ignores rule from the previous commit and
instead removes the per-commit lint step entirely.

Individual commit messages are discarded at merge time — only the
squash-merged PR title lands in main and drives changelog generation.
Drop the per-commit lint, keep the PR title check, and remove the
now-unnecessary fetch-depth: 0.

* Revert "fix(ci): lint only PR title since repo is squash-merge only"

This reverts commit 1257a47.

* Revert "fix(ci): ignore GitHub "Apply suggestion" commits in commitlint"

This reverts commit c395657.

* docs: fix markdownlint MD032 in README (blank line before list)

* refactor: make coverage ratchet script idiomatic TypeScript

- Wrap in main() with process.exitCode instead of scattered process.exit()
- Replace mutable flags with .map()/.some() over typed MetricResult[]
- Separate pure logic (checkMetrics) from formatting (formatReport)
- Throw with { cause } chaining instead of exit-in-helpers
- Derive CoverageThresholds from METRICS tuple (single source of truth)
- Exhaustive switch on CheckStatus discriminated union

* refactor: remove duplication in coverage ratchet script

- Drop STATUS_LABELS map; inline labels in exhaustive switch
- Extract common 'metric coverage is N%' preamble in formatResult
- Simplify ratchetedThresholds: use results directly (already in
  METRICS order) instead of re-scanning with .find() per metric
- Compute 'failed' once in main, pass into formatReport to avoid
  duplicate .some() scan

* refactor: simplify coverage ratchet with FP patterns

- Extract classify() as a named pure function (replaces nested ternary)
- loadJSON takes repo-relative paths, eliminating THRESHOLD_PATH and
  SUMMARY_PATH constants (DRY the join-with-REPO_ROOT pattern)
- Drop CoverageMetric/CoverageSummary interfaces (only pct is read);
  use structural type at the call site instead
- Inline ratchetedThresholds (one-liner, used once)
- formatReport derives fail/improved from results instead of taking
  a pre-computed boolean (let functions derive from data, don't
  thread derived state)
- sections.join("\n\n") replaces manual empty-string pushing
- Shorter type names (Thresholds, Status, Result) — no ambiguity
  in a single-purpose script

* refactor: strip coverage ratchet to failure-only output

prek hides output from commands that exit 0, so ok/improved
reporting was dead code. Remove Status, Result, classify,
formatResult, formatReport, and the ratcheted-thresholds
suggestion block. The script now just filters for regressions
and prints actionable errors on failure.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Brandon Pelfrey <bpelfrey@nvidia.com>

* fix: use CONNECT tunnel for WebSocket endpoints in Discord/Slack presets (NVIDIA#438)

* fix: use CONNECT tunnel for WebSocket endpoints in Discord/Slack presets

The egress proxy's HTTP idle timeout (~2 min) kills long-lived WebSocket
connections when endpoints are configured with protocol:rest + tls:terminate.
Switch WebSocket endpoints to access:full (CONNECT tunnel) which bypasses
HTTP-level timeouts entirely.

Discord:
- gateway.discord.gg → access:full (WebSocket gateway)
- Add PUT/PATCH/DELETE methods for discord.com (message editing, reactions)
- Add media.discordapp.net for attachment access

Slack:
- Add wss-primary.slack.com and wss-backup.slack.com → access:full
  (Socket Mode WebSocket endpoints)

Partially addresses NVIDIA#409 — the policy-level fix enables WebSocket
connections to survive. The hardcoded 2-min timeout in openshell-sandbox
still affects any protocol:rest endpoints with long-lived connections.

Related: NVIDIA#361 (WhatsApp Web, same root cause)

* fix: correct comment wording for media endpoint and YAML formatting

* fix: standardize Node.js minimum version to 22.16 (NVIDIA#840)

* fix: remove unused RECOMMENDED_NODE_MAJOR from scripts/install.sh

Shellcheck flagged it as unused after the min/recommended merge.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: enforce full semver >=22.16.0 in installer scripts

The runtime checks only compared the major Node.js version, allowing
22.0–22.15 to pass despite package.json requiring >=22.16.0. Use the
version_gte() helper for full semver comparison in both installers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: harden version_gte and align fallback message

Guard version_gte() against prerelease suffixes (e.g. "22.16.0-rc.1")
that would crash bash arithmetic. Also update the manual-install
fallback message to reference MIN_NODE_VERSION instead of hardcoded "22".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: update test stubs for Node.js 22.16 minimum and add Node 20 rejection test

- Bump node stub in 'succeeds with acceptable Node.js' from v20.0.0 to v22.16.0
- Bump node stub in buildCurlPipeEnv from v22.14.0 to v22.16.0
- Add new test asserting Node.js 20 is rejected by ensure_supported_runtime

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: harden installer and onboard resiliency (NVIDIA#961)

* fix: harden installer and onboard resiliency

* fix: address installer and debug review follow-ups

* fix: harden onboard resume across later setup steps

* test: simplify payload extraction in onboard tests

* test: keep onboard payload extraction target-compatible

* chore: align onboard session lint with complexity rule

* fix: harden onboard session safety and lock handling

* fix: tighten onboard session redaction and metadata handling

* fix(security): strip credentials from migration snapshots and enforce blueprint digest (NVIDIA#769)

Reconciles NVIDIA#156 and NVIDIA#743 into a single comprehensive solution:

- Filter auth-profiles.json at copy time via cpSync filter (from NVIDIA#743)
- Recursive stripCredentials() with pattern-based field detection for
  deep config sanitization (from NVIDIA#156: CREDENTIAL_FIELDS set +
  CREDENTIAL_FIELD_PATTERN regex)
- Remove gateway config section (contains auth tokens) from sandbox
  openclaw.json
- Blueprint digest verification (SHA-256): recorded at snapshot time,
  validated on restore, empty/missing digest is a hard failure
- computeFileDigest() throws when blueprint file is missing instead of
  silently returning null
- Sanitize both snapshot-level and sandbox-bundle openclaw.json copies
- Backward compatible: old snapshots without blueprintDigest skip
  validation
- Bump SNAPSHOT_VERSION 2 → 3

Supersedes NVIDIA#156 and NVIDIA#743.

* fix(sandbox): export proxy env vars with full NO_PROXY and persist across reconnects (NVIDIA#1025)

* fix(sandbox): export proxy env vars with full NO_PROXY and persist across reconnects

OpenShell injects NO_PROXY=127.0.0.1,localhost,::1 into the sandbox, missing
inference.local and the gateway IP (10.200.0.1). This causes LLM inference
requests to route through the egress proxy instead of going direct, and the
proxy gateway IP itself gets proxied.

Add proxy configuration block to nemoclaw-start.sh that:
- Exports HTTP_PROXY, HTTPS_PROXY, and NO_PROXY with inference.local and
  the gateway IP included
- Persists via /etc/profile.d/nemoclaw-proxy.sh (root) or ~/.profile
  (non-root fallback) so values survive OpenShell reconnect injection
- Supports NEMOCLAW_PROXY_HOST / NEMOCLAW_PROXY_PORT overrides

The non-root fallback ensures the fix works in environments like Brev where
containers run without root privileges.

Tested on DGX Spark (ARM64) and Brev VM (x86_64). Verified NO_PROXY contains
inference.local and 10.200.0.1 inside the live sandbox after connect.

Ref: NVIDIA#626, NVIDIA#704
Ref: NVIDIA#704 (comment)

* fix(sandbox): write proxy config to ~/.bashrc for interactive reconnect sessions

OpenShell's `sandbox connect` spawns `/bin/bash -i` (interactive, non-login),
which sources ~/.bashrc — not ~/.profile or /etc/profile.d/*.  The previous
approach wrote to ~/.profile and /etc/profile.d/, neither of which is sourced
by `bash -i`, so the narrow OpenShell-injected NO_PROXY persisted in live
interactive sessions.

Changes:
- Write proxy snippet to ~/.bashrc (primary) and ~/.profile (login fallback)
- Export both uppercase and lowercase proxy variants (NO_PROXY + no_proxy,
  HTTP_PROXY + http_proxy, etc.) — Node.js undici prefers lowercase no_proxy
  over uppercase NO_PROXY when both are set
- Add idempotency guard to prevent duplicate blocks on container restart
- Update tests: verify .bashrc writing, idempotency, bash -i override
  behavior, and lowercase variant correctness

Tested on DGX Spark (ARM64) and Brev VM (x86_64) with full destroy +
re-onboard + live `env | grep proxy` verification inside the sandbox shell
via `openshell sandbox connect`.

Ref: NVIDIA#626

* fix(sandbox): replace stale proxy values on restart with begin/end markers

Use begin/end markers in .bashrc/.profile proxy snippet so
_write_proxy_snippet replaces the block when PROXY_HOST/PORT change
instead of silently keeping stale values. Adds test coverage for the
replacement path.

Addresses CodeRabbit review feedback on idempotency gap.

* fix(sandbox): resolve sandbox user home dynamically when running as root

When the entrypoint runs as root, $HOME is /root — the proxy snippet
was written to /root/.bashrc instead of the sandbox user's home.
Use getent passwd to look up the sandbox user's home when running as
UID 0; fall back to /sandbox if the user entry is missing.

Addresses CodeRabbit review feedback on _SANDBOX_HOME resolution.

---------

Co-authored-by: Carlos Villela <cvillela@nvidia.com>

* fix(policies): preset application for versionless policies (Fixes NVIDIA#35) (NVIDIA#101)

* fix(policies): allow preset application for versionless policies (Fixes NVIDIA#35)

Fixes NVIDIA#35

Signed-off-by: Deepak Jain <deepujain@gmail.com>

* fix: remove stale complexity suppression in policies

---------

Signed-off-by: Deepak Jain <deepujain@gmail.com>
Co-authored-by: Kevin Jones <kejones@nvidia.com>

* fix: restore routed inference and connect UX (NVIDIA#1037)

* fix: restore routed inference and connect UX

* fix: simplify detected local inference hint

* fix: remove stale local inference hint

* test: relax connect forward assertion

---------

Signed-off-by: peteryuqin <peter.yuqin@gmail.com>
Signed-off-by: Deepak Jain <deepujain@gmail.com>
Co-authored-by: KJ <kejones@nvidia.com>
Co-authored-by: Emily Wilkins <80470879+epwilkins@users.noreply.github.com>
Co-authored-by: Carlos Villela <cvillela@nvidia.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Peter <peter.yuqin@gmail.com>
Co-authored-by: Brandon Pelfrey <bpelfrey@nvidia.com>
Co-authored-by: Benedikt Schackenberg <6381261+BenediktSchackenberg@users.noreply.github.com>
Co-authored-by: Lucas Wang <lucas_wang@lucas-futures.com>
Co-authored-by: senthilr-nv <senthilr@nvidia.com>
Co-authored-by: Deepak Jain <deepujain@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
ksapru pushed a commit to ksapru/NemoClaw that referenced this pull request Mar 30, 2026
…ross reconnects (NVIDIA#1025)

* fix(sandbox): export proxy env vars with full NO_PROXY and persist across reconnects

OpenShell injects NO_PROXY=127.0.0.1,localhost,::1 into the sandbox, missing
inference.local and the gateway IP (10.200.0.1). This causes LLM inference
requests to route through the egress proxy instead of going direct, and the
proxy gateway IP itself gets proxied.

Add proxy configuration block to nemoclaw-start.sh that:
- Exports HTTP_PROXY, HTTPS_PROXY, and NO_PROXY with inference.local and
  the gateway IP included
- Persists via /etc/profile.d/nemoclaw-proxy.sh (root) or ~/.profile
  (non-root fallback) so values survive OpenShell reconnect injection
- Supports NEMOCLAW_PROXY_HOST / NEMOCLAW_PROXY_PORT overrides

The non-root fallback ensures the fix works in environments like Brev where
containers run without root privileges.

Tested on DGX Spark (ARM64) and Brev VM (x86_64). Verified NO_PROXY contains
inference.local and 10.200.0.1 inside the live sandbox after connect.

Ref: NVIDIA#626, NVIDIA#704
Ref: NVIDIA#704 (comment)

* fix(sandbox): write proxy config to ~/.bashrc for interactive reconnect sessions

OpenShell's `sandbox connect` spawns `/bin/bash -i` (interactive, non-login),
which sources ~/.bashrc — not ~/.profile or /etc/profile.d/*.  The previous
approach wrote to ~/.profile and /etc/profile.d/, neither of which is sourced
by `bash -i`, so the narrow OpenShell-injected NO_PROXY persisted in live
interactive sessions.

Changes:
- Write proxy snippet to ~/.bashrc (primary) and ~/.profile (login fallback)
- Export both uppercase and lowercase proxy variants (NO_PROXY + no_proxy,
  HTTP_PROXY + http_proxy, etc.) — Node.js undici prefers lowercase no_proxy
  over uppercase NO_PROXY when both are set
- Add idempotency guard to prevent duplicate blocks on container restart
- Update tests: verify .bashrc writing, idempotency, bash -i override
  behavior, and lowercase variant correctness

Tested on DGX Spark (ARM64) and Brev VM (x86_64) with full destroy +
re-onboard + live `env | grep proxy` verification inside the sandbox shell
via `openshell sandbox connect`.

Ref: NVIDIA#626

* fix(sandbox): replace stale proxy values on restart with begin/end markers

Use begin/end markers in .bashrc/.profile proxy snippet so
_write_proxy_snippet replaces the block when PROXY_HOST/PORT change
instead of silently keeping stale values. Adds test coverage for the
replacement path.

Addresses CodeRabbit review feedback on idempotency gap.

* fix(sandbox): resolve sandbox user home dynamically when running as root

When the entrypoint runs as root, $HOME is /root — the proxy snippet
was written to /root/.bashrc instead of the sandbox user's home.
Use getent passwd to look up the sandbox user's home when running as
UID 0; fall back to /sandbox if the user entry is missing.

Addresses CodeRabbit review feedback on _SANDBOX_HOME resolution.

---------

Co-authored-by: Carlos Villela <cvillela@nvidia.com>
laitingsheng pushed a commit that referenced this pull request Apr 2, 2026
…ross reconnects (#1025)

* fix(sandbox): export proxy env vars with full NO_PROXY and persist across reconnects

OpenShell injects NO_PROXY=127.0.0.1,localhost,::1 into the sandbox, missing
inference.local and the gateway IP (10.200.0.1). This causes LLM inference
requests to route through the egress proxy instead of going direct, and the
proxy gateway IP itself gets proxied.

Add proxy configuration block to nemoclaw-start.sh that:
- Exports HTTP_PROXY, HTTPS_PROXY, and NO_PROXY with inference.local and
  the gateway IP included
- Persists via /etc/profile.d/nemoclaw-proxy.sh (root) or ~/.profile
  (non-root fallback) so values survive OpenShell reconnect injection
- Supports NEMOCLAW_PROXY_HOST / NEMOCLAW_PROXY_PORT overrides

The non-root fallback ensures the fix works in environments like Brev where
containers run without root privileges.

Tested on DGX Spark (ARM64) and Brev VM (x86_64). Verified NO_PROXY contains
inference.local and 10.200.0.1 inside the live sandbox after connect.

Ref: #626, #704
Ref: #704 (comment)

* fix(sandbox): write proxy config to ~/.bashrc for interactive reconnect sessions

OpenShell's `sandbox connect` spawns `/bin/bash -i` (interactive, non-login),
which sources ~/.bashrc — not ~/.profile or /etc/profile.d/*.  The previous
approach wrote to ~/.profile and /etc/profile.d/, neither of which is sourced
by `bash -i`, so the narrow OpenShell-injected NO_PROXY persisted in live
interactive sessions.

Changes:
- Write proxy snippet to ~/.bashrc (primary) and ~/.profile (login fallback)
- Export both uppercase and lowercase proxy variants (NO_PROXY + no_proxy,
  HTTP_PROXY + http_proxy, etc.) — Node.js undici prefers lowercase no_proxy
  over uppercase NO_PROXY when both are set
- Add idempotency guard to prevent duplicate blocks on container restart
- Update tests: verify .bashrc writing, idempotency, bash -i override
  behavior, and lowercase variant correctness

Tested on DGX Spark (ARM64) and Brev VM (x86_64) with full destroy +
re-onboard + live `env | grep proxy` verification inside the sandbox shell
via `openshell sandbox connect`.

Ref: #626

* fix(sandbox): replace stale proxy values on restart with begin/end markers

Use begin/end markers in .bashrc/.profile proxy snippet so
_write_proxy_snippet replaces the block when PROXY_HOST/PORT change
instead of silently keeping stale values. Adds test coverage for the
replacement path.

Addresses CodeRabbit review feedback on idempotency gap.

* fix(sandbox): resolve sandbox user home dynamically when running as root

When the entrypoint runs as root, $HOME is /root — the proxy snippet
was written to /root/.bashrc instead of the sandbox user's home.
Use getent passwd to look up the sandbox user's home when running as
UID 0; fall back to /sandbox if the user entry is missing.

Addresses CodeRabbit review feedback on _SANDBOX_HOME resolution.

---------

Co-authored-by: Carlos Villela <cvillela@nvidia.com>
lakamsani pushed a commit to lakamsani/NemoClaw that referenced this pull request Apr 4, 2026
…ross reconnects (NVIDIA#1025)

* fix(sandbox): export proxy env vars with full NO_PROXY and persist across reconnects

OpenShell injects NO_PROXY=127.0.0.1,localhost,::1 into the sandbox, missing
inference.local and the gateway IP (10.200.0.1). This causes LLM inference
requests to route through the egress proxy instead of going direct, and the
proxy gateway IP itself gets proxied.

Add proxy configuration block to nemoclaw-start.sh that:
- Exports HTTP_PROXY, HTTPS_PROXY, and NO_PROXY with inference.local and
  the gateway IP included
- Persists via /etc/profile.d/nemoclaw-proxy.sh (root) or ~/.profile
  (non-root fallback) so values survive OpenShell reconnect injection
- Supports NEMOCLAW_PROXY_HOST / NEMOCLAW_PROXY_PORT overrides

The non-root fallback ensures the fix works in environments like Brev where
containers run without root privileges.

Tested on DGX Spark (ARM64) and Brev VM (x86_64). Verified NO_PROXY contains
inference.local and 10.200.0.1 inside the live sandbox after connect.

Ref: NVIDIA#626, NVIDIA#704
Ref: NVIDIA#704 (comment)

* fix(sandbox): write proxy config to ~/.bashrc for interactive reconnect sessions

OpenShell's `sandbox connect` spawns `/bin/bash -i` (interactive, non-login),
which sources ~/.bashrc — not ~/.profile or /etc/profile.d/*.  The previous
approach wrote to ~/.profile and /etc/profile.d/, neither of which is sourced
by `bash -i`, so the narrow OpenShell-injected NO_PROXY persisted in live
interactive sessions.

Changes:
- Write proxy snippet to ~/.bashrc (primary) and ~/.profile (login fallback)
- Export both uppercase and lowercase proxy variants (NO_PROXY + no_proxy,
  HTTP_PROXY + http_proxy, etc.) — Node.js undici prefers lowercase no_proxy
  over uppercase NO_PROXY when both are set
- Add idempotency guard to prevent duplicate blocks on container restart
- Update tests: verify .bashrc writing, idempotency, bash -i override
  behavior, and lowercase variant correctness

Tested on DGX Spark (ARM64) and Brev VM (x86_64) with full destroy +
re-onboard + live `env | grep proxy` verification inside the sandbox shell
via `openshell sandbox connect`.

Ref: NVIDIA#626

* fix(sandbox): replace stale proxy values on restart with begin/end markers

Use begin/end markers in .bashrc/.profile proxy snippet so
_write_proxy_snippet replaces the block when PROXY_HOST/PORT change
instead of silently keeping stale values. Adds test coverage for the
replacement path.

Addresses CodeRabbit review feedback on idempotency gap.

* fix(sandbox): resolve sandbox user home dynamically when running as root

When the entrypoint runs as root, $HOME is /root — the proxy snippet
was written to /root/.bashrc instead of the sandbox user's home.
Use getent passwd to look up the sandbox user's home when running as
UID 0; fall back to /sandbox if the user entry is missing.

Addresses CodeRabbit review feedback on _SANDBOX_HOME resolution.

---------

Co-authored-by: Carlos Villela <cvillela@nvidia.com>
gemini2026 pushed a commit to gemini2026/NemoClaw that referenced this pull request Apr 14, 2026
…ross reconnects (NVIDIA#1025)

* fix(sandbox): export proxy env vars with full NO_PROXY and persist across reconnects

OpenShell injects NO_PROXY=127.0.0.1,localhost,::1 into the sandbox, missing
inference.local and the gateway IP (10.200.0.1). This causes LLM inference
requests to route through the egress proxy instead of going direct, and the
proxy gateway IP itself gets proxied.

Add proxy configuration block to nemoclaw-start.sh that:
- Exports HTTP_PROXY, HTTPS_PROXY, and NO_PROXY with inference.local and
  the gateway IP included
- Persists via /etc/profile.d/nemoclaw-proxy.sh (root) or ~/.profile
  (non-root fallback) so values survive OpenShell reconnect injection
- Supports NEMOCLAW_PROXY_HOST / NEMOCLAW_PROXY_PORT overrides

The non-root fallback ensures the fix works in environments like Brev where
containers run without root privileges.

Tested on DGX Spark (ARM64) and Brev VM (x86_64). Verified NO_PROXY contains
inference.local and 10.200.0.1 inside the live sandbox after connect.

Ref: NVIDIA#626, NVIDIA#704
Ref: NVIDIA#704 (comment)

* fix(sandbox): write proxy config to ~/.bashrc for interactive reconnect sessions

OpenShell's `sandbox connect` spawns `/bin/bash -i` (interactive, non-login),
which sources ~/.bashrc — not ~/.profile or /etc/profile.d/*.  The previous
approach wrote to ~/.profile and /etc/profile.d/, neither of which is sourced
by `bash -i`, so the narrow OpenShell-injected NO_PROXY persisted in live
interactive sessions.

Changes:
- Write proxy snippet to ~/.bashrc (primary) and ~/.profile (login fallback)
- Export both uppercase and lowercase proxy variants (NO_PROXY + no_proxy,
  HTTP_PROXY + http_proxy, etc.) — Node.js undici prefers lowercase no_proxy
  over uppercase NO_PROXY when both are set
- Add idempotency guard to prevent duplicate blocks on container restart
- Update tests: verify .bashrc writing, idempotency, bash -i override
  behavior, and lowercase variant correctness

Tested on DGX Spark (ARM64) and Brev VM (x86_64) with full destroy +
re-onboard + live `env | grep proxy` verification inside the sandbox shell
via `openshell sandbox connect`.

Ref: NVIDIA#626

* fix(sandbox): replace stale proxy values on restart with begin/end markers

Use begin/end markers in .bashrc/.profile proxy snippet so
_write_proxy_snippet replaces the block when PROXY_HOST/PORT change
instead of silently keeping stale values. Adds test coverage for the
replacement path.

Addresses CodeRabbit review feedback on idempotency gap.

* fix(sandbox): resolve sandbox user home dynamically when running as root

When the entrypoint runs as root, $HOME is /root — the proxy snippet
was written to /root/.bashrc instead of the sandbox user's home.
Use getent passwd to look up the sandbox user's home when running as
UID 0; fall back to /sandbox if the user entry is missing.

Addresses CodeRabbit review feedback on _SANDBOX_HOME resolution.

---------

Co-authored-by: Carlos Villela <cvillela@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants