Add Lakebox CLI for managing Databricks sandbox environments #4930
Draft
shuochen0311 wants to merge 16 commits into databricks:main
Conversation
Lakebox provides SSH-accessible development environments backed by microVM isolation. This adds CLI commands for lifecycle management:

- `lakebox auth login` — authenticate to a Databricks workspace
- `lakebox create` — create a new lakebox (with optional SSH public key)
- `lakebox list` — list your lakeboxes (shows status, key hash, default)
- `lakebox ssh` — SSH to your default lakebox (or create one on first use)
- `lakebox status <id>` — show lakebox details
- `lakebox delete <id>` — delete a lakebox
- `lakebox set-default <id>` — change the default lakebox

Features:

- Default lakebox management stored at `~/.databricks/lakebox.json` per profile
- Automatic SSH config management (`~/.ssh/config`)
- Public key auth only (password/keyboard-interactive disabled in SSH config)
- Creates and sets default on first `lakebox ssh` if none exists
Contributor
Waiting for approval. Based on git history, these people are best suited to review:
Eligible reviewers: @andrewnester, @denik, @shreyas-goenka, @simonfaltum. Suggestions based on git history. See OWNERS for ownership rules.
- Remove PubkeyHashPrefix field from lakeboxEntry (no longer returned by API)
- Remove KEY column from list output
- Remove Key line from status output
- Add register-key subcommand for SSH public key registration

Co-authored-by: Isaac
…rites

- Add 'register' command: generates ~/.ssh/lakebox_rsa and registers with API
- Remove 'register-key' command (replaced by 'register')
- Remove 'login' command (use 'auth login' + 'register' separately)
- SSH command passes options directly as args instead of writing ~/.ssh/config
- Check for ssh-keygen availability with helpful install instructions

Co-authored-by: Isaac
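The ssh-keygen availability check can be sketched like this; the helper name and the install hint are illustrative, not the PR's actual code:

```go
package main

import (
	"fmt"
	"os/exec"
)

// ensureTool reports a friendly error when a required binary (here
// ssh-keygen) is missing from PATH, instead of a raw exec failure later.
func ensureTool(name string) error {
	if _, err := exec.LookPath(name); err != nil {
		return fmt.Errorf("%s not found on PATH: install OpenSSH "+
			"(e.g. `apt install openssh-client` or `brew install openssh`)", name)
	}
	return nil
}

func main() {
	if err := ensureTool("ssh-keygen"); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("ssh-keygen available")
}
```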
- Hook into auth login PostRun to auto-generate ~/.ssh/lakebox_rsa and register it after OAuth completes
- Fix hook: match on sub.Name() not sub.Use (Use includes args)
- Export EnsureAndReadKey and RegisterKey for use by auth hook
- Update help text

Co-authored-by: Isaac
Everything after -- is passed directly to the ssh process, enabling:

    lakebox ssh -- echo hello                # run command and return
    lakebox ssh <id> -- cat /etc/os-release
    lakebox ssh -- -L 8080:localhost:8080    # port forwarding

Co-authored-by: Isaac
After 'lakebox auth login --host <url>', the post-login hook now constructs the workspace client directly from the --host/--profile flags instead of using MustWorkspaceClient (which started with an empty config and fell back to the DEFAULT profile).

All lakebox commands now use a mustWorkspaceClient wrapper that reads the last-login profile from ~/.databricks/lakebox.json, so 'lakebox ssh' uses the correct profile without requiring --profile on every invocation.

Also adds install.sh and upload.sh scripts.
Fix workspace client init after login, persist last profile
Merge kelvich's workspace client fix. Add -- passthrough support so extra args (remote commands, port forwarding, ssh flags) are passed directly to the ssh process. Co-authored-by: Isaac
Single cyan accent color throughout. Bold for IDs, dim for metadata. Braille spinner with elapsed time during async operations.

- create: animated spinner during provisioning
- list: aligned columns with colored status, cyan bold for running
- status: clean field layout
- delete: spinner during removal
- ssh: spinner during connection
- register: spinner during key registration
- Shared ui.go with all primitives

Co-authored-by: Isaac
The lakebox manager moved its REST surface to a proto-defined service with
JSON transcoding (databricks-eng/universe#1839855 + follow-ups). That
changed three things this CLI was depending on:
1. JSON field name: each Lakebox message now serializes as `lakeboxId`
(proto3 lowerCamelCase default), not `name`. List/status/create were
parsing into `Name string \`json:"name"\`` and silently getting the
empty string for every entry — the visible symptom was `lakebox list`
showing rows with blank ID columns.
2. Status codes: proto-transcoded handlers return 200 OK uniformly. The
CLI was checking 201 Created on POST /api/2.0/lakebox and 204
NoContent on DELETE, both of which now look like errors.
3. Key registration moved to its own top-level collection at
/api/2.0/lakebox-keys (was /api/2.0/lakebox/register-key), to avoid a
path collision with /api/2.0/lakebox/{lakebox_id}.
Drop the now-unused `extractLakeboxID` helper — the wire field is the
customer-facing ID directly.
Verified against dev-aws-us-west-2: list, status, create, delete all
work end-to-end. register hits a separate manager-side issue (stale
UserKey records in TiDB that the new schema can't deserialize) — not
fixed here.
Co-authored-by: Isaac
Reynold's restructure (databricks-eng/universe#1874214) nested the two
lakebox resources under the service namespace — moving sandboxes from
/api/2.0/lakebox to /api/2.0/lakebox/sandboxes and SSH keys from
/api/2.0/lakebox-keys to /api/2.0/lakebox/ssh-keys — and renamed the
resource type from Lakebox to Sandbox, which surfaces on the wire as
sandboxId / sandboxes (was lakeboxId / lakeboxes).
CLI still pointed at the old paths and decoded the old field names, so
list / status / create returned empty IDs and 404s. Fix both endpoint
constants, rename the request/response types and fields to match the
proto, and update the four call sites in create / list / ssh / status.
User-facing copy ("Lakebox …") is unchanged — the product is still
Lakebox; only the resource type renamed.
Verified end-to-end against dev-aws-us-west-2: create / list / status
/ delete all work; ssh passthrough works.
Co-authored-by: Isaac
Surfaces the new per-sandbox auto-stop knobs the manager added
(databricks-eng/universe#1875183) so users can see at a glance how long
their sandbox will live before the watchdog reaps it.
- `sandboxEntry` gains pointer fields `IdleTimeoutSecs` and `Persist` so
we keep the proto3 explicit-presence semantics ("not in response" vs
"explicitly set to 0 / false").
- `autoStopLabel()` collapses the policy to one short token:
- `persist == true` → `never`
- `idle_timeout_secs > 0` → compact duration (`90s`, `15m`, `2h`,
`1h30m`)
- otherwise → the manager's global default (10m), rendered
explicitly so the column never says `default`
- `lakebox list` adds an AUTOSTOP column between STATUS and DEFAULT.
- `lakebox status` adds an `autostop` field after `fqdn`.
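The rendering rules above can be sketched as follows. This is an illustrative reconstruction under the stated rules, not the PR's actual autoStopLabel:

```go
package main

import "fmt"

// compactDuration renders a second count the way the AUTOSTOP column
// does: 90s, 15m, 2h, 1h30m. Sub-minute remainders stay in seconds.
func compactDuration(secs int64) string {
	if secs%60 != 0 {
		return fmt.Sprintf("%ds", secs) // e.g. 90 -> "90s"
	}
	mins := secs / 60
	if mins%60 == 0 {
		return fmt.Sprintf("%dh", mins/60) // e.g. 7200 -> "2h"
	}
	if mins > 60 {
		return fmt.Sprintf("%dh%dm", mins/60, mins%60) // e.g. 5400 -> "1h30m"
	}
	return fmt.Sprintf("%dm", mins) // e.g. 900 -> "15m"
}

// autoStopLabel collapses the policy to one token: persist wins, then a
// positive idle timeout, then the manager's global default (10m).
func autoStopLabel(persist *bool, idleSecs *int64) string {
	if persist != nil && *persist {
		return "never"
	}
	if idleSecs != nil && *idleSecs > 0 {
		return compactDuration(*idleSecs)
	}
	return compactDuration(600) // global default, rendered as "10m"
}

func main() {
	for _, s := range []int64{90, 900, 7200, 5400} {
		fmt.Println(compactDuration(s))
	}
	fmt.Println(autoStopLabel(nil, nil))
}
```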
Verified end-to-end against dev-aws-us-west-2 — list and status both
render `10m` for sandboxes with no per-record override.
Co-authored-by: Isaac
Surfaces the per-sandbox auto-stop knobs the manager added in
databricks-eng/universe#1875183 so users can flip them from the CLI
instead of curl + JSON.
    lakebox config <id> --idle-timeout 15m                  # 15-minute timeout
    lakebox config <id> --idle-timeout 1h30m                # any Go duration
    lakebox config <id> --idle-timeout 0                    # clear → manager default
    lakebox config <id> --persist                           # never auto-stop
    lakebox config <id> --persist=false                     # back to timeout path
    lakebox config <id> --idle-timeout 30m --persist=false  # combined
Implementation notes:
- `updateBody` is the inner Sandbox sent in the PATCH body. The proto's
`(google.api.http)` declares `body: "sandbox"`, so the HTTP body is
the inner `Sandbox` message, NOT a `{"sandbox": {...}}` envelope.
First wired-up version got this wrong and the manager rejected with
"unknown field `sandbox`" — kept the type comment to flag the gotcha
for the next reader.
- `IdleTimeoutSecs` carries `,string` JSON tag because proto3 JSON
canonical form serializes int64 as a quoted string. The manager
accepts both bare-number and quoted-string on input but always
emits quoted on output, so without the tag we hit "cannot unmarshal
string into Go struct field … int64" on the response read-back.
- Pointer fields (`*int64`, `*bool`) carry proto3 explicit-presence
through to the wire — only the flags the user actually passed get
emitted, so a `--persist`-only invocation does not clobber an
existing idle_timeout (and vice-versa).
- Client-side range pre-flight (`[60s, 86400s]` plus the 0 clear
sentinel) mirrors the manager's `MIN_IDLE_TIMEOUT_SECS` /
`MAX_IDLE_TIMEOUT_SECS` constants so users get a clearer error
than the server's `INVALID_ARGUMENT`.
Verified end-to-end against dev-aws-us-west-2:
config --idle-timeout 15m → status shows `15m`
config --persist → status shows `never`
config --idle-timeout 0 --persist=false → status shows `10m`
Co-authored-by: Isaac
Tracks the matching rename in the lakebox manager
(databricks-eng/universe#1875183 follow-up). The manager-side flag
moved from `persist` to `no_autostop` because the original name
conflicted with the storage-persistence concept already in this
codebase.
CLI changes:
--persist → --no-autostop
--persist=false → --no-autostop=false
Plus a help-text note on the manager's new auto-clear behavior:
setting `--idle-timeout` to a non-zero value in a follow-up call
clears `--no-autostop` automatically, on the assumption that the
caller wants timeout-based stopping back. The CLI itself does not
need any extra logic for this — the manager handles it server-side
based on field presence in the PATCH body, and the CLI's existing
"omit unset flags from the wire payload" semantics (proto3
explicit-presence via *bool / *int64) feed straight into that.
Verified the marshal output matches what the new manager expects:
--no-autostop → {"sandbox_id":"x","no_autostop":true}
--idle-timeout 15m → {"sandbox_id":"x","idle_timeout_secs":"900"}
no flags → {"sandbox_id":"x"} (rejected)
End-to-end against staging blocked until the manager PR rolls out.
Co-authored-by: Isaac
Tracks the matching change in the lakebox manager (databricks-eng/universe#1875183) which moved the per-sandbox idle timeout off `optional int64 idle_timeout_secs = 7` and onto `optional google.protobuf.Duration idle_timeout = 7`. Drops the sentinel-overloaded int64 in favor of a duration-typed field.

Wire shape:

- Response field is now `idleTimeout` carrying a proto3-canonical Duration string (e.g. `"900s"`); parsed into seconds via `time.ParseDuration` for the autostop column.
- Request body sends `idle_timeout` as the same string format.

The CLI flag stays `--idle-timeout` (Go duration string in / Go duration string out); only the wire encoding changes.

`list` and `status` show the manager's global default for any sandbox whose per-record value isn't yet visible under the new field name — that's deliberate forward-compat behavior so an older manager + newer CLI combination just degrades to showing the default rather than crashing.

Co-authored-by: Isaac
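The response-side parsing can be sketched as follows (the function name is illustrative; the commit only says the CLI uses time.ParseDuration):

```go
package main

import (
	"fmt"
	"time"
)

// parseIdleTimeout converts the proto3-canonical Duration string on the
// wire (e.g. "900s") back into whole seconds for the autostop column.
func parseIdleTimeout(wire string) (int64, error) {
	d, err := time.ParseDuration(wire)
	if err != nil {
		return 0, err
	}
	return int64(d / time.Second), nil
}

func main() {
	secs, err := parseIdleTimeout("900s")
	fmt.Println(secs, err) // 900 <nil>
}
```

time.ParseDuration also accepts fractional forms like "1.5s", which the proto3 Duration JSON mapping can emit, so sub-second values degrade gracefully to truncated seconds.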
- ssh: auto-pick uw2.s.dbrx.dev when the workspace host has `.staging.` in it, otherwise keep using prod uw2.dbrx.dev. `--gateway` still overrides.
- api: when the workspace host carries a `?o=<id>` selector or the SDK config has a workspace_id, send `X-Databricks-Org-Id` so multi-workspace gateways (dogfood.staging.databricks.com) route the request to the right workspace. Without it the gateway rejects PATs with "Credential was not sent or was of an unsupported type for this API".

Co-authored-by: Isaac
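The routing rules can be sketched as below. The gateway hostnames are the ones named in the commit; the helper names are illustrative:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// pickGateway selects the staging SSH gateway when the workspace host
// contains ".staging.", otherwise the prod gateway. (--gateway would
// override this in the real CLI.)
func pickGateway(workspaceHost string) string {
	if strings.Contains(workspaceHost, ".staging.") {
		return "uw2.s.dbrx.dev"
	}
	return "uw2.dbrx.dev"
}

// orgID pulls the ?o=<id> workspace selector out of the host URL, for
// use as the X-Databricks-Org-Id header value. Empty means "not present".
func orgID(workspaceHost string) string {
	u, err := url.Parse(workspaceHost)
	if err != nil {
		return ""
	}
	return u.Query().Get("o")
}

func main() {
	host := "https://dogfood.staging.databricks.com/?o=12345"
	fmt.Println(pickGateway(host), orgID(host))
}
```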
Contributor
An authorized user can trigger integration tests manually by following the instructions below. Checks will be approved automatically on success.
pietern added a commit that referenced this pull request on May 8, 2026
…onments

Brings in the original cmd/lakebox/* sources from #4930 with full commit-history attribution. Subsequent commits adapt the standalone CLI into a 'databricks lakebox' subcommand, replace hand-rolled HTTP/spinner/color plumbing with libs primitives, and add unit tests.
pietern added a commit that referenced this pull request on May 8, 2026
Wire the cmd/lakebox tree from #4930 into the main CLI:

- cmd/cmd.go registers lakebox.New() under the 'development' command group alongside bundle and sync.
- cmd/fuzz_panic_test.go adds 'lakebox' to manualRoots so TestCountFuzz doesn't fuzz hand-written commands as if they were auto-generated.
- cmd/lakebox tree: the original PR's standalone-CLI scaffolding is adapted for subcommand use — drop the auth-login hijacking and its helper exports, drop the 'last_profile' state field that only mattered when lakebox owned the whole CLI, switch PreRunE to root.MustWorkspaceClient directly, and update help text from 'lakebox foo' to 'databricks lakebox foo' throughout.

Also conforms cmd/lakebox to project lint rules: env.UserHomeDir(ctx) in place of os.UserHomeDir, errors.Is(err, fs.ErrNotExist) instead of os.IsNotExist, atomic.Bool over sync.Once in the spinner gate, errors.New for static error strings.

Co-authored-by: Isaac
Summary
Lakebox provides SSH-accessible development environments backed by microVM isolation. This adds CLI commands for lifecycle management.
Commands
- `lakebox auth login` — authenticate to a Databricks workspace
- `lakebox create` — create a new lakebox (with optional SSH public key)
- `lakebox list` — list your lakeboxes (shows status, key hash, default)
- `lakebox ssh` — SSH to your default lakebox (or create one on first use)
- `lakebox status <id>` — show lakebox details
- `lakebox delete <id>` — delete a lakebox
- `lakebox set-default <id>` — change the default lakebox

Features

- Default lakebox management stored at `~/.databricks/lakebox.json` per profile
- Automatic `~/.ssh/config` management (single block, in-place update)
- Creates and sets default on first `lakebox ssh` if none exists
- Talks to `/api/2.0/lakebox` on the workspace host

Test plan
dbsql-dev-testing-default.dev.databricks.com

This PR was created by the Lakebox team (Infra2.0/Brickvisor).