Skip to content

api: route per-call against unified hosts#5137

Merged
simonfaltum merged 20 commits intomainfrom
simonfaltum/cli-api-spog-routing
May 5, 2026
Merged

api: route per-call against unified hosts#5137
simonfaltum merged 20 commits intomainfrom
simonfaltum/cli-api-spog-routing

Conversation

@simonfaltum
Copy link
Copy Markdown
Member

@simonfaltum simonfaltum commented Apr 30, 2026

Why

databricks api always sent the workspace routing identifier (X-Databricks-Org-Id) when the profile had one, even when the path was an account API. On unified hosts (one host serving both workspace and account APIs) this misrouted account calls. There was also no way to explicitly route a call to the account API or override the identifier per call.

Changes

Before: routing was decided once from the profile and applied to every call.
Now: routing is decided per call from the path being requested.

  • Paths under /accounts/{id}/ are auto-detected as account-scope; the routing identifier is dropped.
  • A small hand-written list in cmd/api/paths.go carves out workspace-routed proxy APIs that happen to live under /accounts/, so they keep the identifier.
  • --account forces account-scope on a non-/accounts/ path.
  • --workspace-id <id> overrides the identifier per call. Mutually exclusive with --account.
  • ?o=<id> on the path (the SPOG URL convention used by the Databricks UI) is recognized as a per-call workspace override, so URLs pasted from the browser route correctly.
  • The CLI-only workspace_id = none sentinel is stripped before the routing decision so the literal "none" never goes on the wire.

Routing logic lives in pure functions (hasAccountSegment, extractOrgIDFromQuery, resolveOrgID, normalizeWorkspaceID, isWorkspaceProxyPath) that take primitives. The cobra RunE is a thin adapter that resolves config and calls them.

Test plan

  • go test ./cmd/api covers the helpers with table-driven cases: deny-list hits and misses, query/fragment edge cases, mutual-exclusion errors, sentinel stripping, ?o= extraction.
  • go test ./acceptance -run TestAccept/cmd/api exercises seven variants end to end against terraform and direct engines: workspace path, account path, deny-listed proxy under /accounts/, --account, --workspace-id, ?o= query, workspace_id = none. Each test asserts header presence/absence explicitly via print_requests.py | contains.py.
  • make checks

The next PR teaches `databricks api` to detect workspace-vs-account
scope per call. That decision needs a deny-list of paths under
accounts/ that the SDK builds without an account-ID slot (workspace
proxies). Hand-maintaining that list drifts from the SDK; this commit
generates it.

genpaths walks every service/*/impl.go in the pinned SDK with go/ast,
classifies each `path :=` assignment, and emits cmd/api/paths_generated.go
with a closed allowlist on account-ID source spellings. Refuses to emit
prefixes that would over-match, fails loudly on idioms it doesn't
recognize, handles var/define/assign forms and rejects compound
assignments. Hooked into ./task generate-paths and the existing
generated-files staleness gate in CI.

The generated tables are not yet referenced at runtime; the next PR
wires them in. Generated-file lint exclusions (lax rules) cover the
unused declarations until then.

Co-authored-by: Isaac
- Drop stdlib log in favor of fmt.Fprintf+os.Exit for the generator
  binary (depguard rule against the stdlib log package).
- Make the path-class switch exhaustive by listing the no-op cases.
- Lowercase the format-verb error string (staticcheck ST1005).

Co-authored-by: Isaac
@simonfaltum simonfaltum force-pushed the simonfaltum/cli-api-spog-routing branch from 6db6f9e to a775d99 Compare April 30, 2026 09:38
@simonfaltum simonfaltum changed the title api: make databricks api host-agnostic api: generate workspace-proxy deny-list from SDK source Apr 30, 2026
Keep the unified-host routing support small for the initial PR by hand-maintaining the current workspace-proxy exceptions instead of parsing generated SDK source.
`databricks api <verb> <path>` previously bypassed the generated SDK
header logic and called client.Do directly, so on unified hosts where
workspace-vs-account routing is decided per-call it had no way to
distinguish the two. This wires up per-call detection using the
deny-list from the prior PR, plus three explicit overrides:

  --account              scope this call to the account API
  --workspace-id <id>    override the workspace routing identifier
  {account_id}           literal substituted from the active profile

Detection runs on URL.Path so query strings and fragments can't
false-match. The CLI-only WorkspaceIDNone sentinel (workspace_id =
none in .databrickscfg) is normalized to empty before the SDK's
idiomatic check sees it, so the literal "none" never goes on the
wire.

Behavior change for classic workspace profiles that have workspace_id
set: the routing identifier is now sent. Classic gateways ignore the
header so this should be benign; called out in the manual smoke plan
in case it surfaces.

Co-authored-by: Isaac
Keep the manual workspace-proxy list behind one helper so tests exercise the same path used by runtime account detection.
@simonfaltum simonfaltum changed the title api: generate workspace-proxy deny-list from SDK source api: route per-call against unified hosts Apr 30, 2026
@simonfaltum simonfaltum marked this pull request as ready for review April 30, 2026 10:32
Add the standard OS and CI/cicd replacements to acceptance/cmd/api/test.toml
and regenerate the recorded User-Agent strings to use os/[OS]. Without these,
goldens generated locally on macOS contain os/darwin and no cicd/ segment,
which fail on Linux + GitHub Actions where the SDK records os/linux ... cicd/github.

Co-authored-by: Isaac
@simonfaltum simonfaltum temporarily deployed to test-trigger-is May 1, 2026 06:59 — with GitHub Actions Inactive
@simonfaltum simonfaltum temporarily deployed to test-trigger-is May 1, 2026 06:59 — with GitHub Actions Inactive
@simonfaltum simonfaltum temporarily deployed to test-trigger-is May 4, 2026 08:34 — with GitHub Actions Inactive
@simonfaltum simonfaltum temporarily deployed to test-trigger-is May 4, 2026 08:34 — with GitHub Actions Inactive
SPOG URLs from the Databricks UI carry the workspace ID as a query param
(e.g. /api/2.2/jobs/list?o=7474644166319138). Recognize that param when
present and use it as the routing identifier so pasted URLs route
correctly without requiring --workspace-id. Precedence: --account >
--workspace-id flag > ?o= > account-path auto-detect > profile
workspace_id.

Co-authored-by: Isaac
@simonfaltum simonfaltum temporarily deployed to test-trigger-is May 4, 2026 09:25 — with GitHub Actions Inactive
@simonfaltum simonfaltum temporarily deployed to test-trigger-is May 4, 2026 09:25 — with GitHub Actions Inactive
cmd/api scripts pass POSIX paths like /api/2.0/clusters/list to the CLI.
Git Bash on Windows converts a leading "/" arg to a Windows path, so the
CLI sees /Program Files/Git/api/2.0/... and the testserver returns 404.
Set MSYS_NO_PATHCONV=1 in the parent test.toml, matching the pattern used
by cmd/workspace/export-dir-* and workspace/repos.

Also regenerate out.test.toml with the diff-friendly inline format
introduced by #5146.

Co-authored-by: Isaac
@simonfaltum simonfaltum temporarily deployed to test-trigger-is May 4, 2026 11:59 — with GitHub Actions Inactive
@simonfaltum simonfaltum temporarily deployed to test-trigger-is May 4, 2026 11:59 — with GitHub Actions Inactive
Comment thread cmd/api/api.go Outdated
// correctly without requiring --workspace-id.
orgIDQueryParam = "o"

accountIDPlaceholder = "{account_id}"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like it belongs in a different PR. The current impl doesn't interpolate account IDs and we don't need to start doing that (yet). Might be useful, but then we should see if we should interpolate more.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair - I just wanted to make it easy to copy from the docs.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pulled it out in the latest push (1ea4ffc) — {account_id} substitution and its acceptance tests are gone. Happy to do it as a focused follow-up if/when we want to expand path substitution more broadly.

Comment thread cmd/api/paths.go
"/api/2.0/preview/accounts/access-control/assignable-roles": {},
"/api/2.0/preview/accounts/access-control/rule-sets": {},
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The need for this list is unfortunate. What happens if we tack on the ?o= regardless?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or can we match on /accounts and /preview/accounts?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have the workspaceProxyPrefixes that makes that matching not possible afaik

Comment thread acceptance/cmd/api/account-flag/script Outdated
@@ -0,0 +1 @@
$CLI api get /api/2.0/clusters/list --account
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal is that we don't see any org ID in the request log, correct?

If so, it's worth capturing in another call here that confirms that assertion.

Easy to miss the intent otherwise.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in e9e0b4a. Each of the seven cmd/api acceptance tests now uses print_requests.py --get --keep | contains.py to assert the recorded request either does or does not carry X-Databricks-Org-Id (with a specific value where one is expected). The intent is now visible in the script and in output.txt, not implicit in out.requests.txt.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This now dups the requests between out.requests.txt and output.txt. Can you keep one of them?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropped --keep in 556b552print_requests.py now consumes out.requests.txt (deleted from the test directories) and the recorded request lives in output.txt only.

Pull this out of the per-call routing PR per review feedback. The routing
change does not need account-ID interpolation to land; if we want it later
it can be a focused follow-up alongside any other path-substitution we
decide to support.

Co-authored-by: Isaac
…citly

Each test now uses print_requests.py | contains.py to assert the
recorded request either does or does not carry the routing identifier.
The intent (header sent or not, with what value) is now visible in the
script and in output.txt, instead of being implied by the recorded
out.requests.txt artifact alone.

Co-authored-by: Isaac
@simonfaltum simonfaltum temporarily deployed to test-trigger-is May 4, 2026 13:39 — with GitHub Actions Inactive
@simonfaltum simonfaltum temporarily deployed to test-trigger-is May 4, 2026 13:39 — with GitHub Actions Inactive
Comment thread acceptance/cmd/api/account-flag/script Outdated
@@ -0,0 +1 @@
$CLI api get /api/2.0/clusters/list --account
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This now dups the requests between out.requests.txt and output.txt. Can you keep one of them?

@simonfaltum simonfaltum temporarily deployed to test-trigger-is May 4, 2026 14:38 — with GitHub Actions Inactive
@simonfaltum simonfaltum temporarily deployed to test-trigger-is May 4, 2026 14:38 — with GitHub Actions Inactive
print_requests.py --keep was preserving out.requests.txt alongside the
new inline assertion in output.txt, duplicating the recorded request.
Drop --keep so the helper consumes out.requests.txt and the assertion
stays in output.txt only.

Co-authored-by: Isaac
@simonfaltum simonfaltum temporarily deployed to test-trigger-is May 4, 2026 14:41 — with GitHub Actions Inactive
@simonfaltum simonfaltum temporarily deployed to test-trigger-is May 4, 2026 14:41 — with GitHub Actions Inactive
@simonfaltum simonfaltum temporarily deployed to test-trigger-is May 5, 2026 09:01 — with GitHub Actions Inactive
@simonfaltum simonfaltum temporarily deployed to test-trigger-is May 5, 2026 09:01 — with GitHub Actions Inactive
Setting MSYS_NO_PATHCONV=1 in the parent test.toml [Env] block fixed Git
Bash rewriting /api/... to a Windows path, but it also broke the python
helpers invoked on the next line (print_requests.py, contains.py): with
the variable set globally, Git Bash passes script paths in MSYS form to
python.exe, which then can't open them.

Drop the [Env] block and prefix MSYS_NO_PATHCONV=1 inline on each
\$CLI api ... call, matching the pattern already used by envsubst() in
acceptance/script.prepare.

Co-authored-by: Isaac
@simonfaltum simonfaltum temporarily deployed to test-trigger-is May 5, 2026 09:08 — with GitHub Actions Inactive
@simonfaltum simonfaltum temporarily deployed to test-trigger-is May 5, 2026 09:08 — with GitHub Actions Inactive
@simonfaltum simonfaltum temporarily deployed to test-trigger-is May 5, 2026 11:11 — with GitHub Actions Inactive
@simonfaltum simonfaltum temporarily deployed to test-trigger-is May 5, 2026 11:11 — with GitHub Actions Inactive
@simonfaltum simonfaltum merged commit 3c867c9 into main May 5, 2026
52 of 57 checks passed
@simonfaltum simonfaltum deleted the simonfaltum/cli-api-spog-routing branch May 5, 2026 14:34
jamesbroadhead added a commit that referenced this pull request May 5, 2026
main removed AuthArguments.IsUnifiedHost in favor of DiscoveryURL-based
routing (#5137). Update TestBuildLoginCommand_AppendsWorkspaceID's unified-host
case to use a discovery URL containing /oidc/accounts/<id>/ so the test still
exercises UnifiedOAuthArgument.

Co-authored-by: Isaac
pietern added a commit that referenced this pull request May 8, 2026
Drop the bespoke resolveWorkspaceID helper and the cached wsID field on
lakeboxAPI. Match the minimal pattern that libs/telemetry, libs/filer,
and SDK-generated workspace services already use: read cfg.WorkspaceID
directly, send the X-Databricks-Org-Id header if set.

Removes the '?o=<id>' fallback that parsed the host's query string.
That behavior was unique to lakebox and inconsistent with how every
other CLI surface handles SPOG hosts; the SDK's host-metadata discovery
populates cfg.WorkspaceID for hosts that need it, and users who run
into edge cases set workspace_id explicitly the same way they would
for `bundle deploy` or `databricks api`.

Adds the auth.WorkspaceIDNone ("none") sentinel strip so a profile
created via `databricks auth login` for SPOG account-level access
doesn't send the literal string "none" as the routing identifier.
This fix matches what cmd/api/api.go (#5137) and libs/auth do; the
four other orgIDHeaders helpers in the codebase still have the latent
bug, which is a separate cleanup.

Co-authored-by: Isaac
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants