Improve skills based on an external benchmark for DevHub prompts #68
Add error recovery patterns, troubleshooting, and coverage gaps identified by the April 2026 DevHub agent benchmark across 25 tasks.

Key additions:
- Token passthrough workaround and deployment recovery chain
- Off-platform TypeScript/Node.js patterns with REST API fallback
- pgvector, streaming AI chat, multi-space Genie patterns
- Lakehouse Sync UI-only note and REPLICA IDENTITY prerequisite
- Jobs and Pipelines troubleshooting tables
- CLI STOP directive carve-out for off-platform tasks

Co-authored-by: Isaac
- Revert scaffolding bug "Known Issue" (Go template syntax is intentional AppKit design, not a bug)
- Drop Unity Catalog skill (separate PR)
- Replace raw pg driver + REST API curl patterns with @databricks/lakebase package (standalone, auto token refresh)
- Replace detailed pgvector section with brief PostgreSQL extensions note linking to official docs

Co-authored-by: Isaac
- Replace outdated createLakebasePool() + pool.query() pattern with AppKit plugin pattern: AppKit.lakebase.query()
- Fix Genie databricks.yml: remove nonexistent `name` field and `genie_space_name` variable from genie_space resource
- Add missing user_api_scopes (files.files) for files plugin
- Improve model serving streaming docs (SSE proxy, not AI SDK)
- Bump CLI version to >= v0.296.0 across all 7 skills for consistency
- Add multi-environment deploy note in Lakebase scaffolding

Co-authored-by: Isaac
- Revert redundant user_api_scopes in files.md (auto-generated by apps init)
- Restore genie_space_name variable (confirmed in actual scaffolding output)
- Replace multi-space Genie section with pointer to AppKit docs
- Revert Jobs and Pipelines troubleshooting tables (not report-driven)
- Revert CLI version bumps to original values
- Update connectivity.md cross-reference to match new plugin pattern

Co-authored-by: Isaac
- Clarify Lakebase plugin pattern requires scaffolding first
- Update ORM integration to use AppKit.lakebase.pool / getOrmConfig()
- Fix stale pool.query() references in synced tables section
- Replace streaming/AI Gateway sections with AppKit docs pointer
- Add AI Gateway note linking to official docs
- Remove app-focused Getting Started from databricks-core

Co-authored-by: Isaac
- Remove misleading OBO vs SP-only note (apps init handles scopes)
- Fix AI Gateway: note beta endpoints unsupported, point to databricks-model-serving skill instead of incompatible docs
- Fix CLI exception: link to REST API docs, not Lakebase skill
- Trim off-platform Pattern 5 to minimal example + npm view readme

Co-authored-by: Isaac
- Remove Verify Deployment + Deployment Recovery subsections (apps deploy already reports status; it's Option A, not a fallback)
- Remove duplicate file size error and incorrect apps logs PAT entry
- Fix token passthrough error: point to workspace admin enablement, not stripping OBO scopes
- Fix Lakehouse Sync: Azure now supported, add Postgres 17 requirement, destination naming, permissions, partitioned table limitation
- Simplify CLI exception: remove "does NOT require deploying" condition

Co-authored-by: Isaac
Hey Pawel, how have you tested this? Can you please attach a test report?
Update smoke tests if headings or routes changed, then `databricks apps validate`.
For multi-space apps (switching between Genie spaces), see `npx @databricks/appkit docs ./docs/plugins/genie.md`.
why only for multi-space apps?
- **If the CLI is missing or outdated (< v0.292.0): STOP. Do not proceed or work around a missing CLI.**
- **Read the [CLI Installation](databricks-cli-install.md) reference file and follow the instructions to guide the user through installation.**
- Note: In sandboxed environments (Cursor IDE, containers), install commands write outside the workspace and may be blocked. Present the install command to the user and ask them to run it in their own terminal.
- **Exception:** If CLI installation is blocked (sandboxed containers, restricted environments), fall back to direct REST API calls using `DATABRICKS_HOST` and `DATABRICKS_TOKEN` environment variables if present in the shell. See the [Databricks REST API docs](https://docs.databricks.com/api/workspace/introduction).
Wow, this might open a Pandora's box of many extra iterations. I'd ask the user here what to do.
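To make the trade-off in this thread concrete, here is a minimal sketch of the guardrail with the "ask the user" suggestion folded in. Everything in it is an assumption for illustration: the `gateCli` function, the `CliGate` outcomes, and the idea that the version can be parsed from a string such as `Databricks CLI v0.296.0` are hypothetical and not part of the skill text. Only the `v0.292.0` minimum and the two environment variable names come from the diff above.

```typescript
// Hypothetical sketch of the CLI guardrail discussed above; not the skill's actual logic.
type CliGate = "proceed" | "stop-and-install" | "ask-user";

interface Version {
  major: number;
  minor: number;
  patch: number;
}

// Minimum version taken from the guardrail text (< v0.292.0 means STOP).
const MIN_CLI: Version = { major: 0, minor: 292, patch: 0 };

// Extract the first x.y.z version found in the text; null if there is none.
function parseVersion(text: string): Version | null {
  const m = /v?(\d+)\.(\d+)\.(\d+)/.exec(text);
  if (m === null) return null;
  return { major: Number(m[1]), minor: Number(m[2]), patch: Number(m[3]) };
}

// True when version a is strictly older than version b.
function isOlder(a: Version, b: Version): boolean {
  if (a.major !== b.major) return a.major < b.major;
  if (a.minor !== b.minor) return a.minor < b.minor;
  return a.patch < b.patch;
}

// versionOutput: assumed output of a version check, or null if the CLI is missing.
// installBlocked: whether installation is blocked (sandbox, restricted environment).
// env: the shell environment, passed in so the function stays pure and testable.
function gateCli(
  versionOutput: string | null,
  installBlocked: boolean,
  env: Record<string, string | undefined>,
): CliGate {
  const version = versionOutput === null ? null : parseVersion(versionOutput);
  if (version !== null && !isOlder(version, MIN_CLI)) return "proceed";
  if (!installBlocked) return "stop-and-install";
  // Installation is blocked: only raise the REST fallback if credentials exist,
  // and leave the decision to the user instead of falling back silently.
  const hasCredentials = Boolean(env["DATABRICKS_HOST"] && env["DATABRICKS_TOKEN"]);
  return hasCredentials ? "ask-user" : "stop-and-install";
}
```

The design choice in this sketch is that the REST fallback is never chosen automatically: a blocked install with credentials present yields `ask-user`, which keeps the original STOP rule as the default.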
@@ -1,12 +1,12 @@
{
  "version": "2",
  "updated_at": "2026-04-30T11:02:41Z",
Can these be autogenerated on each push? Or removed entirely, since they are unnecessary?
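If the timestamp were autogenerated, the stamping step could be as small as the sketch below. The `Manifest` shape and the `stampManifest` helper are hypothetical names, assumed only from the two fields visible in the diff; how and when it would run (for example from a push hook or CI step) is left open.

```typescript
// Hypothetical helper: regenerates `updated_at` instead of hand-editing it.
interface Manifest {
  version: string;
  updated_at: string;
  [key: string]: unknown;
}

// Returns a copy of the manifest with `updated_at` set to `now` in the
// second-precision UTC format seen in the diff (e.g. 2026-04-30T11:02:41Z).
function stampManifest(manifest: Manifest, now: Date): Manifest {
  // toISOString() always emits milliseconds; strip them to match the existing format.
  const updated_at = now.toISOString().replace(/\.\d{3}Z$/, "Z");
  return { ...manifest, updated_at };
}
```

Passing `now` as a parameter keeps the function deterministic, so the format can be checked without depending on the clock.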
keugenek
left a comment
Approved with comments. Please check the eval run results before merging.
keugenek
left a comment
Nitpicker review — 3 perspectives (correctness / completeness / conflicts)
Triggered a dev eval run with skills_ref=pkosiec/report-compare to validate empirically: run 882501631168304. Results in ~60-90 min.
HIGH (must fix before merge)
1. AppKit.lakebase.query() / AppKit.server.router() — unverified API shapes (lakebase.md)
The PR replaces createLakebasePool() + pool.query() with AppKit.lakebase.query() and AppKit.server.router() / AppKit.server.procedure. Our own SKILL.md line 43 warns: "Training data has stale shapes; a single invented signature fails tsc --noEmit during validate."
The existing trpc.md uses initTRPC + t.router / t.procedure, not AppKit.server.*. If these aren't real exports, every generated Lakebase app will fail compilation. Can you attach a passing tsc --noEmit log or npx @databricks/appkit docs output confirming the new shapes?
The dev eval run above will also validate this empirically — if Lakebase apps fail tsc with the new skill text, we'll know.
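Alongside a tsc --noEmit log, a quick runtime probe can show whether a dotted path exists on an imported object. The sketch below is an illustration under stated assumptions: `hasExportPath` is a hypothetical helper, and the object it is run against here is a stand-in, not the real AppKit export. It says nothing about whether AppKit.lakebase.query or AppKit.server.router actually exist.

```typescript
// Hypothetical probe: checks that every segment of a dotted path exists on an object.
function hasExportPath(root: unknown, path: string): boolean {
  let current: unknown = root;
  for (const key of path.split(".")) {
    const isContainer =
      current !== null && (typeof current === "object" || typeof current === "function");
    if (!isContainer) return false;
    if (!(key in (current as object))) return false;
    current = (current as Record<string, unknown>)[key];
  }
  return true;
}

// Stand-in object for illustration only; the real shapes are exactly what is unverified.
const standIn = {
  lakebase: { query: (sql: string): string[] => [sql] },
  server: {},
};

const probe = {
  "lakebase.query": hasExportPath(standIn, "lakebase.query"),
  "server.router": hasExportPath(standIn, "server.router"),
};
```

A probe like this only confirms that a name is present at runtime. It does not check signatures, so it complements the type check requested above and does not replace it.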
2. Stale references not updated — overview.md still says createLakebasePool, tRPC patterns; trpc.md still says provides createLakebasePool for PostgreSQL CRUD. These contradict the new Lakebase plugin API pattern in the same skill. Should be updated in this PR to avoid agent confusion.
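A small scan makes this kind of drift easy to catch before review. The sketch below is hypothetical: `findStaleReferences` and the in-memory file map are illustrative, and the sample contents are invented for the example, not copied from the skill files.

```typescript
// Hypothetical scan: reports lines that still mention names the PR is replacing.
interface StaleHit {
  file: string;
  line: number; // 1-based line number within the file
  text: string;
}

function findStaleReferences(
  files: Record<string, string>,
  staleNames: string[],
): StaleHit[] {
  const hits: StaleHit[] = [];
  for (const [file, content] of Object.entries(files)) {
    content.split("\n").forEach((text, index) => {
      if (staleNames.some((name) => text.includes(name))) {
        hits.push({ file, line: index + 1, text: text.trim() });
      }
    });
  }
  return hits;
}

// Invented sample contents, for illustration only.
const sampleFiles: Record<string, string> = {
  "overview.md": "# Overview\nUse createLakebasePool for database access.",
  "lakebase.md": "# Lakebase\nUse the plugin pattern.",
};

const staleHits = findStaleReferences(sampleFiles, ["createLakebasePool", "pool.query("]);
```

In practice the file map would be filled by reading the skill directory, and a non-empty result would fail the validation step.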
MEDIUM (should fix)
3. Genie multi-space pointer (genie.md) — hardcodes npx @databricks/appkit docs ./docs/plugins/genie.md. The existing pattern in this file uses component-name lookups (npx @databricks/appkit docs "GenieChat") which are version-agnostic. Prefer the component-name form.
4. databricks-core/SKILL.md REST fallback — adds an exception to the existing "STOP — do not work around a missing CLI" guardrail. This reverses a deliberate safety rule without discussing the trade-offs (raw REST calls with PATs bypass workspace auth flows). Worth explicit discussion.
LOW
5. Lakehouse Sync prerequisite (synced-tables.md) — requirement that tables must reside in databricks_postgres database is significant but buried in prerequisites. Should be higher in the section.
Verdict
REQUEST CHANGES on HIGHs #1 and #2. The Lakebase API replacement is the load-bearing change and needs evidence that AppKit.server.router and AppKit.lakebase.query are real exports. Happy to approve once those are addressed. The dev eval will give us a data point either way.
Summary
Skill improvements based on the internal report — a benchmark of Claude Opus 4.6 vs Codex 5.2 across 25 DevHub templates. The report identified error recovery patterns, troubleshooting gaps, and factual errors in skills.
Report-driven additions
- @databricks/lakebase standalone pattern in connectivity.md

Bug fixes (verified against AppKit source)
- Replace createLakebasePool() + pool.query() with AppKit.lakebase.query() plugin pattern
- Update ORM integration to use AppKit.lakebase.pool / getOrmConfig()
- Restore genie_space_name variable (confirmed via apps init output)

False positives from initial analysis (reverted)
- OBO vs SP-only note: apps init handles scopes correctly
- Verify Deployment + Deployment Recovery subsections (apps deploy already reports status)
- databricks apps logs PAT error: actually about querying app endpoints, not the logs command

Test plan
- python3 scripts/skills.py validate passes

Co-authored-by: Isaac
JIRA
Related:
as unknown as double-assertions (Done)