Improve skills based on an external benchmark for DevHub prompts #68

Open
pkosiec wants to merge 7 commits into main from pkosiec/report-compare

Conversation

@pkosiec
Member

@pkosiec pkosiec commented May 7, 2026

Summary

Skill improvements based on the internal report — a benchmark of Claude Opus 4.6 vs Codex 5.2 across 25 DevHub templates. The report identified error recovery patterns, troubleshooting gaps, and factual errors in skills.

Report-driven additions

  • Token passthrough error — added to Common Errors with correct fix (workspace admin enables user authorization)
  • Lakehouse Sync — documented as UI-only, added REPLICA IDENTITY prerequisite, fixed Azure support status, added Postgres 17 requirement and limitations
  • Off-platform TypeScript — added @databricks/lakebase standalone pattern in connectivity.md
  • CLI fallback — added REST API exception for sandboxed environments
  • PostgreSQL extensions — brief note with link to official docs

Bug fixes (verified against AppKit source)

  • Lakebase API pattern — replaced outdated createLakebasePool() + pool.query() with AppKit.lakebase.query() plugin pattern
  • ORM integration — updated to use AppKit.lakebase.pool / getOrmConfig()
  • Genie resource — restored correct genie_space_name variable (confirmed via apps init output)
  • Model serving — replaced verbose streaming/AI Gateway sections with AppKit docs pointer; noted AI Gateway (beta) endpoints not directly supported

False positives from initial analysis (reverted)

  • Scaffolding "Go template bug" — intentional AppKit template design, not a bug
  • OBO vs SP-only guidance — misleading since apps init handles scopes correctly
  • Verify Deployment / Deployment Recovery sections — redundant (apps deploy already reports status)
  • CLI version bump to v0.296.0 — unjustified
  • Jobs/Pipelines troubleshooting tables — not report-driven
  • Unity Catalog skill — separate effort
  • databricks apps logs PAT error — actually about querying app endpoints, not the logs command

Test plan

  • python3 scripts/skills.py validate passes
  • No real credentials or workspace IDs in examples
  • Key claims verified against official Databricks docs and AppKit source
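
The "no real credentials or workspace IDs" check in the test plan could be automated along these lines. This is a minimal sketch, not part of the PR: `find_suspect_strings` is a hypothetical helper, and both patterns are assumptions (personal access tokens starting with `dapi` followed by 32 hex characters, and Azure workspace hosts of the form `adb-<id>.<n>.azuredatabricks.net`).

```python
import re

# Assumption: Databricks personal access tokens look like "dapi" + 32 hex
# characters, optionally followed by "-<digit>". Adjust if your tokens differ.
TOKEN_PATTERN = re.compile(r"\bdapi[0-9a-f]{32}(?:-\d+)?\b")

# Assumption: Azure workspace hosts embed the numeric workspace ID.
WORKSPACE_HOST_PATTERN = re.compile(r"\badb-\d{10,}\.\d+\.azuredatabricks\.net\b")


def find_suspect_strings(text):
    """Return (line_number, matched_text) pairs for credential-like strings."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern in (TOKEN_PATTERN, WORKSPACE_HOST_PATTERN):
            for match in pattern.finditer(line):
                findings.append((lineno, match.group(0)))
    return findings
```

Running it over every skill file and failing the check on a non-empty result would turn the manual checkbox into a repeatable one. Placeholders such as `<your-token>` do not match either pattern.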

Co-authored-by: Isaac


JIRA

Related:

  • LKB-12465 — Cookbook codegen: agent generates wrong AppKit API signatures
  • LKB-12428 — Mode B: as unknown as double-assertions (Done)
  • LKB-12614 — AppKit version scatter
  • LKB-12159 — Umbrella regression hunt

pkosiec added 3 commits May 7, 2026 16:08
Add error recovery patterns, troubleshooting, and coverage gaps
identified by the April 2026 DevHub agent benchmark across 25 tasks.

Key additions:
- Token passthrough workaround and deployment recovery chain
- Off-platform TypeScript/Node.js patterns with REST API fallback
- pgvector, streaming AI chat, multi-space Genie patterns
- Lakehouse Sync UI-only note and REPLICA IDENTITY prerequisite
- Jobs and Pipelines troubleshooting tables
- CLI STOP directive carve-out for off-platform tasks

Co-authored-by: Isaac

- Revert scaffolding bug "Known Issue" (Go template syntax is
  intentional AppKit design, not a bug)
- Drop Unity Catalog skill (separate PR)
- Replace raw pg driver + REST API curl patterns with
  @databricks/lakebase package (standalone, auto token refresh)
- Replace detailed pgvector section with brief PostgreSQL
  extensions note linking to official docs

Co-authored-by: Isaac

- Replace outdated createLakebasePool() + pool.query() pattern with
  AppKit plugin pattern: AppKit.lakebase.query()
- Fix Genie databricks.yml: remove nonexistent `name` field and
  `genie_space_name` variable from genie_space resource
- Add missing user_api_scopes (files.files) for files plugin
- Improve model serving streaming docs (SSE proxy, not AI SDK)
- Bump CLI version to >= v0.296.0 across all 7 skills for consistency
- Add multi-environment deploy note in Lakebase scaffolding

Co-authored-by: Isaac
@pkosiec pkosiec changed the title Improve skills based on CAO pilot report findings Improve skills based on an external benchmark for DevHub prompts May 7, 2026
pkosiec added 4 commits May 7, 2026 17:20
- Revert redundant user_api_scopes in files.md (auto-generated by apps init)
- Restore genie_space_name variable (confirmed in actual scaffolding output)
- Replace multi-space Genie section with pointer to AppKit docs
- Revert Jobs and Pipelines troubleshooting tables (not report-driven)
- Revert CLI version bumps to original values
- Update connectivity.md cross-reference to match new plugin pattern

Co-authored-by: Isaac

- Clarify Lakebase plugin pattern requires scaffolding first
- Update ORM integration to use AppKit.lakebase.pool / getOrmConfig()
- Fix stale pool.query() references in synced tables section
- Replace streaming/AI Gateway sections with AppKit docs pointer
- Add AI Gateway note linking to official docs
- Remove app-focused Getting Started from databricks-core

Co-authored-by: Isaac

- Remove misleading OBO vs SP-only note (apps init handles scopes)
- Fix AI Gateway: note beta endpoints unsupported, point to
  databricks-model-serving skill instead of incompatible docs
- Fix CLI exception: link to REST API docs, not Lakebase skill
- Trim off-platform Pattern 5 to minimal example + npm view readme

Co-authored-by: Isaac

- Remove Verify Deployment + Deployment Recovery subsections (apps deploy
  already reports status; it's Option A, not a fallback)
- Remove duplicate file size error and incorrect apps logs PAT entry
- Fix token passthrough error: point to workspace admin enablement, not
  stripping OBO scopes
- Fix Lakehouse Sync: Azure now supported, add Postgres 17 requirement,
  destination naming, permissions, partitioned table limitation
- Simplify CLI exception: remove "does NOT require deploying" condition

Co-authored-by: Isaac
@pkosiec pkosiec requested a review from keugenek May 7, 2026 16:26
@pkosiec pkosiec marked this pull request as ready for review May 7, 2026 16:26
@pkosiec pkosiec requested review from a team, lennartkats-db and simonfaltum as code owners May 7, 2026 16:26
@keugenek
Contributor

keugenek commented May 8, 2026

Hey Pawel, how have you tested this? Can you please attach a test report?


Update smoke tests if headings or routes changed, then `databricks apps validate`.

For multi-space apps (switching between Genie spaces), see `npx @databricks/appkit docs ./docs/plugins/genie.md`.

why only for multi-space apps?

- **If the CLI is missing or outdated (< v0.292.0): STOP. Do not proceed or work around a missing CLI.**
- **Read the [CLI Installation](databricks-cli-install.md) reference file and follow the instructions to guide the user through installation.**
- Note: In sandboxed environments (Cursor IDE, containers), install commands write outside the workspace and may be blocked. Present the install command to the user and ask them to run it in their own terminal.
- **Exception:** If CLI installation is blocked (sandboxed containers, restricted environments), fall back to direct REST API calls using `DATABRICKS_HOST` and `DATABRICKS_TOKEN` environment variables if present in the shell. See the [Databricks REST API docs](https://docs.databricks.com/api/workspace/introduction).
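
The version gate in the first bullet can be sketched as below. This is illustrative only: `parse_cli_version` and `cli_is_usable` are hypothetical helpers, and the sketch assumes `databricks --version` prints a string containing a `vMAJOR.MINOR.PATCH` token.

```python
import re
from typing import Optional, Tuple

# Minimum version quoted in the skill text above.
MIN_CLI_VERSION = (0, 292, 0)


def parse_cli_version(output: str) -> Optional[Tuple[int, ...]]:
    """Extract (major, minor, patch) from CLI version output, or None."""
    match = re.search(r"v?(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def cli_is_usable(version_output: Optional[str]) -> bool:
    """False when the CLI is missing (None) or older than the minimum."""
    if version_output is None:
        return False
    version = parse_cli_version(version_output)
    return version is not None and version >= MIN_CLI_VERSION
```

Comparing integer tuples rather than strings avoids ordering mistakes such as treating `0.99.0` as newer than `0.292.0`.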

Wow, this might open a Pandora's box of many extra iterations. I'd ask the user here what to do.
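
One way to reconcile the exception with this concern is to gate the fallback on explicit user confirmation. The sketch below is an assumption about how that could look, not text from the PR: `build_rest_request` and the `confirm` callback are hypothetical, and the function only builds the request so the caller decides whether to send it.

```python
import urllib.request
from typing import Callable, Mapping, Optional


def build_rest_request(
    env: Mapping[str, str],
    path: str,
    confirm: Callable[[str], bool],
) -> Optional[urllib.request.Request]:
    """Build an authenticated REST request only if the user agrees.

    Returns None when the environment variables are missing or the
    user declines, so the caller can stop instead of working around
    the missing CLI.
    """
    host = env.get("DATABRICKS_HOST")
    token = env.get("DATABRICKS_TOKEN")
    if not host or not token:
        return None
    question = f"The CLI is unavailable. Call {path} on {host} directly?"
    if not confirm(question):
        return None
    url = host.rstrip("/") + path
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
```

A caller would pass `os.environ`, a path such as `/api/2.0/clusters/list`, and a callback that actually prompts the user. Declining keeps the original STOP behavior intact.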

Comment thread manifest.json
@@ -1,12 +1,12 @@
{
"version": "2",
"updated_at": "2026-04-30T11:02:41Z",

Can these be autogenerated on each push? Or removed entirely, since they are unnecessary?
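
If the timestamp stays, regenerating it could be as small as the sketch below. It is a hypothetical helper, assuming `manifest.json` is a flat JSON object and that `updated_at` keeps the UTC `YYYY-MM-DDTHH:MM:SSZ` form shown in the diff. Wiring it into a push hook or CI step is left out.

```python
import json
from datetime import datetime, timezone


def refresh_updated_at(manifest_text: str, now: datetime) -> str:
    """Return manifest JSON with `updated_at` set to `now` in UTC."""
    manifest = json.loads(manifest_text)
    stamp = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    manifest["updated_at"] = stamp
    # Key order is preserved; indentation is an arbitrary choice here.
    return json.dumps(manifest, indent=2) + "\n"
```

Passing `now` explicitly, for example `datetime.now(timezone.utc)`, keeps the function deterministic and easy to test.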

Contributor

@keugenek keugenek left a comment

Approved with comments. Please check the eval run results before merging.

Contributor

@keugenek keugenek left a comment

Nitpicker review — 3 perspectives (correctness / completeness / conflicts)

Triggered a dev eval run with skills_ref=pkosiec/report-compare to validate empirically: run 882501631168304. Results in ~60-90 min.

HIGH (must fix before merge)

1. AppKit.lakebase.query() / AppKit.server.router() — unverified API shapes (lakebase.md)

The PR replaces createLakebasePool() + pool.query() with AppKit.lakebase.query() and AppKit.server.router() / AppKit.server.procedure. Our own SKILL.md line 43 warns: "Training data has stale shapes; a single invented signature fails tsc --noEmit during validate."

The existing trpc.md uses initTRPC + t.router / t.procedure, not AppKit.server.*. If these aren't real exports, every generated Lakebase app will fail compilation. Can you attach a passing tsc --noEmit log or npx @databricks/appkit docs output confirming the new shapes?

The dev eval run above will also validate this empirically — if Lakebase apps fail tsc with the new skill text, we'll know.
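
For the requested evidence, compiler output can be reduced to a short summary before attaching it. The parser below is a sketch under one assumption: diagnostics arrive in the plain `file(line,col): error TSxxxx: message` form that `tsc` prints with `--pretty false`. `TscDiagnostic` and `parse_tsc_output` are hypothetical names.

```python
import re
from typing import List, NamedTuple


class TscDiagnostic(NamedTuple):
    file: str
    line: int
    column: int
    code: str
    message: str


# Assumed line shape: path(line,col): error TS1234: message
DIAGNOSTIC_PATTERN = re.compile(
    r"^(?P<file>.+?)\((?P<line>\d+),(?P<column>\d+)\): "
    r"error (?P<code>TS\d+): (?P<message>.*)$"
)


def parse_tsc_output(output: str) -> List[TscDiagnostic]:
    """Collect error diagnostics; lines in any other shape are ignored."""
    diagnostics = []
    for raw in output.splitlines():
        match = DIAGNOSTIC_PATTERN.match(raw.strip())
        if match:
            diagnostics.append(
                TscDiagnostic(
                    file=match["file"],
                    line=int(match["line"]),
                    column=int(match["column"]),
                    code=match["code"],
                    message=match["message"],
                )
            )
    return diagnostics
```

An empty list from the output of `npx tsc --noEmit --pretty false` on a freshly generated Lakebase app would be the passing evidence. Any "does not exist on type" entries would point at invented exports.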

2. Stale references not updated

overview.md still says "createLakebasePool, tRPC patterns"; trpc.md still says "provides createLakebasePool for PostgreSQL CRUD". These contradict the new Lakebase plugin API pattern in the same skill. Should be updated in this PR to avoid agent confusion.
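
Leftovers like these are easy to catch mechanically. A minimal sketch, assuming the skill references are Markdown files under one directory; `find_stale_references` is a hypothetical helper and the default needle is the identifier named in this finding.

```python
from pathlib import Path
from typing import List, Tuple


def find_stale_references(
    root: Path, needle: str = "createLakebasePool"
) -> List[Tuple[str, int]]:
    """Return (relative_path, line_number) for each line mentioning `needle`."""
    hits = []
    for path in sorted(root.rglob("*.md")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if needle in line:
                hits.append((path.relative_to(root).as_posix(), lineno))
    return hits
```

Running it against the skill directory after the rename should return an empty list once overview.md and trpc.md are updated.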

MEDIUM (should fix)

3. Genie multi-space pointer (genie.md) — hardcodes npx @databricks/appkit docs ./docs/plugins/genie.md. The existing pattern in this file uses component-name lookups (npx @databricks/appkit docs "GenieChat") which are version-agnostic. Prefer the component-name form.

4. databricks-core/SKILL.md REST fallback — adds an exception to the existing "STOP — do not work around a missing CLI" guardrail. This reverses a deliberate safety rule without discussing the trade-offs (raw REST calls with PATs bypass workspace auth flows). Worth explicit discussion.

LOW

5. Lakehouse Sync prerequisite (synced-tables.md) — requirement that tables must reside in databricks_postgres database is significant but buried in prerequisites. Should be higher in the section.

Verdict

REQUEST CHANGES on HIGHs #1 and #2. The Lakebase API replacement is the load-bearing change and needs evidence that AppKit.server.router and AppKit.lakebase.query are real exports. Happy to approve once those are addressed. The dev eval will give us a data point either way.

3 participants