Skip to content

Classify packager errors as 400 in skill build handler#5076

Merged
JAORMX merged 2 commits into
mainfrom
improve-build-errors
Apr 28, 2026
Merged

Classify packager errors as 400 in skill build handler#5076
JAORMX merged 2 commits into
mainfrom
improve-build-errors

Conversation

@samuv
Copy link
Copy Markdown
Contributor

@samuv samuv commented Apr 27, 2026

Summary

  • Follow-up on the same Sentry-noise / unhelpful-500 pattern fixed for the content/install paths in Classify OCI pull errors and fall back to git for skills content API #5014 and Return 502 Bad Gateway when OCI skill pull fails #4956 — now applied to POST /api/v1beta/skills/build. A user pointing thv skill build at a directory without a SKILL.md, or with malformed frontmatter, currently gets a bare 500 Internal Server Error and a *fmt.wrapError: packaging skill: reading skill directory: SKILL.md not found in skill directory exception in Sentry. That's plainly bad input, not a server bug.
  • toolhive-core v0.0.17 (release, stacklok/toolhive-core#95) introduces six sentinel errors in oci/skills so the packager's user-input failures can be classified with errors.Is instead of matching error strings.
  • Bump the dependency and wrap s.packager.Package with an errors.Is switch over those sentinels, mapping each one to 400 Bad Request with the packager's message preserved (e.g. SKILL.md missing, invalid SKILL.md frontmatter, skill directory too large). Anything else stays a real 500.
  • Two independent commits so the dependency bump and the logic change can be reviewed / reverted in isolation.

Why this and not a ValidateSkillDir pre-flight

An earlier draft pre-flighted the existing skills.ValidateSkillDir validator before invoking the packager. That worked for the most common case (missing SKILL.md) but had two problems:

  • Duplicate filesystem walk on every successful build (validator + packager both walk the directory).
  • Validator drift: the toolhive-side validator does not enforce the packager's maxSkillFiles (1,000) and maxSkillTotalSize (100 MB) limits, nor the per-file validateSkillFile checks (symlinks, non-regular files). Those failure modes still fell through to 500.

The sentinel switch covers every packager failure mode in one place, with no double-validation. The packager's existing error messages are preserved verbatim, so the client still sees actionable detail in the 400 body.

Mapping

Sentinel (ociskills.*) Cause Status
ErrSkillMDMissing SKILL.md not present in the directory 400
ErrInvalidFrontmatter Malformed YAML, missing delimiter, oversized, empty name 400
ErrInvalidSkillDir Directory missing / not-a-dir / contains traversal 400
ErrInvalidSkillFile Symlink or non-regular file inside the dir 400
ErrTooManyFiles Exceeds maxSkillFiles (1,000) 400
ErrSkillTooLarge Exceeds maxSkillTotalSize (100 MB) 400
anything else Unclassified packager failure 500

The pre-existing validateLocalPath checks (null bytes, traversal, relative paths) stay in place — they're cheap, run before any I/O, and double as a guard against packager bugs.

Type of change

  • Bug fix (HTTP status semantics + observability of build failures)

Test plan

  • Unit tests (go test ./pkg/skills/skillsvc/... -count=1)
  • Linting (task lint-fix — 0 issues)
  • Wider regression sweep (go test ./pkg/skills/... ./pkg/api/...)
  • Five new TestBuild cases assert both httperr.Code(err) == 400 and errors.Is(err, ociskills.ErrXxx) for each sentinel: missing SKILL.md, invalid frontmatter, empty name in frontmatter, symlink in skill dir, oversized dir.
  • Existing cases (nil packager returns 500, empty path returns 400, relative path returns 400, path traversal returns 400, invalid tag returns 400, packager error propagates, three success cases, invalid fallback config name returns 400) all still pass — none of them flow through the new switch.

Changes

File Change
go.mod, go.sum Bump github.com/stacklok/toolhive-core v0.0.16v0.0.17
pkg/skills/skillsvc/build.go errors.Is switch over the six packager sentinels; each → httperr.WithCode(err, http.StatusBadRequest)
pkg/skills/skillsvc/build_test.go Add wantErrIs error field + five new sentinel cases; runner asserts assert.ErrorIs when set

The OpenAPI annotation on buildSkill (pkg/api/v1/skills.go) already advertises @Failure 400 ..., and docs/server/{docs.go,swagger.json,swagger.yaml} already reflect it — no spec regeneration needed.

Does this introduce a user-facing change?

Yes — clients calling POST /api/v1beta/skills/build on a directory the packager rejects (missing SKILL.md, malformed frontmatter, symlinks, size/count limit) now receive 400 Bad Request with the packager's message in the body, instead of 500 Internal Server Error with an empty body. Successful builds and genuine server failures are unchanged.

samuv added 2 commits April 27, 2026 17:28
v0.0.17 introduces sentinel errors in oci/skills (ErrSkillMDMissing,
ErrInvalidFrontmatter, ErrInvalidSkillDir, ErrInvalidSkillFile,
ErrTooManyFiles, ErrSkillTooLarge) so callers can classify packager
failures with errors.Is instead of matching error strings. Pulled in
ahead of the build-handler refactor that consumes them.
POST /api/v1beta/skills/build previously surfaced every packager
failure — including user-input mistakes like a missing SKILL.md or a
malformed frontmatter — as 500 Internal Server Error with no body,
generating Sentry noise on what is plainly bad input.

Wrap the s.packager.Package error with an errors.Is switch over the
six toolhive-core v0.0.17 sentinels and map each one to 400 with the
packager's message preserved. Anything else stays a 500.

Add five TestBuild cases that mock the packager returning a wrapped
sentinel and assert both the HTTP code and errors.Is on the sentinel,
covering the limits the toolhive-side validator misses (symlink in
skill dir, oversized dir).
@samuv samuv requested a review from JAORMX as a code owner April 27, 2026 15:29
@samuv samuv self-assigned this Apr 27, 2026
@github-actions github-actions Bot added the size/S Small PR: 100-299 lines changed label Apr 27, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 69.77%. Comparing base (68d29fb) to head (aabec78).
⚠️ Report is 8 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5076      +/-   ##
==========================================
- Coverage   69.81%   69.77%   -0.04%     
==========================================
  Files         562      564       +2     
  Lines       56647    56652       +5     
==========================================
- Hits        39548    39531      -17     
- Misses      14063    14092      +29     
+ Partials     3036     3029       -7     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@JAORMX JAORMX merged commit cb03892 into main Apr 28, 2026
43 checks passed
@JAORMX JAORMX deleted the improve-build-errors branch April 28, 2026 07:53
samuv added a commit to stacklok/toolhive-studio that referenced this pull request Apr 28, 2026
…ral toast (#2123)

* test(mocks): register user-error scenario for 400 backend errors

Add a generic 'user-error' entry to MockScenarios so fixtures can model the 400 Bad Request shape introduced for the skill-build endpoint by stacklok/toolhive#5076, and wire the first instance on POST /api/v1beta/skills/build returning '{ error: "SKILL.md missing" }'.

* feat(skills): persist build dialog errors inline instead of an ephemeral toast

Render a destructive Alert below the dialog header that surfaces the backend message verbatim — including the actionable 400 packager errors from stacklok/toolhive#5076 (SKILL.md missing, invalid SKILL.md frontmatter, symlink/size violations) — so the user can keep reading it while they fix the path or tag. The alert clears on cancel and on resubmit. Mirrors the submitError pattern already used by DialogInstallSkill.

* refactor(skills): drop redundant error toast from build mutation hook

The build dialog now owns the user-visible error surface with an inline alert that preserves the backend message, so the hook's generic 'Failed to build skill' toast was duplicating the signal and hiding the more useful packager output. trackEvent('Skills: build failed') is preserved — analytics are unchanged. Matches useMutationInstallSkill, which already lets its dialog render the error inline.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/S Small PR: 100-299 lines changed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants