Classify OCI pull errors and fall back to git for skills content API#5014
Merged
Conversation
samuv
added a commit
that referenced
this pull request
Apr 22, 2026
Two issues reported by CI on #5014: - revive (content.go): drop the else after return in the git-fallback block and hoist the short declaration onto its own line. - unparam (registry.go): resolveGitFallbackForOCIRef never returns a non-nil error (search errors are swallowed as "no fallback available"), so narrow the signature to return just string and update the caller.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #5014 +/- ##
==========================================
- Coverage 69.11% 69.09% -0.03%
==========================================
Files 554 555 +1
Lines 73176 73288 +112
==========================================
+ Hits 50574 50636 +62
- Misses 19589 19633 +44
- Partials 3013 3019 +6 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
JAORMX
approved these changes
Apr 23, 2026
Today every 5xx response body collapses to http.StatusText(code), which is appropriate for 500 (may wrap DB drivers, container runtimes, connection strings) but hides actionable detail for 502/503/504 — which are explicitly about upstream failures the caller can act on. Branch the 5xx path: keep 500 generic, include err.Error() in the body for 502/503/504. Server-side slog/OTel/Sentry reporting is unchanged.
Registry.Pull failures in getContentFromOCI and installFromOCI were all mapped to a flat 502 Bad Gateway, so a 401 from GHCR, a missing manifest, a 429, and a network outage all looked identical to the caller. Add classifyPullError to inspect oras-go errors: context.DeadlineExceeded/Canceled → 504; *errcode.ErrorResponse 401/403 → 401, 404 → 404, 429 → 429, other 4xx/5xx → 502; errdef.ErrNotFound → 404; fallback → 502. Wire into content.go and install_oci.go at the Pull call sites. 4xx classifications flow through the unchanged 4xx handler path so auth/not-found errors surface verbatim without relying on the 5xx body change.
A content-preview request for an OCI ref that also appears in the skill registry with a git package was returned as a flat error as soon as the OCI pull failed, even though the catalog declared a perfectly good git mirror alongside the OCI artifact. Add resolveGitFallbackForOCIRef in registry.go. It searches the catalog by the refs tail path segment, keeps entries whose OCI package identifier parses to the same repository path (tag/digest agnostic, so :0.1.0 and :latest match), and returns the single matching entry's pinned git:// URL. Ambiguous matches are deliberately skipped — refusing to guess is better than silently serving the wrong skill. Wire into GetContent: for an unambiguous OCI ref whose remote pull fails, try the git fallback before returning the OCI error. The existing name-based registry resolution for ambiguous refs is kept intact.
Fixes A and B of this series widened the set of HTTP status codes both endpoints can return: 401 (registry auth), 404 (artifact missing), 429 (registry rate limit), and 504 (upstream pull timeout) on top of the existing 400/500/502. Add @failure annotations for the new codes on installSkill and getSkillContent in pkg/api/v1/skills.go and regenerate docs/server/{docs.go,swagger.json,swagger.yaml} via `swag init -g pkg/api/server.go --v3.1 -o docs/server --parseDependencyLevel 1`.
Two issues reported by CI on #5014: - revive (content.go): drop the else after return in the git-fallback block and hoist the short declaration onto its own line. - unparam (registry.go): resolveGitFallbackForOCIRef never returns a non-nil error (search errors are swallowed as "no fallback available"), so narrow the signature to return just string and update the caller.
This was referenced Apr 27, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
GET /api/v1beta/skills/content?ref=ghcr.io%2Fstacklok%2Fdockyard%2Fskills%2Fyara-rule-authoring%3A0.1.0returned a bare502 Bad Gatewaywith no body, even though the catalog entry foryara-rule-authoringalso lists a git package pinned to a commit on a public repo that would have served the same content.POST /api/v1beta/skillsandGET /api/v1beta/skills/contentnow advertise401,404,429, and504alongside the existing codes, anddocs/server/{docs.go,swagger.json,swagger.yaml}are regenerated to match.Fix A — surface upstream 5xx error messages
pkg/api/errors/handler.gopreviously collapsed every 5xx response body tohttp.StatusText(code). That's right for 500 (may wrap DB connection strings, container-runtime internals, etc.) but hides actionable detail for 502/503/504 — those are explicitly upstream failures the caller can act on.Branch the 5xx path: keep 500 generic, return
err.Error()for 502/503/504. Server-side slog/OTel/Sentry reporting is unchanged.Fix B — classify oras-go pull errors
Registry.Pullfailures in bothgetContentFromOCIandinstallFromOCIwere mapped to a flat502, so a 401 from GHCR looked the same as a network outage or a timeout. A newclassifyPullErrorhelper (pkg/skills/skillsvc/pull_errors.go) maps:context.DeadlineExceeded/context.Canceled504 Gateway Timeout*errcode.ErrorResponse401/403401 Unauthorized*errcode.ErrorResponse404404 Not Found*errcode.ErrorResponse429429 Too Many Requests*errcode.ErrorResponseother 4xx / 5xx502 Bad Gatewayerrdef.ErrNotFound404 Not Found502 Bad Gateway4xx classifications flow through the unchanged 4xx handler path, so auth / not-found errors surface verbatim without relying on the 5xx body change in Fix A.
Fix C — registry-declared git fallback
GetContentshort-circuited whenever the ref was "unambiguous OCI" (digest / explicit tag / multi-segment path), which is why the YARA case above never tried the git package declared on the same registry entry.A new
resolveGitFallbackForOCIRefinregistry.go:yara-rule-authoring).SkillLookup.SearchSkills(no interface change).:0.1.0and:latestmatch the same entry.git://host/owner/repo[@commit][#subfolder]URL.GetContentnow tries this fallback before returning the OCI error for unambiguous OCI refs. The existing name-based registry resolution for ambiguous refs (plain name / single-segment path) is kept intact.Flow after the change
flowchart TD start[GET /content?ref] --> classify{ref scheme} classify -->|git:// or https://| git[getContentFromGit] classify -->|OCI| oci[getContentFromOCI] oci -->|success| ok[200 with content] oci -->|failure| classifyErr[classifyPullError] classifyErr --> lookup[resolveGitFallbackForOCIRef] lookup -->|unique git package| gitFallback[getContentFromGit] gitFallback -->|success| ok gitFallback -->|failure| err["return wrapped error (401/404/429/502/504)"] lookup -->|no fallback| err classify -->|plain name| nameLookup[resolveFromRegistry] nameLookup --> oci nameLookup --> gitType of change
Test plan
go test ./pkg/skills/skillsvc/... -count=1go test ./pkg/api/errors/... -count=1go build ./...swag init -g pkg/api/server.go --v3.1 -o docs/server --parseDependencyLevel 1regenerated cleanly.TestClassifyPullErrortable-driven cases cover every branch of the classifier (context errors, each registry status code, wrapped errors, generic fallback).TestErrorHandlersubtests assert 500 stays generic while 502/503/504 now includeerr.Error().TestGetContentsubtests cover: fallback succeeds with pinned commit; fallback tolerates a different OCI version tag (:v9caller vs:0.1.0catalog); no git package → original OCI error; ambiguous catalog matches → no fallback; OCI success → registry lookup never invoked (verified via gomock with noSearchSkillsexpectation); registry lookup error → treated as no-fallback.Changes
pkg/api/errors/handler.goisUpstreamStatushelper; 502/503/504 responses now includeerr.Error()pkg/api/errors/handler_test.gopkg/skills/skillsvc/pull_errors.goclassifyPullError(err) intpkg/skills/skillsvc/pull_errors_test.gopkg/skills/skillsvc/content.goclassifyPullErrorat theregistry.Pullsite; dispatcher triesresolveGitFallbackForOCIRefbefore returning the OCI error for unambiguous OCI refspkg/skills/skillsvc/install_oci.goclassifyPullErrorat the install pull sitepkg/skills/skillsvc/registry.goresolveGitFallbackForOCIRef+canonicalOCIRepo/skillHasMatchingOCIRepo/firstGitPackageURL/preferredGitRefhelperspkg/skills/skillsvc/content_test.gopkg/api/v1/skills.go,docs/server/{docs.go,swagger.json,swagger.yaml}@Failureannotations for the new 401/404/429/504 responses oninstallSkillandgetSkillContent; regenerated OpenAPI specDoes this introduce a user-facing change?
Yes — two, both additive:
GET /api/v1beta/skills/contentorPOST /api/v1beta/skills/installthat previously saw502 Bad Gatewaywith an empty body now see401 Unauthorized/404 Not Found/429 Too Many Requests/504 Gateway Timeoutwhere appropriate, and the502body now includes the wrapped error for debugging.