
feat(skills): package 12 Hugging Face skills #501

Merged
JAORMX merged 2 commits into main from skills/huggingface on Apr 20, 2026


JAORMX (Collaborator) commented Apr 20, 2026

Summary

Packages 12 Hugging Face platform skills from huggingface/skills (Apache-2.0) into Dockyard. All skills are pinned to upstream commit 061ab49 (main as of 2026-04-16).

Fifth vendor in the per-vendor skills sweep.

Tracks #477.

Skills added

Tooling and CLIs

  • hf-cli — Hugging Face Hub CLI (hf) — replaces the deprecated huggingface-cli
  • huggingface-tool-builder — build reusable HF API scripts via hf CLI and curl/REST
  • hf-mcp — use the Hugging Face Hub via MCP server tools

Hub content

  • huggingface-datasets — Dataset Viewer API workflows + parquetlens SQL
  • huggingface-papers — read Hugging Face paper pages as markdown; papers API
  • huggingface-paper-publisher — create, link, and claim papers on the Hub
  • huggingface-trackio — experiment tracking with Trackio (logging, alerts, CLI)

Frameworks

  • huggingface-gradio — build Gradio web UIs and demos in Python
  • transformers-js — Transformers.js for browser/Node.js/Bun/Deno

Evaluation and training

  • huggingface-community-evals — local evaluations via inspect-ai or lighteval
  • huggingface-llm-trainer — TRL/Unsloth SFT/DPO/GRPO training on HF Jobs
  • huggingface-vision-trainer — vision model training on HF Jobs (object detection, classification, SAM/SAM2)

MCP server dependency

Three of these skills depend on tools exposed by the Hugging Face MCP server (hf_jobs, hf_whoami, hf_doc_search, hf_doc_fetch, etc.):

  • hf-mcp
  • huggingface-llm-trainer
  • huggingface-vision-trainer

Per skill-criteria.md, a skill that depends on MCP servers is eligible only if every referenced server is already in the ToolHive catalog. The HF MCP server is packaged in toolhive-catalog under registries/official/servers/huggingface, so the dependency is satisfied.
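The eligibility rule above reduces to a set-membership check. A minimal sketch, assuming a hypothetical dependency map keyed by skill name and a catalog modeled as a set of server directory names under registries/official/servers/; Dockyard's actual validation may work differently.

```python
from typing import Dict, List, Set

def missing_mcp_servers(skill_deps: Dict[str, List[str]], catalog: Set[str]) -> Dict[str, List[str]]:
    """Return, per skill, the referenced MCP servers that are absent from the catalog."""
    missing: Dict[str, List[str]] = {}
    for skill, servers in skill_deps.items():
        absent = [server for server in servers if server not in catalog]
        if absent:
            missing[skill] = absent
    return missing

# Hypothetical dependency map for the three MCP-dependent skills in this PR;
# "huggingface" stands in for registries/official/servers/huggingface.
SKILL_DEPS = {
    "hf-mcp": ["huggingface"],
    "huggingface-llm-trainer": ["huggingface"],
    "huggingface-vision-trainer": ["huggingface"],
}

if __name__ == "__main__":
    # Server already catalogued: nothing is missing, so all three are eligible.
    print(missing_mcp_servers(SKILL_DEPS, {"huggingface"}))  # {}
    # Server not catalogued: every dependent skill would be excluded.
    print(missing_mcp_servers(SKILL_DEPS, set()))
```

An empty result means every dependent skill is eligible; a non-empty one names exactly which skills to hold back, which matches how the first commit in this PR excluded the three skills before the catalog entry was confirmed.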

Security allowlists

All 12 skills carry MANIFEST_MISSING_LICENSE (INFO): upstream is licensed Apache-2.0 at the repo root rather than through an SPDX identifier in each skill's SKILL.md frontmatter.

Additional targeted allowlists, each documented inline with justification in the corresponding spec.yaml:

  • hf-cli: PIPELINE_TAINT_FLOW (LOW): the documented curl | bash and curl | sh installers for hf and hf-mount (the scanner itself labels them "instructional install text").
  • huggingface-paper-publisher: DATA_EXFIL_NETWORK_REQUESTS, TOOL_ABUSE_UNDECLARED_NETWORK (official HF API calls), FILE_MAGIC_MISMATCH (Handlebars-style paper template).
  • huggingface-tool-builder: DATA_EXFIL_NETWORK_REQUESTS, TOOL_ABUSE_UNDECLARED_NETWORK, LOW_ANALYZABILITY (baseline reference scripts that call the public HF API).
  • huggingface-llm-trainer: TOOL_ABUSE_UNDECLARED_NETWORK, SOCIAL_ENG_MISLEADING_DESC, TOOL_ABUSE_SYSTEM_PACKAGE_INSTALL (apt/yum in the GGUF conversion script, run inside ephemeral HF Jobs containers), DATA_EXFIL_NETWORK_REQUESTS.
  • huggingface-vision-trainer: TOOL_ABUSE_UNDECLARED_NETWORK, SOCIAL_ENG_MISLEADING_DESC, DATA_EXFIL_NETWORK_REQUESTS.
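How an allowlist turns findings into a pass/fail status can be sketched as a tally by rule ID. This is a hypothetical model, not the skill-scanner's or Dockyard's implementation: allowlists here are bare rule IDs, whereas the real spec.yaml entries also carry an inline justification.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set

@dataclass
class ScanResult:
    findings: int
    allowed: int
    blocking: Dict[str, int] = field(default_factory=dict)

    @property
    def passed(self) -> bool:
        # A skill passes only when every finding is covered by its allowlist.
        return not self.blocking

def evaluate(findings: Iterable[str], allowlist: Set[str]) -> ScanResult:
    """Tally findings by rule ID and split them into allowed vs. blocking."""
    counts = Counter(findings)
    allowed = sum(n for rule, n in counts.items() if rule in allowlist)
    blocking = {rule: n for rule, n in counts.items() if rule not in allowlist}
    return ScanResult(findings=sum(counts.values()), allowed=allowed, blocking=blocking)

if __name__ == "__main__":
    # hf-cli's findings as they appear in the scan results posted on this PR.
    hf_cli_findings = ["MANIFEST_MISSING_LICENSE", "PIPELINE_TAINT_FLOW", "PIPELINE_TAINT_FLOW"]
    hf_cli_allowlist = {"MANIFEST_MISSING_LICENSE", "PIPELINE_TAINT_FLOW"}
    result = evaluate(hf_cli_findings, hf_cli_allowlist)
    print(result.findings, result.allowed, result.passed)  # 3 3 True
```

Counting per rule ID explains why one allowlist entry can cover several findings (for example, six TOOL_ABUSE_SYSTEM_PACKAGE_INSTALL hits in huggingface-llm-trainer), while any rule missing from the list still blocks.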

Test plan

  • task validate-skill on all 12 — all VALID
  • Cisco AI Defense skill-scanner 2.0.9: all pass with the allowlists applied
  • CI: Build Skill Artifacts workflow succeeds
  • CI: skill-scan-report surfaces only allowlisted findings
  • Post-merge: 12 OCI artifacts published under ghcr.io/stacklok/dockyard/skills/<name>:0.1.0
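The post-merge check expects one artifact per skill at ghcr.io/stacklok/dockyard/skills/<name>:0.1.0. A minimal sketch that enumerates the twelve expected references and sanity-checks each name and tag against the repository-name and tag grammars from the OCI Distribution Spec. The helper name is hypothetical, and pulling or inspecting the artifacts with an OCI client is left out.

```python
import re
from typing import Iterable, List

REGISTRY_PREFIX = "ghcr.io/stacklok/dockyard/skills"
VERSION = "0.1.0"

SKILLS = [
    "hf-cli", "huggingface-tool-builder", "hf-mcp",
    "huggingface-datasets", "huggingface-papers", "huggingface-paper-publisher", "huggingface-trackio",
    "huggingface-gradio", "transformers-js",
    "huggingface-community-evals", "huggingface-llm-trainer", "huggingface-vision-trainer",
]

# One path component of an OCI repository name, and an OCI tag (Distribution Spec grammars).
NAME_COMPONENT = re.compile(r"[a-z0-9]+((\.|_|__|-+)[a-z0-9]+)*")
TAG = re.compile(r"[a-zA-Z0-9_][a-zA-Z0-9._-]{0,127}")

def artifact_refs(skills: Iterable[str], prefix: str = REGISTRY_PREFIX, version: str = VERSION) -> List[str]:
    """Build <prefix>/<name>:<version> references, rejecting invalid names or tags."""
    if not TAG.fullmatch(version):
        raise ValueError(f"invalid OCI tag: {version!r}")
    refs = []
    for name in skills:
        if not NAME_COMPONENT.fullmatch(name):
            raise ValueError(f"invalid OCI repository name component: {name!r}")
        refs.append(f"{prefix}/{name}:{version}")
    return refs

if __name__ == "__main__":
    for ref in artifact_refs(SKILLS):
        print(ref)
```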

Closes #477

Packages 9 Hugging Face platform skills from huggingface/skills
(Apache-2.0) into Dockyard, pinned to upstream commit 061ab49 (main as
of 2026-04-16).

Fifth vendor in the per-vendor skills sweep.

Tooling and CLIs:
- hf-cli — Hub CLI (replaces huggingface-cli); auth, cache, buckets,
  repos, discussions, collections, jobs, endpoints, webhooks
- huggingface-tool-builder — build reusable HF API scripts with `hf`
  CLI + curl/REST

Hub content:
- huggingface-datasets — Dataset Viewer API + parquetlens SQL
- huggingface-papers — read paper pages as markdown, use papers API
- huggingface-paper-publisher — create/link/claim papers
- huggingface-trackio — experiment tracking (logging, alerts, CLI)

Frameworks:
- huggingface-gradio — Gradio UIs and demos in Python
- transformers-js — Transformers.js for browser/Node.js/Bun/Deno

Evaluation:
- huggingface-community-evals — local inspect-ai / lighteval evals

Skills intentionally excluded (MCP server dependency):

Per skill-criteria.md ("If a skill declares a dependency on one or
more MCP servers, every referenced MCP server must already be included
in the catalog"), three upstream skills are excluded from this PR
because they depend on the Hugging Face MCP server which is not yet
packaged in Dockyard:

- hf-mcp — entirely a guide to using HF MCP server tools
- huggingface-llm-trainer — mandates `hf_jobs()`, `hf_whoami()`,
  `hf_doc_search()`, `hf_doc_fetch()` MCP tool calls
- huggingface-vision-trainer — uses `hf_jobs()` and `hf_whoami()`
  MCP tools throughout

These can be added in a follow-up once the HF MCP server is
packaged in Dockyard (npx/uvx/go tree).

Security allowlists:
All 9 carry MANIFEST_MISSING_LICENSE — upstream Apache-2.0 at repo root,
not per-skill SPDX.

- hf-cli: PIPELINE_TAINT_FLOW — documented `curl | bash` installers
  for `hf` CLI and `hf-mount`; scanner flags them as 'instructional
  install text'.
- huggingface-paper-publisher: DATA_EXFIL_NETWORK_REQUESTS,
  TOOL_ABUSE_UNDECLARED_NETWORK (official HF API calls),
  FILE_MAGIC_MISMATCH (Handlebars-style paper template).
- huggingface-tool-builder: DATA_EXFIL_NETWORK_REQUESTS,
  TOOL_ABUSE_UNDECLARED_NETWORK, LOW_ANALYZABILITY — baseline
  reference scripts that call the public HF API (the skill teaches
  users to build such scripts).

All 9 pass `task validate-skill` and `task scan-skill`.

Refs #477
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
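The DATA_EXFIL_NETWORK_REQUESTS allowlists cover bundled scripts that call the public Hub API with urllib.request. As a rough illustration of that pattern (not the upstream scripts themselves), the sketch below builds a request for a dataset's metadata. The /api/datasets/<repo_id> path and the bearer-token header are assumed here, and the repo ID in the usage line is hypothetical.

```python
import json
import urllib.request
from typing import Any, Optional

# Assumed base URL for the public Hub API; upstream scripts may build URLs differently.
HUB_API = "https://huggingface.co/api"

def dataset_info_request(repo_id: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a GET request for a dataset's Hub metadata."""
    req = urllib.request.Request(f"{HUB_API}/datasets/{repo_id}", method="GET")
    if token:
        # Only needed for private or gated datasets.
        req.add_header("Authorization", f"Bearer {token}")
    return req

def fetch_json(req: urllib.request.Request, timeout: float = 10.0) -> Any:
    """Send the request and decode the JSON body (requires network access)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)

if __name__ == "__main__":
    req = dataset_info_request("user/my-dataset")  # hypothetical repo ID
    print(req.get_method(), req.full_url)
```

Separating request construction from sending keeps the network call in one obvious place, which is also the call a static scanner will point at.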
github-actions [bot] (Contributor) commented Apr 20, 2026

🛡️ Skill Security Scan Results

✅ hf-cli

  • Status: Passed
  • Findings: 3
  • Allowed (not blocking): 3
    • MANIFEST_MISSING_LICENSE (Allowed: huggingface/skills is licensed Apache-2.0 at the repository root; upstream does not embed an SPDX license identifier in per-skill SKILL.md frontmatter.)
    • PIPELINE_TAINT_FLOW ×2 (Allowed: The skill's prerequisites cite the official hf CLI installer (curl -LsSf https://hf.co/cli/install.sh | bash) and the hf-mount installer (curl -fsSL https://raw.githubusercontent.com/huggingface/hf-mount/main/install.sh | sh) as documented install commands. The scanner itself flags both as 'instructional install text in SKILL.md'.)

✅ hf-mcp

  • Status: Passed
  • Findings: 1
  • Allowed (not blocking): 1
    • MANIFEST_MISSING_LICENSE (Allowed: huggingface/skills is licensed Apache-2.0 at the repository root; upstream does not embed an SPDX license identifier in per-skill SKILL.md frontmatter.)

✅ huggingface-community-evals

  • Status: Passed
  • Findings: 1
  • Allowed (not blocking): 1
    • MANIFEST_MISSING_LICENSE (Allowed: huggingface/skills is licensed Apache-2.0 at the repository root; upstream does not embed an SPDX license identifier in per-skill SKILL.md frontmatter.)

✅ huggingface-datasets

  • Status: Passed
  • Findings: 1
  • Allowed (not blocking): 1
    • MANIFEST_MISSING_LICENSE (Allowed: huggingface/skills is licensed Apache-2.0 at the repository root; upstream does not embed an SPDX license identifier in per-skill SKILL.md frontmatter.)

✅ huggingface-gradio

  • Status: Passed
  • Findings: 1
  • Allowed (not blocking): 1
    • MANIFEST_MISSING_LICENSE (Allowed: huggingface/skills is licensed Apache-2.0 at the repository root; upstream does not embed an SPDX license identifier in per-skill SKILL.md frontmatter.)

✅ huggingface-llm-trainer

  • Status: Passed
  • Findings: 11
  • Allowed (not blocking): 11
    • TOOL_ABUSE_UNDECLARED_NETWORK (Allowed: The skill orchestrates training jobs on Hugging Face Jobs cloud GPUs via the HF MCP server's hf_jobs tool. The network requirement is through the HF MCP server dependency (packaged in toolhive-catalog under registries/official/servers/huggingface), not a direct network-access tool in frontmatter.)
    • SOCIAL_ENG_MISLEADING_DESC (Allowed: Scanner heuristic flags the broad scope of the description (training SFT/DPO/GRPO + GGUF conversion + monitoring + etc.) as 'performing actions not reflected in description'. The description accurately reflects the skill's documented scope; the flag is a scanner conservatism false positive.)
    • TOOL_ABUSE_SYSTEM_PACKAGE_INSTALL ×6 (Allowed: The bundled scripts/convert_to_gguf.py references sudo apt-get install / sudo yum install for optional system packages (build tools) when converting trained models to GGUF format. These run in ephemeral HF Jobs containers, not on the user's host. The script is HF-authored and documented in SKILL.md.)
    • DATA_EXFIL_NETWORK_REQUESTS ×3 (Allowed: Bundled helper scripts (scripts/dataset_inspector.py, scripts/hf_benchmarks.py) use urllib.request to query the public Hugging Face Hub API for dataset validation and benchmark lookups, which are documented workflow steps required by the skill.)

✅ huggingface-paper-publisher

  • Status: Passed
  • Findings: 6
  • Allowed (not blocking): 6
    • TOOL_ABUSE_UNDECLARED_NETWORK (Allowed: The skill uses network access through its bundled paper_manager.py script (as its documented workflow), but does not declare an explicit network-access tool in frontmatter. All network calls target the public Hugging Face Hub API documented in the SKILL.md.)
    • MANIFEST_MISSING_LICENSE (Allowed: huggingface/skills is licensed Apache-2.0 at the repository root; upstream does not embed an SPDX license identifier in per-skill SKILL.md frontmatter.)
    • DATA_EXFIL_NETWORK_REQUESTS ×3 (Allowed: scripts/paper_manager.py uses requests.get() to query the public Hugging Face Hub API (api.huggingface.co) for paper metadata, which is the skill's entire purpose. The destinations are the official HF API endpoints documented in the SKILL.md workflow.)
    • FILE_MAGIC_MISMATCH (Allowed: templates/modern.md is a paper template that legitimately uses Handlebars-style {{}} substitution syntax. Magika detects the Handlebars markers and flags the format mismatch; the file is plain text documentation and safe.)
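FILE_MAGIC_MISMATCH fires because templates/modern.md mixes Markdown with {{...}} placeholders. For readers unfamiliar with the syntax, here is a minimal plain-Python sketch of Handlebars-style substitution. It is not the Handlebars library and not necessarily how paper_manager.py renders the template; the placeholder names are hypothetical.

```python
import re
from typing import Mapping

# Matches {{ name }} placeholders with simple identifiers only; real
# Handlebars also supports helpers, blocks, and HTML escaping.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")

def render(template: str, values: Mapping[str, str]) -> str:
    """Replace each {{name}} with values[name]; unknown names are left as-is."""
    def substitute(match: "re.Match[str]") -> str:
        return values.get(match.group(1), match.group(0))
    return PLACEHOLDER.sub(substitute, template)

if __name__ == "__main__":
    template = "# {{title}}\n\nAuthors: {{ authors }}\nVenue: {{venue}}"
    print(render(template, {"title": "My Paper", "authors": "A. Author"}))
```

Before rendering, the file is still ordinary text; only the {{...}} markers make a content-type detector lean toward "template" rather than "Markdown".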

✅ huggingface-papers

  • Status: Passed
  • Findings: 1
  • Allowed (not blocking): 1
    • MANIFEST_MISSING_LICENSE (Allowed: huggingface/skills is licensed Apache-2.0 at the repository root; upstream does not embed an SPDX license identifier in per-skill SKILL.md frontmatter.)

✅ huggingface-tool-builder

  • Status: Passed
  • Findings: 5
  • Allowed (not blocking): 5
    • LOW_ANALYZABILITY (Allowed: The scanner reports that 1 of 8 files is opaque — this is an executable .tsx reference script (references/baseline_hf_api.tsx) that the scanner cannot fully parse. The file is a small, HF-authored example script and is readable markdown-adjacent TypeScript.)
    • TOOL_ABUSE_UNDECLARED_NETWORK (Allowed: The skill uses network access through its bundled reference scripts that call the public Hugging Face Hub API. The frontmatter does not declare a dedicated network-access tool, but the network calls are documented examples bundled for user education, not runtime execution by the skill itself.)
    • MANIFEST_MISSING_LICENSE (Allowed: huggingface/skills is licensed Apache-2.0 at the repository root; upstream does not embed an SPDX license identifier in per-skill SKILL.md frontmatter.)
    • DATA_EXFIL_NETWORK_REQUESTS ×2 (Allowed: The skill's baseline reference scripts (references/baseline_hf_api.py) use urllib.request.Request/urlopen against the public Hugging Face Hub API (api.huggingface.co). Teaching users to build HF API-consuming tools is the skill's entire purpose.)

✅ huggingface-trackio

  • Status: Passed
  • Findings: 1
  • Allowed (not blocking): 1
    • MANIFEST_MISSING_LICENSE (Allowed: huggingface/skills is licensed Apache-2.0 at the repository root; upstream does not embed an SPDX license identifier in per-skill SKILL.md frontmatter.)

✅ huggingface-vision-trainer

  • Status: Passed
  • Findings: 4
  • Allowed (not blocking): 4
    • TOOL_ABUSE_UNDECLARED_NETWORK (Allowed: The skill orchestrates vision training jobs on Hugging Face Jobs cloud GPUs via the HF MCP server's hf_jobs tool. The network requirement is through the HF MCP server dependency (packaged in toolhive-catalog under registries/official/servers/huggingface), not a direct network-access tool in frontmatter.)
    • SOCIAL_ENG_MISLEADING_DESC (Allowed: Scanner heuristic flags the breadth of the description (object detection + image classification + SAM/SAM2 segmentation) as 'performing actions not reflected in description'. The description accurately reflects the skill's documented scope; the flag is a scanner conservatism false positive.)
    • MANIFEST_MISSING_LICENSE (Allowed: huggingface/skills is licensed Apache-2.0 at the repository root; upstream does not embed an SPDX license identifier in per-skill SKILL.md frontmatter.)
    • DATA_EXFIL_NETWORK_REQUESTS (Allowed: The bundled scripts/dataset_inspector.py uses urllib.request.urlopen() to query the public Hugging Face Hub API for dataset format validation — a documented workflow step required before launching GPU training.)

✅ transformers-js

  • Status: Passed
  • Findings: 0

Summary: Scanned 12 skill(s), all passed security checks. ✅

Adds the three HF skills initially excluded for MCP-server dependency:
- hf-mcp
- huggingface-llm-trainer
- huggingface-vision-trainer

These depend on the Hugging Face MCP server, which IS already packaged
in the toolhive-catalog (registries/official/servers/huggingface) —
the registry of record, which dockyard repackages from. The
skill-criteria.md "MCP server dependency" requirement is satisfied.

Total HF skills in this PR now: 12.

Per-skill allowlists added:
- huggingface-llm-trainer: TOOL_ABUSE_UNDECLARED_NETWORK (network via
  HF MCP `hf_jobs` tool), SOCIAL_ENG_MISLEADING_DESC (scanner
  conservatism on broad skill scope), TOOL_ABUSE_SYSTEM_PACKAGE_INSTALL
  (GGUF conversion script uses apt-get/yum inside ephemeral Jobs
  containers), DATA_EXFIL_NETWORK_REQUESTS (HF-API calls in dataset
  inspector and benchmarks helpers).
- huggingface-vision-trainer: same network/scope allowlists plus
  DATA_EXFIL_NETWORK_REQUESTS for dataset_inspector.py.
- hf-mcp: only MANIFEST_MISSING_LICENSE.

Refs #477
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
JAORMX changed the title from "feat(skills): package 9 Hugging Face skills" to "feat(skills): package 12 Hugging Face skills" on Apr 20, 2026
JAORMX added the skills label (Skill packaging, vendor skill imports) on Apr 20, 2026
JAORMX merged commit 4ef439b into main on Apr 20, 2026
41 checks passed
JAORMX deleted the skills/huggingface branch on April 20, 2026 at 10:14
Labels

skills Skill packaging, vendor skill imports

Development

Successfully merging this pull request may close these issues.

skill: package huggingface/skills into dockyard
