datafusion-skills

A Claude Code plugin that adds Apache DataFusion-powered skills for data exploration, querying, and materialized views.

Installation

From GitHub

Add the repository as a plugin source and install:

/plugin marketplace add datafusion-contrib/datafusion-skills
/plugin install datafusion-skills@datafusion-skills

This registers the GitHub repo as a marketplace and installs the plugin. Skills will be available as /datafusion-skills:<skill-name> in all future sessions.

Updating

/plugin marketplace update datafusion-skills
/plugin update datafusion-skills@datafusion-skills

Skills

query

Run SQL queries against registered tables or ad-hoc against files. Accepts raw SQL or natural language questions. Supports Parquet, CSV, JSON, Arrow IPC, and Avro.

/datafusion-skills:query SELECT * FROM 'trades.parquet' WHERE symbol = 'AAPL' LIMIT 10
/datafusion-skills:query "what are the top 5 symbols by volume?"
/datafusion-skills:query FROM sales WHERE amount > 100

read-file

Read and explore any data file — Parquet, CSV, JSON, Arrow IPC, Avro — locally or from S3/GCS. Auto-detects format by extension.

/datafusion-skills:read-file trades.parquet what columns does it have?
/datafusion-skills:read-file s3://my-bucket/data.parquet describe the schema
/datafusion-skills:read-file metrics.csv how many rows?
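
As an illustration of format detection by extension, here is a minimal Python sketch. The mapping table and the `detect_format` name are assumptions made for this example, not the plugin's actual implementation, and the real skill may recognize more extensions.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical extension-to-format table; the plugin's real mapping may differ.
EXTENSION_FORMATS = {
    ".parquet": "PARQUET",
    ".csv": "CSV",
    ".json": "JSON",
    ".arrow": "ARROW",
    ".avro": "AVRO",
}


def detect_format(location: str) -> str:
    """Infer a file format from a local path or an object-store URL."""
    # For URLs such as s3://bucket/key.parquet, keep only the path component.
    path = urlparse(location).path if "://" in location else location
    suffix = PurePosixPath(path).suffix.lower()
    if suffix not in EXTENSION_FORMATS:
        raise ValueError(f"cannot infer format from extension {suffix!r}")
    return EXTENSION_FORMATS[suffix]
```

Under these assumptions, `detect_format("s3://my-bucket/data.parquet")` returns `"PARQUET"`, and a path without a known extension raises `ValueError` so the caller can ask for an explicit format.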

create-table

Register a data file as a persistent external table. Explores the schema and persists the registration so all other skills can access the table automatically.

/datafusion-skills:create-table trades.parquet
/datafusion-skills:create-table data.csv --name sales --format csv
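
A registration of this kind boils down to a CREATE EXTERNAL TABLE statement. The sketch below shows one way such a statement could be built. The `create_table_sql` helper is hypothetical, and the statement the skill actually writes may carry extra options (for example CSV header settings).

```python
import re


def create_table_sql(name: str, location: str, fmt: str) -> str:
    """Build a CREATE EXTERNAL TABLE statement (hypothetical helper)."""
    # Accept only plain identifiers so the name can be used unquoted.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
        raise ValueError(f"invalid table name: {name!r}")
    # Escape single quotes in the location for use in a SQL string literal.
    escaped = location.replace("'", "''")
    return (
        f"CREATE EXTERNAL TABLE {name} "
        f"STORED AS {fmt.upper()} LOCATION '{escaped}';"
    )


print(create_table_sql("sales", "data.csv", "csv"))
# prints: CREATE EXTERNAL TABLE sales STORED AS CSV LOCATION 'data.csv';
```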

materialized-view

Create and manage materialized views — persist SQL query results as Parquet files for fast repeated access. Track source dependencies and refresh when data changes.

/datafusion-skills:materialized-view "create a daily summary of trades grouped by symbol"
/datafusion-skills:materialized-view refresh trades_daily
/datafusion-skills:materialized-view status
/datafusion-skills:materialized-view list
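
One simple way to decide whether a view needs a refresh is to compare file modification times. The sketch below assumes that approach. The README does not say how the skill tracks dependencies, so treat `is_stale` and the mtime rule as illustrative only.

```python
from pathlib import Path
from typing import Iterable


def is_stale(view_file: Path, sources: Iterable[Path]) -> bool:
    """Return True if the view is missing or older than any source file.

    Hypothetical rule: a view is stale when a source was modified after
    the view's Parquet file was last written.
    """
    if not view_file.exists():
        return True
    view_mtime = view_file.stat().st_mtime
    return any(src.stat().st_mtime > view_mtime for src in sources)
```

A refresh step could call `is_stale(Path("trades_daily.parquet"), [Path("trades.parquet")])` and re-run the view's query only when it returns True. Modification times are a coarse signal, so a content hash or an explicit refresh may be more reliable for object-store sources.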

explain-plan

Visualize and analyze query execution plans. Identifies performance bottlenecks and suggests optimizations.

/datafusion-skills:explain-plan SELECT * FROM trades WHERE date > '2024-01-01'
/datafusion-skills:explain-plan --analyze SELECT COUNT(*) FROM large_table GROUP BY category

datafusion-docs

Search Apache DataFusion documentation — user guide, SQL reference, and API docs. Returns relevant documentation for a question or keyword.

/datafusion-skills:datafusion-docs window functions
/datafusion-skills:datafusion-docs "how do I create an external table?"
/datafusion-skills:datafusion-docs APPROX_PERCENTILE_CONT

install-datafusion

Install or update datafusion-cli. Supports Homebrew, cargo install, and pre-built binaries.

/datafusion-skills:install-datafusion
/datafusion-skills:install-datafusion --update

Session state

All skills share a single state.sql file per project — a plain SQL file containing CREATE EXTERNAL TABLE statements and configuration. When state is first needed, you'll be asked where to store it:

  1. In the project directory (.datafusion-skills/state.sql) — colocated with the project, optionally gitignored
  2. In your home directory (~/.datafusion-skills/<project>/state.sql) — keeps the repo clean

Any skill restores the session via datafusion-cli --file state.sql.
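
The two storage locations and the restore command can be sketched as follows. This is a minimal illustration that assumes `<project>` is the project directory's name. The `state_path` and `restore_command` helpers are hypothetical and not part of the plugin.

```python
from pathlib import Path
from typing import List, Optional


def state_path(project_dir: Path, in_project: bool,
               home: Optional[Path] = None) -> Path:
    """Return where state.sql lives for a project.

    Assumption: <project> in the home-directory layout is the name of
    the project directory.
    """
    if in_project:
        return project_dir / ".datafusion-skills" / "state.sql"
    base = home if home is not None else Path.home()
    return base / ".datafusion-skills" / project_dir.name / "state.sql"


def restore_command(state_file: Path) -> List[str]:
    """Build the datafusion-cli invocation that replays the state file."""
    return ["datafusion-cli", "--file", str(state_file)]
```

For a project at `/work/trades`, the in-project option resolves to `/work/trades/.datafusion-skills/state.sql`, while the home-directory option resolves to `~/.datafusion-skills/trades/state.sql`.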

How the skills work together

Skills reference each other where it makes sense:

  • read-file suggests query for follow-up exploration and create-table for persisting data
  • query uses session state from create-table automatically
  • materialized-view creates persistent Parquet files registered via create-table
  • explain-plan helps optimize queries from query
  • All skills use datafusion-docs to troubleshoot DataFusion errors automatically

Why DataFusion?

Apache DataFusion is a fast, extensible query engine built in Rust on top of Apache Arrow. It offers:

  • High performance: Vectorized execution, predicate pushdown, partition pruning
  • Standard SQL: Full SQL support including window functions, CTEs, subqueries
  • Extensibility: Custom table providers, UDFs, optimizer rules
  • File format support: Parquet, CSV, JSON, Arrow IPC, Avro
  • Cloud native: S3, GCS, Azure object store support
  • Materialized views: Persist query results and track dependencies (unique to the DataFusion ecosystem)

Local development

# Clone the repo
git clone https://github.com/datafusion-contrib/datafusion-skills.git
cd datafusion-skills

# Launch Claude Code with the local plugin directory
claude --plugin-dir .

Test individual skills:

/datafusion-skills:read-file some_local_file.parquet
/datafusion-skills:query SELECT 42
/datafusion-skills:datafusion-docs window functions

Prerequisite: datafusion-cli must be installed. If it is missing, the skills will offer to install it via /datafusion-skills:install-datafusion.

Platform support

These skills have been tested on macOS and Linux. Windows is not yet fully supported.

Reporting issues

Found a bug or have an idea? Open an issue at:

https://github.com/datafusion-contrib/datafusion-skills/issues

For DataFusion-specific bugs, please include the datafusion-cli version (datafusion-cli --version) and the full error message.

License

Apache License 2.0. See LICENSE for details.
