datafusion-skills

A Claude Code plugin that adds Apache DataFusion-powered skills for data exploration, querying, and materialized views.

Installation

From GitHub

Add the repository as a plugin marketplace, then install the plugin from it:

/plugin marketplace add datafusion-contrib/datafusion-skills
/plugin install datafusion-skills@datafusion-skills

This registers the GitHub repo as a marketplace and installs the plugin. Skills will be available as /datafusion-skills:<skill-name> in all future sessions.

Updating

/plugin marketplace update datafusion-skills
/plugin update datafusion-skills@datafusion-skills

Skills

`query`

Run SQL queries against registered tables or directly against files. Accepts raw SQL or natural-language questions. Supports Parquet, CSV, JSON, Arrow IPC, and Avro.

/datafusion-skills:query SELECT * FROM 'trades.parquet' WHERE symbol = 'AAPL' LIMIT 10
/datafusion-skills:query "what are the top 5 symbols by volume?"
/datafusion-skills:query FROM sales WHERE amount > 100

`read-file`

Read and explore any data file — Parquet, CSV, JSON, Arrow IPC, Avro — locally or from S3/GCS. Auto-detects format by extension.

/datafusion-skills:read-file trades.parquet what columns does it have?
/datafusion-skills:read-file s3://my-bucket/data.parquet describe the schema
/datafusion-skills:read-file metrics.csv how many rows?
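
As a rough illustration of detecting the format by extension, the sketch below maps a path or object-store URL to a format name. The function name detect_format and the exact list of extensions are assumptions made for this example; the plugin's actual detection logic is not shown in this README.

```shell
# Hypothetical helper: map a file path or URL to a format name using only
# its extension. The extension list is an assumption for illustration.
detect_format() {
  # Lower-case the path so DATA.PARQUET and data.parquet are treated alike.
  lower="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  case "$lower" in
    *.parquet)        echo "parquet" ;;
    *.csv)            echo "csv" ;;
    *.json|*.ndjson)  echo "json" ;;
    *.arrow|*.ipc)    echo "arrow" ;;
    *.avro)           echo "avro" ;;
    *)                echo "unknown"; return 1 ;;
  esac
}

detect_format "s3://my-bucket/data.parquet"   # prints: parquet
detect_format "metrics.csv"                   # prints: csv
```

Because only the suffix is inspected, the same check works for local paths and for S3/GCS URLs.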

`create-table`

Register a data file as a persistent external table. Explores the schema and persists the registration so all other skills can access the table automatically.

/datafusion-skills:create-table trades.parquet
/datafusion-skills:create-table data.csv --name sales --format csv
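
Registration presumably comes down to a CREATE EXTERNAL TABLE statement of the kind stored in the session state. The sketch below prints such a statement for a given file. The helper name make_external_table_ddl, and the choice to default the table name and format from the file name, are assumptions for illustration and may not match what the skill does.

```shell
# Hypothetical helper: print DataFusion DDL that registers a file as an
# external table. Usage: make_external_table_ddl <location> [name] [format]
make_external_table_ddl() {
  location="$1"
  base="$(basename "$location")"
  # Assumption: default the table name to the file name without extension,
  # and the format to the extension itself.
  name="${2:-${base%.*}}"
  format="${3:-${base##*.}}"
  format_upper="$(printf '%s' "$format" | tr '[:lower:]' '[:upper:]')"
  printf "CREATE EXTERNAL TABLE %s STORED AS %s LOCATION '%s';\n" \
    "$name" "$format_upper" "$location"
}

make_external_table_ddl trades.parquet
# prints: CREATE EXTERNAL TABLE trades STORED AS PARQUET LOCATION 'trades.parquet';
make_external_table_ddl data.csv sales csv
# prints: CREATE EXTERNAL TABLE sales STORED AS CSV LOCATION 'data.csv';
```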

`materialized-view`

Create and manage materialized views — persist SQL query results as Parquet files for fast repeated access. Track source dependencies and refresh when data changes.

/datafusion-skills:materialized-view "create a daily summary of trades grouped by symbol"
/datafusion-skills:materialized-view refresh trades_daily
/datafusion-skills:materialized-view status
/datafusion-skills:materialized-view list
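
One plausible way to persist query results as Parquet is DataFusion's COPY ... TO statement followed by registering the output file as a table. The sketch below only prints the SQL. The helper name materialize_sql, the output directory, and the sample query are assumptions for illustration, and the STORED AS clause on COPY requires a reasonably recent DataFusion version.

```shell
# Hypothetical helper: print SQL that writes a query result to a Parquet
# file and registers that file as a table.
# Usage: materialize_sql <view_name> <query> <output_dir>
materialize_sql() {
  view_name="$1"
  query="$2"
  target="$3/$view_name.parquet"
  # Write the query result to a single Parquet file.
  printf "COPY (%s) TO '%s' STORED AS PARQUET;\n" "$query" "$target"
  # Register the file so other queries can refer to it by name.
  printf "CREATE EXTERNAL TABLE IF NOT EXISTS %s STORED AS PARQUET LOCATION '%s';\n" \
    "$view_name" "$target"
}

materialize_sql trades_daily \
  "SELECT symbol, COUNT(*) AS trade_count FROM trades GROUP BY symbol" \
  ".datafusion-skills/views"
```

Under this approach, a refresh would re-run the COPY statement against the current source data.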

`explain-plan`

Visualize and analyze query execution plans. Identifies performance bottlenecks and suggests optimizations.

/datafusion-skills:explain-plan SELECT * FROM trades WHERE date > '2024-01-01'
/datafusion-skills:explain-plan --analyze SELECT COUNT(*) FROM large_table GROUP BY category
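
In DataFusion SQL, plans are obtained with EXPLAIN, and EXPLAIN ANALYZE additionally executes the query and reports runtime metrics. The sketch below shows how the --analyze flag might map onto those two statements. The helper name build_explain is an assumption for illustration, not the skill's actual implementation.

```shell
# Hypothetical helper: turn the skill arguments into an EXPLAIN statement.
# Usage: build_explain [--analyze] <sql>
build_explain() {
  if [ "$1" = "--analyze" ]; then
    shift
    # EXPLAIN ANALYZE runs the query and includes execution metrics.
    printf 'EXPLAIN ANALYZE %s;\n' "$*"
  else
    # Plain EXPLAIN shows the logical and physical plans without running it.
    printf 'EXPLAIN %s;\n' "$*"
  fi
}

build_explain "SELECT * FROM trades WHERE date > '2024-01-01'"
build_explain --analyze "SELECT COUNT(*) FROM large_table GROUP BY category"
```

Since EXPLAIN ANALYZE actually runs the query, it can be slow on large tables.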

`datafusion-docs`

Search Apache DataFusion documentation — user guide, SQL reference, and API docs. Returns relevant documentation for a question or keyword.

/datafusion-skills:datafusion-docs window functions
/datafusion-skills:datafusion-docs "how do I create an external table?"
/datafusion-skills:datafusion-docs APPROX_PERCENTILE_CONT

`install-datafusion`

Install or update datafusion-cli. Supports Homebrew, cargo install, and pre-built binaries.

/datafusion-skills:install-datafusion
/datafusion-skills:install-datafusion --update

Session state

All skills share a single state.sql file per project — a plain SQL file containing CREATE EXTERNAL TABLE statements and configuration. When state is first needed, you'll be asked where to store it:

In the project directory (.datafusion-skills/state.sql) — colocated with the project, optionally gitignored
In your home directory (~/.datafusion-skills/<project>/state.sql) — keeps the repo clean

Any skill can restore the session by running datafusion-cli --file state.sql, which replays the saved statements.
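
To make the state file concrete, the sketch below writes a small state.sql and builds a combined file that replays the state and then runs one query. The table names, file paths, temporary directory, and the helper name build_run_file are assumptions for illustration; only the datafusion-cli --file invocation comes from this README.

```shell
# Illustrative only: create a state file in a temporary directory.
state_dir="$(mktemp -d)"
state_file="$state_dir/state.sql"

# Example registrations; the table names and paths are made up.
cat > "$state_file" <<'SQL'
CREATE EXTERNAL TABLE trades STORED AS PARQUET LOCATION 'trades.parquet';
CREATE EXTERNAL TABLE sales STORED AS CSV LOCATION 'data.csv';
SQL

# Hypothetical helper: write the state followed by one query into a single
# SQL file and print that file's path.
build_run_file() {
  run_file="$state_dir/run.sql"
  cat "$state_file" > "$run_file"
  printf '%s;\n' "$1" >> "$run_file"
  printf '%s\n' "$run_file"
}

run_file="$(build_run_file "SELECT COUNT(*) FROM trades")"
cat "$run_file"

# To execute it (requires datafusion-cli and the referenced data files):
#   datafusion-cli --file "$run_file"
```

Keeping the state as plain SQL means it can be inspected, edited, or committed like any other text file.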

How the skills work together

Skills reference each other where it makes sense:

read-file suggests query for follow-up exploration and create-table for persisting data
query uses session state from create-table automatically
materialized-view creates persistent Parquet files registered via create-table
explain-plan helps optimize queries from query
All skills use datafusion-docs to troubleshoot DataFusion errors automatically

Why DataFusion?

Apache DataFusion is a fast, extensible query engine built in Rust on top of Apache Arrow. It offers:

High performance: Vectorized execution, predicate pushdown, partition pruning
Standard SQL: Broad SQL support including window functions, CTEs, and subqueries
Extensibility: Custom table providers, UDFs, optimizer rules
File format support: Parquet, CSV, JSON, Arrow IPC, Avro
Cloud native: S3, GCS, Azure object store support
Materialized views: Persist query results as Parquet and track source dependencies (implemented by the materialized-view skill on top of DataFusion)

Local development

# Clone the repo
git clone https://github.com/datafusion-contrib/datafusion-skills.git
cd datafusion-skills

# Launch Claude Code with the local plugin directory
claude --plugin-dir .

Test individual skills:

/datafusion-skills:read-file some_local_file.parquet
/datafusion-skills:query SELECT 42
/datafusion-skills:datafusion-docs window functions

Prerequisites: datafusion-cli must be installed. If it isn't, the skills will offer to install it via /datafusion-skills:install-datafusion.

Platform support

These skills have been tested on macOS and Linux. Windows is not yet fully supported.

Reporting issues

Found a bug or have an idea? Open an issue at:

https://github.com/datafusion-contrib/datafusion-skills/issues

For DataFusion-specific bugs, please include the datafusion-cli version (datafusion-cli --version) and the full error message.

License

Apache License 2.0. See LICENSE for details.
