A Claude Code plugin that adds Apache DataFusion-powered skills for data exploration, querying, and materialized views.
Add the repository as a plugin source and install:
/plugin marketplace add datafusion-contrib/datafusion-skills
/plugin install datafusion-skills@datafusion-skills
This registers the GitHub repo as a marketplace and installs the plugin. Skills will be available as /datafusion-skills:<skill-name> in all future sessions.
To pick up new releases later, refresh the marketplace and update the plugin:
/plugin marketplace update datafusion-skills
/plugin update datafusion-skills@datafusion-skills
Run SQL queries against registered tables or ad-hoc against files. Accepts raw SQL or natural language questions. Supports Parquet, CSV, JSON, Arrow IPC, and Avro.
/datafusion-skills:query SELECT * FROM 'trades.parquet' WHERE symbol = 'AAPL' LIMIT 10
/datafusion-skills:query "what are the top 5 symbols by volume?"
/datafusion-skills:query FROM sales WHERE amount > 100
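One way to picture a query against registered tables is as a short script: the saved table registrations run first, then the query. The sketch below is an assumption about how such a script could be assembled, not the plugin's actual implementation; `build_query_script` is a hypothetical helper name.

```shell
# Hypothetical helper: build a one-off SQL script that replays state.sql
# (if present) and then runs the given query.
# Usage: build_query_script STATE_FILE SQL  -> prints the script path
build_query_script() {
  local state_file="$1" sql="$2" script
  script="$(mktemp)"
  # Replay table registrations first, if a state file exists.
  if [ -f "$state_file" ]; then
    { cat "$state_file"; printf '\n'; } > "$script"
  fi
  # Append the query, adding a terminating semicolon if it is missing.
  case "$sql" in
    *\;) printf '%s\n' "$sql" >> "$script" ;;
    *)   printf '%s;\n' "$sql" >> "$script" ;;
  esac
  printf '%s\n' "$script"
}

# Example (requires datafusion-cli on PATH):
#   script="$(build_query_script .datafusion-skills/state.sql "SELECT 42")"
#   datafusion-cli --file "$script"
```

Keeping the registrations and the query in one file means a single datafusion-cli invocation sees both.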
Read and explore any data file — Parquet, CSV, JSON, Arrow IPC, Avro — locally or from S3/GCS. Auto-detects format by extension.
/datafusion-skills:read-file trades.parquet what columns does it have?
/datafusion-skills:read-file s3://my-bucket/data.parquet describe the schema
/datafusion-skills:read-file metrics.csv how many rows?
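Detecting the format by extension can be sketched as a simple lookup. The function name `detect_format` and the extra extensions (.ndjson, .ipc) are assumptions for illustration; the README only states that detection is extension-based.

```shell
# Hypothetical helper: map a file path or URL to a format name.
# Prints the format and returns 0, or prints "unknown" and returns 1.
detect_format() {
  local path="$1" lower
  # Lowercase the path so .CSV and .csv are treated alike.
  lower="$(printf '%s' "$path" | tr '[:upper:]' '[:lower:]')"
  case "$lower" in
    *.parquet)       echo "parquet" ;;
    *.csv)           echo "csv" ;;
    *.json|*.ndjson) echo "json" ;;
    *.arrow|*.ipc)   echo "arrow" ;;
    *.avro)          echo "avro" ;;
    *)               echo "unknown"; return 1 ;;
  esac
}
```

Because only the suffix is inspected, the same check works for local paths and for object store URLs such as s3://my-bucket/data.parquet.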
Register a data file as a persistent external table. Explores the schema and persists the registration so all other skills can access the table automatically.
/datafusion-skills:create-table trades.parquet
/datafusion-skills:create-table data.csv --name sales --format csv
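The registration that gets persisted is a CREATE EXTERNAL TABLE statement. A rough sketch of how the two commands above might translate into SQL follows; `create_table_sql` is a hypothetical helper, and deriving the default table name and format from the file name is an assumption. Format-specific options (for example CSV header handling) vary by DataFusion version and are left out.

```shell
# Hypothetical helper: print a CREATE EXTERNAL TABLE statement.
# Usage: create_table_sql PATH [NAME] [FORMAT]
create_table_sql() {
  local path="$1" name="${2:-}" format="${3:-}" base
  base="$(basename "$path")"
  # Default table name: file name without its extension (assumption).
  [ -n "$name" ] || name="${base%.*}"
  # Default format: the file extension (assumption).
  [ -n "$format" ] || format="${base##*.}"
  # STORED AS is conventionally written in upper case.
  format="$(printf '%s' "$format" | tr '[:lower:]' '[:upper:]')"
  printf "CREATE EXTERNAL TABLE %s STORED AS %s LOCATION '%s';\n" \
    "$name" "$format" "$path"
}

# Example: append a registration to the shared state file.
#   create_table_sql data.csv sales csv >> .datafusion-skills/state.sql
```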
Create and manage materialized views — persist SQL query results as Parquet files for fast repeated access. Track source dependencies and refresh when data changes.
/datafusion-skills:materialized-view "create a daily summary of trades grouped by symbol"
/datafusion-skills:materialized-view refresh trades_daily
/datafusion-skills:materialized-view status
/datafusion-skills:materialized-view list
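The README does not say how the skill decides that a view is out of date. As one possible approach, assumed here purely for illustration, a view can be treated as stale when any source file has a newer modification time than the view's Parquet file, and refreshed by re-running its query through DataFusion's COPY statement. `needs_refresh` and `refresh_sql` are hypothetical helper names.

```shell
# Hypothetical helper: succeed (exit 0) when the view file is missing
# or any source file is newer than it.
# Usage: needs_refresh VIEW_FILE SOURCE_FILE...
needs_refresh() {
  local view="$1" src
  shift
  [ -f "$view" ] || return 0
  for src in "$@"; do
    if [ "$src" -nt "$view" ]; then
      return 0
    fi
  done
  return 1
}

# Hypothetical helper: print a COPY statement that rewrites the view file.
# Usage: refresh_sql VIEW_FILE QUERY
refresh_sql() {
  printf "COPY (%s) TO '%s' STORED AS PARQUET;\n" "$2" "$1"
}

# Example (file names are placeholders):
#   if needs_refresh trades_daily.parquet trades.parquet; then
#     refresh_sql trades_daily.parquet \
#       "SELECT symbol, COUNT(*) AS n FROM trades GROUP BY symbol"
#   fi
```

Modification times are a coarse signal; a content hash or row count would catch more cases at higher cost.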
Visualize and analyze query execution plans. Identifies performance bottlenecks and suggests optimizations.
/datafusion-skills:explain-plan SELECT * FROM trades WHERE date > '2024-01-01'
/datafusion-skills:explain-plan --analyze SELECT COUNT(*) FROM large_table GROUP BY category
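In DataFusion SQL, EXPLAIN shows the plan without running the query, while EXPLAIN ANALYZE runs it and reports execution metrics. A minimal sketch of how the --analyze flag could map onto those two statements is shown below; `explain_sql` is a hypothetical helper, and the mapping itself is an assumption about the skill.

```shell
# Hypothetical helper: print the EXPLAIN statement for a query.
# Usage: explain_sql [--analyze] QUERY
explain_sql() {
  local prefix="EXPLAIN"
  if [ "${1:-}" = "--analyze" ]; then
    prefix="EXPLAIN ANALYZE"
    shift
  fi
  printf '%s %s;\n' "$prefix" "$*"
}

# Example (requires datafusion-cli on PATH):
#   datafusion-cli -c "$(explain_sql --analyze "SELECT 42")"
```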
Search Apache DataFusion documentation — user guide, SQL reference, and API docs. Returns relevant documentation for a question or keyword.
/datafusion-skills:datafusion-docs window functions
/datafusion-skills:datafusion-docs "how do I create an external table?"
/datafusion-skills:datafusion-docs APPROX_PERCENTILE_CONT
Install or update datafusion-cli. Supports Homebrew, cargo install, and pre-built binaries.
/datafusion-skills:install-datafusion
/datafusion-skills:install-datafusion --update
All skills share a single state.sql file per project — a plain SQL file containing CREATE EXTERNAL TABLE statements and configuration. When state is first needed, you'll be asked where to store it:
- In the project directory (.datafusion-skills/state.sql): colocated with the project, optionally gitignored
- In your home directory (~/.datafusion-skills/<project>/state.sql): keeps the repo clean
Any skill restores the session via datafusion-cli --file state.sql.
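Locating the state file across the two possible locations could look roughly like this. It is a sketch under assumptions: `resolve_state_file` is a hypothetical helper, the project-directory copy is checked first, and `<project>` is taken to be the project directory's base name.

```shell
# Hypothetical helper: print the state.sql path for a project, or
# return 1 if no state file exists yet.
# Usage: resolve_state_file PROJECT_DIR
resolve_state_file() {
  local project_dir="$1" in_project in_home
  in_project="$project_dir/.datafusion-skills/state.sql"
  # Assumption: <project> is the base name of the project directory.
  in_home="$HOME/.datafusion-skills/$(basename "$project_dir")/state.sql"
  if [ -f "$in_project" ]; then
    printf '%s\n' "$in_project"
  elif [ -f "$in_home" ]; then
    printf '%s\n' "$in_home"
  else
    return 1   # no state yet; the skill would ask where to store it
  fi
}

# Example (requires datafusion-cli on PATH):
#   state="$(resolve_state_file "$PWD")" && datafusion-cli --file "$state"
```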
Skills reference each other where it makes sense:
- read-file suggests query for follow-up exploration and create-table for persisting data
- query uses session state from create-table automatically
- materialized-view creates persistent Parquet files registered via create-table
- explain-plan helps optimize queries from query
- All skills use datafusion-docs to troubleshoot DataFusion errors automatically
Apache DataFusion is a fast, extensible query engine built in Rust on top of Apache Arrow. It offers:
- High performance: Vectorized execution, predicate pushdown, partition pruning
- Standard SQL: Full SQL support including window functions, CTEs, subqueries
- Extensibility: Custom table providers, UDFs, optimizer rules
- File format support: Parquet, CSV, JSON, Arrow IPC, Avro
- Cloud native: S3, GCS, Azure object store support
- Materialized views: Persist query results and track dependencies (unique to the DataFusion ecosystem)
# Clone the repo
git clone https://github.com/datafusion-contrib/datafusion-skills.git
cd datafusion-skills
# Launch Claude Code with the local plugin directory
claude --plugin-dir .
Test individual skills:
/datafusion-skills:read-file some_local_file.parquet
/datafusion-skills:query SELECT 42
/datafusion-skills:datafusion-docs window functions
Prerequisites: datafusion-cli must be installed. If it isn't, the skills will offer to install it via /datafusion-skills:install-datafusion.
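The prerequisite check can be approximated from a terminal before launching Claude Code. This is a minimal sketch; `check_datafusion_cli` is a hypothetical helper and is not part of the plugin.

```shell
# Hypothetical helper: report whether datafusion-cli is on PATH.
check_datafusion_cli() {
  if command -v datafusion-cli >/dev/null 2>&1; then
    # Include the version, which is also useful in bug reports.
    printf 'installed: %s\n' "$(datafusion-cli --version)"
  else
    printf 'missing: run /datafusion-skills:install-datafusion\n'
    return 1
  fi
}

# Example:
#   check_datafusion_cli || echo "install datafusion-cli first"
```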
These skills have been tested on macOS and Linux. Windows is not yet fully supported.
Found a bug or have an idea? Open an issue at:
https://github.com/datafusion-contrib/datafusion-skills/issues
For DataFusion-specific bugs, please include the datafusion-cli version (datafusion-cli --version) and the full error message.
Apache License 2.0. See LICENSE for details.