A reusable data platform built on Dagster + Neo4j + Polars + marimo, with a comprehensive MCP server ecosystem for agent-assisted development. Ships with an example domain that demonstrates every architectural pattern.
A production-quality scaffold for building analytics platforms where data naturally forms a graph. The template provides:
- Dagster orchestration with medallion-layer asset organization (raw/stg/enr/dim/fct)
- Neo4j graph persistence via a custom IO Manager that bridges Polars DataFrames to graph nodes
- Polars for all in-memory computation (no pandas)
- marimo reactive notebooks for analysis and visualization
- MCP servers (neo4j, dagster, serena, context7, memory, marimo) enabling agent-assisted development where the agent can query the graph, trigger pipelines, inspect results, and navigate code — all within a single conversation
An example domain is included to demonstrate every pattern. See EXAMPLE.md for its specification.
Prerequisites: Python 3.12+, uv, Docker, just (recommended)
```
just setup      # Install deps, start Neo4j, apply schema
just dagster    # Start pipeline UI at localhost:3000 (separate terminal)
just notebook   # Start notebook editor at localhost:2718 (separate terminal)
```

Or without just:

```
./dev.sh        # Start Neo4j, sync deps, apply schema, launch Dagster
```

Open the Dagster UI at localhost:3000 and click Materialize All, or materialize assets individually in dependency order.

```
just check      # Full verification: mypy + ruff + pytest
```

See CONFIG.md for manual setup, MCP server configuration, and troubleshooting.
```
External Data Source
        │
      :Raw ──▶ :Stg ──▶ :Enr ──▶ :Fct
                          │
                        :Dim (calendar, entities, groupings)
```
Each layer is a Neo4j node label. Nodes carry both a layer label and a domain label (`:Enr:DataPoint`, `:Fct:Alert`), enabling queries like `MATCH (r:Enr:DataPoint)` for one layer or `MATCH (r:DataPoint)` across all layers.
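As an illustration of the dual-label convention, a batched MERGE template for `:Enr:DataPoint` nodes might look like the following. This is a sketch, not the project's actual template; the property names (`id`, `value`, `date`) are assumptions for the example.

```python
# Hypothetical Cypher MERGE template for dual-labeled :Enr:DataPoint nodes.
# Rows are passed as a parameter and unwound, so one query upserts a batch.
ENR_DATAPOINT_TEMPLATE = """
UNWIND $rows AS row
MERGE (n:Enr:DataPoint {id: row.id})
SET n.value = row.value,
    n.date  = date(row.date)
"""

# Reads can then target a single layer or the domain label across layers:
BY_LAYER   = "MATCH (n:Enr:DataPoint) RETURN count(n) AS c"
ALL_LAYERS = "MATCH (n:DataPoint) RETURN count(n) AS c"
```

Because the layer label and domain label are independent, the same domain query works unchanged as data moves through the medallion layers.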
Three-layer code architecture with strict dependency direction:
- Layer A (Data) — API wrapper and raw ingestion. The only layer with external I/O.
- Layer B (Computation) — Pure functions for domain metrics. No imports from A or C. Testable with synthetic data alone.
- Layer C (Persistence) — A custom Dagster IO Manager bridges Polars DataFrames to Neo4j. Assets return DataFrames and declare Cypher templates in metadata.
See ARCHITECTURE.md for the full technical design.
Dagster assets across 5 materialization layers:
- Data pipeline: `raw_*` → `stg_*` → `enr_*` — raw ingestion, validation, enrichment with computed metrics
- Dimensions: `dim_calendar`, `dim_*` — temporal backbone, entity dimensions, grouping dimensions, temporal events
- Fact tables: `fct_*` — classified outputs derived from enriched data
Plus `purge_graph` (ops group) for memory-safe graph reset via APOC batches.
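The batched-delete pattern behind a memory-safe purge typically relies on `apoc.periodic.iterate`, which streams matches and deletes them in transactional batches rather than holding the whole graph in one transaction. A sketch of such a query (the batch size and the decision to purge all nodes are illustrative, not the project's actual query):

```python
# Sketch of a batched graph purge via APOC. apoc.periodic.iterate runs the
# second statement over results of the first in batches, committing each
# batch separately so memory stays bounded on large graphs.
PURGE_QUERY = """
CALL apoc.periodic.iterate(
  'MATCH (n) RETURN n',
  'DETACH DELETE n',
  {batchSize: 10000, parallel: false}
)
"""
```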
| Component | Tool |
|---|---|
| Language | Python 3.12+ |
| Package management | uv |
| Task runner | just |
| Data processing | Polars |
| Orchestration | Dagster |
| Persistence | Neo4j |
| Notebooks | marimo |
| Quality | mypy, ruff, pytest + hypothesis |
The platform includes six MCP servers that enable deep agent-component interaction:
| Server | What It Enables |
|---|---|
| neo4j | Live graph exploration, Cypher query prototyping, data verification — the graph becomes a reasoning surface |
| dagster | Trigger materializations, inspect run status/logs/failures as structured data |
| serena | Semantic code navigation, symbol search, find all references |
| context7 | Up-to-date library documentation for Dagster/Polars/Neo4j/marimo APIs |
| memory | Persistent knowledge graph across sessions |
| marimo | Interact with running notebook sessions |
See MCP_GUIDE.md for detailed capabilities and configuration.
```
just init my_project_name
```

This strips the example domain and scaffolds a blank project with the correct patterns in place.
- Replace `dataplatform/domain.py` with your domain constants and types
- Replace `dataplatform/metrics/` with your domain metrics (pure functions)
- Replace `dataplatform/resources/` with your data source wrapper
- Update assets in `dataplatform/assets/` with your Cypher templates
- Update `dataplatform/graph/schema.py` with your constraints
- Update tests to match your domain
- Run `just check` to verify everything passes
Recommended build order: dimensions first (dim/), then raw ingestion (raw/), then validation (stg/), then enrichment (enr/), then fact tables (fct/).
See DATA_MODELLING.md for guidance on graph data modelling.
Interactive notebooks built with marimo for exploring the graph and analyzing data.
```
just notebook               # Open notebook editor at localhost:2718
just notebook-run file.py   # Run a single notebook as an app
```

See MARIMO_GUIDE.md for notebook capabilities and usage.
| Document | What It Covers |
|---|---|
| EXAMPLE.md | Included example domain — remove when starting your own project |
| ARCHITECTURE.md | Technical design — layers, IO manager, constraints, idempotency |
| CONFIG.md | Setup and configuration — prerequisites, manual setup, MCP servers, troubleshooting |
| DATA_MODELLING.md | Graph data modelling guide — medallion layers, Cypher templates, calendar integration |
| MCP_GUIDE.md | MCP ecosystem — server capabilities, development workflows, configuration |
| MARIMO_GUIDE.md | Notebook guide — marimo capabilities, Neo4j integration, visualization |
| CLAUDE.md | Agent guidelines — development workflow, verification, coding standards |