OscarSouth/dataplatform-template

Data Platform Template

A reusable data platform built on Dagster + Neo4j + Polars + marimo, with a comprehensive MCP server ecosystem for agent-assisted development. Ships with an example domain that demonstrates every architectural pattern.

What This Is

A production-quality scaffold for building analytics platforms where data naturally forms a graph. The template provides:

  • Dagster orchestration with medallion-layer asset organization (raw/stg/enr/dim/fct)
  • Neo4j graph persistence via a custom IO Manager that bridges Polars DataFrames to graph nodes
  • Polars for all in-memory computation (no pandas)
  • marimo reactive notebooks for analysis and visualization
  • MCP servers (neo4j, dagster, serena, context7, memory, marimo) enabling agent-assisted development where the agent can query the graph, trigger pipelines, inspect results, and navigate code — all within a single conversation

An example domain is included to demonstrate every pattern. See EXAMPLE.md for its specification.

Quick Start

Prerequisites: Python 3.12+, uv, Docker, just (recommended)

just setup    # Install deps, start Neo4j, apply schema
just dagster  # Start pipeline UI at localhost:3000 (separate terminal)
just notebook # Start notebook editor at localhost:2718 (separate terminal)

Or without just:

./dev.sh      # Start Neo4j, sync deps, apply schema, launch Dagster

Open the Dagster UI at localhost:3000 and click Materialize All, or materialize assets individually in dependency order.

just check    # Full verification: mypy + ruff + pytest

See CONFIG.md for manual setup, MCP server configuration, and troubleshooting.

Architecture at a Glance

External Data Source
       │
  :Raw ──▶ :Stg ──▶ :Enr ──▶ :Fct
                                │
  :Dim (calendar, entities, groupings)

Each layer is a Neo4j node label. Nodes carry both a layer label and a domain label (:Enr:DataPoint, :Fct:Alert), enabling queries like MATCH (r:Enr:DataPoint) or MATCH (r:DataPoint) across all layers.

Three-layer code architecture with strict dependency direction:

  • Layer A (Data) — API wrapper and raw ingestion. The only layer with external I/O.
  • Layer B (Computation) — Pure functions for domain metrics. No imports from A or C. Testable with synthetic data alone.
  • Layer C (Persistence) — A custom Dagster IO Manager bridges Polars DataFrames to Neo4j. Assets return DataFrames and declare Cypher templates in metadata.

See ARCHITECTURE.md for the full technical design.
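The Layer C contract — assets return DataFrames and declare Cypher templates in metadata — might look roughly like this. The template text, label names, and helper are assumptions for illustration; the real templates live in the asset definitions:

```python
# Hypothetical Cypher template of the kind an asset could declare in its
# metadata: the IO manager would pass the DataFrame's rows as the $rows
# parameter, UNWIND them, and MERGE one node per row (idempotent upsert).

ENR_TEMPLATE = """
UNWIND $rows AS row
MERGE (n:Enr:DataPoint {id: row.id})
SET n += row
"""

def render_rows(records: list[dict]) -> dict:
    """Shape a list of row dicts into the $rows query parameter."""
    return {"rows": records}

print(render_rows([{"id": 1, "value": 3.2}]))
```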

What Gets Built

Dagster assets across 5 materialization layers:

  • Data pipeline: raw_* → stg_* → enr_* — raw ingestion, validation, enrichment with computed metrics
  • Dimensions: dim_calendar, dim_* — temporal backbone, entity dimensions, grouping dimensions, temporal events
  • Fact tables: fct_* — classified outputs derived from enriched data

Plus purge_graph (ops group) for memory-safe graph reset via APOC batches.
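A batched purge of the kind purge_graph describes could be expressed with apoc.periodic.iterate; the exact query and batch size below are assumptions, not the template's actual implementation:

```python
# Hypothetical APOC-batched delete: apoc.periodic.iterate runs the action
# in fixed-size transactions, so a full-graph reset never needs to hold
# every node in a single transaction's memory.

PURGE_QUERY = """
CALL apoc.periodic.iterate(
  'MATCH (n) RETURN n',
  'DETACH DELETE n',
  {batchSize: 10000}
)
"""

print("apoc.periodic.iterate" in PURGE_QUERY)
```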

Component            Tool
Language             Python 3.12+
Package management   uv
Task runner          just
Data processing      Polars
Orchestration        Dagster
Persistence          Neo4j
Notebooks            marimo
Quality              mypy, ruff, pytest + hypothesis

MCP Server Ecosystem

The platform includes six MCP servers that enable deep agent-component interaction:

Server     What It Enables
neo4j      Live graph exploration, Cypher query prototyping, data verification — the graph becomes a reasoning surface
dagster    Trigger materializations, inspect run status/logs/failures as structured data
serena     Semantic code navigation, symbol search, find all references
context7   Up-to-date library documentation for Dagster/Polars/Neo4j/marimo APIs
memory     Persistent knowledge graph across sessions
marimo     Interact with running notebook sessions

See MCP_GUIDE.md for detailed capabilities and configuration.

Starting a New Project

Quick Path

just init my_project_name

This strips the example domain and scaffolds a blank project with the correct patterns in place.

Manual Path

  1. Replace dataplatform/domain.py with your domain constants and types
  2. Replace dataplatform/metrics/ with your domain metrics (pure functions)
  3. Replace dataplatform/resources/ with your data source wrapper
  4. Update assets in dataplatform/assets/ with your Cypher templates
  5. Update dataplatform/graph/schema.py with your constraints
  6. Update tests to match your domain
  7. Run just check to verify everything passes

Recommended build order: dimensions first (dim/), then raw ingestion (raw/), then validation (stg/), then enrichment (enr/), then fact tables (fct/).
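Step 5's constraints typically amount to one uniqueness constraint per key. A hedged sketch of what schema.py might generate (label, property, and naming convention are illustrative):

```python
# Hypothetical constraint DDL of the kind dataplatform/graph/schema.py
# would apply. Uses Neo4j 5 syntax: CREATE CONSTRAINT ... IF NOT EXISTS
# FOR (n:Label) REQUIRE n.prop IS UNIQUE.

def unique_constraint(label: str, prop: str) -> str:
    """Render a named uniqueness constraint for one label/property pair."""
    return (
        f"CREATE CONSTRAINT {label.lower()}_{prop}_unique IF NOT EXISTS "
        f"FOR (n:{label}) REQUIRE n.{prop} IS UNIQUE"
    )

print(unique_constraint("Dim", "date"))
```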

See DATA_MODELLING.md for guidance on graph data modelling.

Research Notebooks

Interactive notebooks built with marimo for exploring the graph and analyzing data.

just notebook                  # Open notebook editor at localhost:2718
just notebook-run file.py      # Run a single notebook as an app

See MARIMO_GUIDE.md for notebook capabilities and usage.

Documentation

Document            What It Covers
EXAMPLE.md          Included example domain — remove when starting your own project
ARCHITECTURE.md     Technical design — layers, IO manager, constraints, idempotency
CONFIG.md           Setup and configuration — prerequisites, manual setup, MCP servers, troubleshooting
DATA_MODELLING.md   Graph data modelling guide — medallion layers, Cypher templates, calendar integration
MCP_GUIDE.md        MCP ecosystem — server capabilities, development workflows, configuration
MARIMO_GUIDE.md     Notebook guide — marimo capabilities, Neo4j integration, visualization
CLAUDE.md           Agent guidelines — development workflow, verification, coding standards
