
tidylake



tidylake is an agnostic framework for managing data operations in your lakehouse using your favorite tools.

This project is under active development. It has been tested in production, but future releases will likely include breaking changes.

Purpose

tidylake gives data teams a common ground between:

  • Transformation code
  • Metadata and contracts
  • Operational workflows

It helps you manage the data product lifecycle without locking your project to a single engine, notebook style, or orchestrator.

Why Use It

The key advantages are:

  • Framework agnostic by design: keep using pandas, Spark, Iceberg, and your own stack.
  • Metadata-first workflow: manifests act as the single source of truth for schema and semantics.
  • Better collaboration across personas: analysts and engineers can work on the same assets with less friction.
  • One codebase for interactive and batch work: iterate safely in notebooks and run the same logic in production.
  • Built-in structure for automation: lineage discovery, CLI execution, and plugin-based extension points.

Documentation

For full setup, concepts, and end-to-end examples, see the documentation.

Minimal Example

The docs include complete runnable examples. This is a minimal sketch of what a tidylake data product looks like.

Create a manifest (silver_customers.yml):

```yaml
data_product:
  name: silver_customers
  description: Customer profile from CRM
  script: silver_customers
  schema:
    properties:
      customer_id:
        type: string
      customer_name:
        type: string
```
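The `schema.properties` block mirrors JSON Schema, so the manifest can act as a machine-readable contract. As a rough illustration of how such a manifest could drive a column check (a hand-rolled sketch using PyYAML and pandas, not part of the tidylake API), consider:

```python
import yaml
import pandas as pd

# A trimmed copy of the manifest above, inlined for the example.
MANIFEST = """
data_product:
  name: silver_customers
  schema:
    properties:
      customer_id:
        type: string
      customer_name:
        type: string
"""

# Parse the manifest and pull out the declared column names.
properties = yaml.safe_load(MANIFEST)["data_product"]["schema"]["properties"]

def missing_columns(df: pd.DataFrame) -> list[str]:
    """Return the manifest columns that are absent from the frame."""
    return [col for col in properties if col not in df.columns]

df = pd.DataFrame({"customer_id": ["c-1"], "customer_name": ["Ada"]})
print(missing_columns(df))  # → []
```

In tidylake itself this contract handling is managed by the framework; the sketch only shows why a metadata-first manifest is useful as a single source of truth.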

Link it to a script (silver_customers.py):

```python
import pandas as pd
from tidylake import get_or_create_context

# Bind this script to the data product declared in the manifest.
product = get_or_create_context().get_data_product("silver_customers")

@product.add_input()
def bronze_customers():
    return pd.read_parquet("/tmp/bronze_customers")

# Keep only the columns declared in the manifest schema.
df = bronze_customers()[["customer_id", "customer_name"]]

@product.set_sink()
def write_silver_customers():
    df.to_parquet("/tmp/silver_customers", index=False)
```

Then use the CLI:

```shell
tidylake list
tidylake run
```

You can extend tidylake with plugins to integrate storage, compute, and catalog services from your existing stack.

Contributing

See CONTRIBUTING.md for development setup and contribution guidelines.

License

This project is open source, released under the Apache License, and brought to you by the Taidy team.
