Contributing to Raincloud

Thanks for your interest in Raincloud. This guide covers how to set up a dev environment, run the test suite, and submit changes. For deeper dives into the pipeline itself, see README.md, AGENTS.md, and SKILLS.md.

Setting up

git clone git@github.com:spiraldb/raincloud.git
cd raincloud
uv sync --extra dev

--extra dev pulls in pytest. Add --extra kaggle or --extra huggingface if your work touches those upstream types, or --extra all for everything.

Before you open a PR

Three sub-second checks are the minimum gate (CI runs all three):

ruff check                                     # lint (pyflakes + pycodestyle + isort)
python -m scripts.pipeline.validate_manifest   # JSON Schema + cross-checks on sources.json
pytest                                         # smoke regression net (manifest, schema, registry, examples)
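
If you prefer a single entry point, the three checks above can be chained so the run stops at the first failure. The following is a minimal sketch, not something that ships with the repo: `GATE` and `run_gate` are hypothetical names, and it assumes you run it from the repository root inside the environment created by `uv sync --extra dev`.

```python
import subprocess
import sys

# The three pre-PR checks, in the order this guide lists them.
# Assumption: ruff and pytest are on PATH in the active environment.
GATE = [
    ["ruff", "check"],
    [sys.executable, "-m", "scripts.pipeline.validate_manifest"],
    ["pytest"],
]


def run_gate(commands=GATE, runner=subprocess.run):
    """Run each command in order; return the first failing command, or None."""
    for cmd in commands:
        result = runner(cmd)
        if result.returncode != 0:
            return cmd  # stop early so the first failure is easy to spot
    return None
```

Calling `run_gate()` returns `None` when all three checks pass and the failing command otherwise. The `runner` parameter exists only so the helper can be exercised without invoking the real tools.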

If you touched the build pipeline, also run a small end-to-end build to make sure it still produces the expected output:

python -m scripts.pipeline.build countries-of-the-world   # ~200 ms, 227 rows

For larger builds, see SKILLS.md.

What to send a PR for

  • New datasets: see SKILLS.md. Most entries copy examples/minimal_spec.json and pick an existing handler from docs/v1/handlers.md.
  • New transform handlers: see SKILLS.md. One handler per upstream shape; register in scripts/pipeline/handlers/__init__.py.
  • Bug fixes: start with a failing test where practical.
  • Documentation: README/AGENTS/SKILLS edits welcome. The two derived docs (docs/datasets.md, docs/handlers.md) are machine-generated; don't hand-edit them. Instead, fix the manifest or the registry and regenerate via python -m scripts.pipeline.docs.

Tests for new functionality

Add a test alongside any new behaviour:

  • New transform handler: a fixture-based test demonstrating the expected output shape (small in-memory pa.Table; see existing handler tests in tests/test_manifest.py for the pattern).
  • New manifest field or schema rule: extend test_manifest.py to assert it validates as expected.
  • New CLI flag: extend the relevant test_*.py (e.g. test_list_datasets.py for catalog-filter flags).
  • Bug fix: a failing test that the fix turns green.

pytest is part of the minimum pre-PR gate (see Before you open a PR); CI re-runs it on every PR via .github/workflows/ci.yml.

Branching and commits

  • Branch off develop. Branch names follow <initials>/<topic> (e.g. mp/add-fastlanes).
  • Open PRs against develop.
  • Commit messages: short imperative subject ("add X", "fix Y", "swap Z to W"), optional body explaining why the change is needed.
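
The branch naming convention can be checked mechanically. The sketch below is only an illustration: the guide specifies <initials>/<topic> but not the exact character set, so the pattern (two to four lowercase initials, then a lowercase kebab-case topic) is an assumption, and `is_valid_branch` is a hypothetical helper rather than part of the repo.

```python
import re

# Assumed shape: 2-4 lowercase initials, a slash, then a kebab-case topic.
BRANCH_RE = re.compile(r"[a-z]{2,4}/[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_branch(name: str) -> bool:
    """Return True if the whole name matches the assumed <initials>/<topic> shape."""
    return BRANCH_RE.fullmatch(name) is not None
```

For example, `is_valid_branch("mp/add-fastlanes")` is true, while a bare topic such as `"add-fastlanes"` is rejected.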

Reporting bugs

Open an issue on GitHub Issues. Include the slug you were building, the command you ran, and any traceback.

For security-related issues, do not open a public issue; see SECURITY.md for the private channel.

Coding style

  • Python ≥ 3.11. Match the style of nearby code; the repo prefers terse, comment-light Python with explicit names over abstractions.
  • No backwards-compat stubs or shims when removing handlers/slugs; git history is the fallback.
  • Always go through scripts.pipeline.spec.duckdb_connect for DuckDB connections so resource limits and storage_compatibility_version=v1.5.0 apply (see AGENTS.md).

License

By submitting a PR, you agree that your contribution will be licensed under the Apache License 2.0, the same license that covers the rest of the project.