-
Notifications
You must be signed in to change notification settings - Fork 418
dev: add make notebook
#2528
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
dev: add make notebook
#2528
Conversation
Makefile
Outdated
| @echo "Cleanup complete." | ||
|
|
||
| notebook: ## Launch Jupyter Notebook | ||
| ${POETRY} run pip install jupyter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we move this into a poetry dependency group? Similar to the docs.
I agree, and that's great, but should we also spin up the resources as part of this effort? We could even inject a notebook that imports Spark-connect, etc (which won't be installed from a fresh install? I think this is a dev dependency, we probably want to double check there to avoid scaring newcomers to the project). |
|
Bonus idea: what if |
We could do getting started as a notebook! https://py.iceberg.apache.org/#getting-started-with-pyiceberg |
yea we could do that. the integration test setup gives us 2 different catalogs (rest and hms) |
|
@kevinjqliu I would keep it simple, and go with the preferred catalog; REST :) |
2e6d5a1 to
d69f359
Compare
| test-integration: test-integration-setup test-integration-exec test-integration-cleanup ## Run integration tests | ||
|
|
||
| test-integration-setup: ## Start Docker services for integration tests | ||
| test-integration-setup: install ## Start Docker services for integration tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adding the make install pre-req here because otherwise dev/provision.py will fail
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| @@ -0,0 +1,359 @@ | |||
| { | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can also add a .pre-commit linter for notebooks: https://github.com/nbQA-dev/nbQA
Fokko
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Should we also add a section to the docs?
| .gitignore | ||
| uv.lock | ||
| mkdocs/* | ||
| notebooks/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can do this, but then we have to make sure that they are not bundled in the release. The notebooks do contain code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea agreed. i double check the artifacts, the new notebooks/ dir is not included. Similar to how the mkdocs/ dir is not included.
Feels like this can be a potential footgun where a folder is included in the artifact but RAT check is ignored in .rat-excludes. I think we can add a CI check to prevent this. I'll track this as a separate issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| find /tmp/warehouse/ | ||
| ``` | ||
|
|
||
| ## Try it yourself with Jupyter Notebooks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added to the frontpage, https://py.iceberg.apache.org/

| export PYICEBERG_CATALOG__TEST_CATALOG__SECRET_ACCESS_KEY=password | ||
| ``` | ||
|
|
||
| ## Notebooks for Experimentation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a new section to https://py.iceberg.apache.org/contributing/
|
Added some docs and linter for notebook |
geruh
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! This is awesome Kevin! Was able to get it up and running locally with no issues



Rationale for this change
Add 2 new Make command (
make notebook) to spin up a jupyter notebook;make notebook-infraspins up jupyter notebook along with integration test infrastructure.Pyiceberg Example Notebook
Pyiceberg example notebook (
notebooks/pyiceberg_example.ipynb) is based on the https://py.iceberg.apache.org/#getting-started-with-pyiceberg page and doesn't require additional test infra.Spark Example Notebook
Spark integration example notebook (
notebooks/spark_integration_example.ipynb) is based on https://iceberg.apache.org/docs/nightly/spark-getting-started/ and requires integration test infrastructure (Spark, IRC, S3)With spark connect (#2491) and our testing setup, we can quickly spin up a local env with
make test-integration-execwhich includes:In the jupyter notebook, connect to spark easily
Are these changes tested?
Yes, run both
make notebookandmake notebook-infralocally and run the example notebooksAre there any user-facing changes?