@kevinjqliu kevinjqliu commented Sep 26, 2025

Rationale for this change

Add two new Make commands: make notebook spins up a Jupyter notebook, and make notebook-infra spins up a Jupyter notebook along with the integration test infrastructure.

PyIceberg Example Notebook

The PyIceberg example notebook (notebooks/pyiceberg_example.ipynb) is based on the https://py.iceberg.apache.org/#getting-started-with-pyiceberg page and doesn't require additional test infra.

Spark Example Notebook

The Spark integration example notebook (notebooks/spark_integration_example.ipynb) is based on https://iceberg.apache.org/docs/nightly/spark-getting-started/ and requires the integration test infrastructure (Spark, Iceberg REST catalog, S3).

With spark connect (#2491) and our testing setup, we can quickly spin up a local env with make test-integration-exec which includes:

  • spark
  • iceberg rest catalog
  • hive metastore
  • minio

In the Jupyter notebook, connecting to Spark is then straightforward:

from pyspark.sql import SparkSession

# Create SparkSession against the remote Spark Connect server
spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()
spark.sql("SHOW CATALOGS").show()
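
Since the notebook can only connect once the Docker services are up, a small readiness check may help avoid a confusing stack trace on the first cell. The following is a minimal sketch and not part of this PR: wait_for_port is a hypothetical helper, and the only endpoint assumed is the localhost:15002 address from the snippet above.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect only proves that something is listening,
            # not that the Spark Connect server is fully initialised.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    # Assumed endpoint: the Spark Connect address used in the snippet above.
    if wait_for_port("localhost", 15002, timeout=2.0):
        print("Spark Connect port is reachable")
    else:
        print("Spark Connect port is not reachable; is the test infrastructure running?")
```

Calling this before SparkSession.builder.remote(...) keeps the failure mode explicit: if it returns False, the infrastructure is most likely not running yet.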

Are these changes tested?

Yes. I ran both make notebook and make notebook-infra locally and ran the example notebooks.

Are there any user-facing changes?

@kevinjqliu kevinjqliu requested a review from Fokko September 26, 2025 02:43
Makefile Outdated
@echo "Cleanup complete."

notebook: ## Launch Jupyter Notebook
${POETRY} run pip install jupyter

Should we move this into a poetry dependency group? Similar to the docs.

Fokko commented Sep 26, 2025

With spark connect (#2491) and our testing setup, we can quickly spin up a local env with

I agree, and that's great, but should we also spin up the resources as part of this effort? We could even inject a notebook that imports Spark-connect, etc. (That won't be installed from a fresh install, will it? I think this is a dev dependency; we probably want to double-check that to avoid scaring newcomers to the project.)

@jayceslesar

Bonus idea: what if make notebook or some other CLI entry point spun up pyspark + catalog configured via pyiceberg.yaml so users could immediately start querying their data?

kevinjqliu commented Sep 26, 2025

We could even inject a notebook that imports Spark-connect

We could do getting started as a notebook! https://py.iceberg.apache.org/#getting-started-with-pyiceberg

kevinjqliu commented Sep 26, 2025

Bonus idea: what if make notebook or some other CLI entry point spun up pyspark + catalog configured via pyiceberg.yaml so users could immediately start querying their data?

Yeah, we could do that. The integration test setup gives us two different catalogs (REST and HMS).

Fokko commented Sep 30, 2025

@kevinjqliu I would keep it simple, and go with the preferred catalog; REST :)
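
For illustration, connecting a notebook to the REST catalog could look roughly like the sketch below. It is not taken from this PR. The helper names are hypothetical, and the default URI, S3 endpoint, and credentials are assumptions about a local test setup and should be adjusted to the actual infrastructure. Only load_catalog and the s3.* property keys come from PyIceberg itself.

```python
def rest_catalog_properties(
    uri: str = "http://localhost:8181",          # assumed REST catalog address
    s3_endpoint: str = "http://localhost:9000",  # assumed MinIO address
    access_key_id: str = "admin",                # assumed test credential
    secret_access_key: str = "password",         # assumed test credential
) -> dict[str, str]:
    """Build the property dict for a REST catalog backed by an S3-compatible store."""
    return {
        "type": "rest",
        "uri": uri,
        "s3.endpoint": s3_endpoint,
        "s3.access-key-id": access_key_id,
        "s3.secret-access-key": secret_access_key,
    }


def load_rest_catalog(name: str = "rest", **overrides: str):
    """Load the catalog; PyIceberg is imported lazily so the helper above has no dependencies."""
    from pyiceberg.catalog import load_catalog

    return load_catalog(name, **rest_catalog_properties(**overrides))
```

In a notebook cell, catalog = load_rest_catalog() followed by catalog.list_namespaces() would then be enough to confirm the connection, provided the services are running.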

@kevinjqliu kevinjqliu force-pushed the kevinjqliu/make-notebook branch from 2e6d5a1 to d69f359 Compare December 29, 2025 23:36
test-integration: test-integration-setup test-integration-exec test-integration-cleanup ## Run integration tests

-test-integration-setup: ## Start Docker services for integration tests
+test-integration-setup: install ## Start Docker services for integration tests

Adding the make install prerequisite here because dev/provision.py will fail otherwise.

@kevinjqliu kevinjqliu requested a review from Fokko December 29, 2025 23:37

Render:
Screenshot 2025-12-29 at 3 48 27 PM


Render:
Screenshot 2025-12-29 at 3 49 10 PM

@@ -0,0 +1,359 @@
{

We can also add a .pre-commit linter for notebooks: https://github.com/nbQA-dev/nbQA

@Fokko Fokko left a comment

Looks great! Should we also add a section to the docs?

.gitignore
uv.lock
mkdocs/*
notebooks/*

We can do this, but then we have to make sure that they are not bundled in the release. The notebooks do contain code.


Yeah, agreed. I double-checked the artifacts: the new notebooks/ dir is not included, similar to how the mkdocs/ dir is not included.

This feels like a potential footgun: a folder could be included in the artifact while being skipped by the RAT check because it is listed in .rat-excludes. I think we can add a CI check to prevent this. I'll track this as a separate issue.
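
A rough sketch of what such a check could do, under loud assumptions: the exclude entries are treated as simple glob patterns (the real RAT matching rules may differ), and the list of files in the artifact is assumed to be available as relative paths. The function name is hypothetical.

```python
from fnmatch import fnmatch


def excluded_but_shipped(artifact_paths: list[str], exclude_patterns: list[str]) -> list[str]:
    """Return artifact paths that match at least one license-check exclude pattern."""
    # Drop blank lines and comment lines from the exclude list.
    patterns = [p.strip() for p in exclude_patterns if p.strip() and not p.startswith("#")]
    return sorted(
        path
        for path in artifact_paths
        if any(fnmatch(path, pattern) for pattern in patterns)
    )


if __name__ == "__main__":
    # Illustrative inputs only; a real check would read the exclude file
    # and list the contents of the built sdist or wheel.
    shipped = ["pyiceberg/__init__.py", "notebooks/pyiceberg_example.ipynb"]
    excludes = ["uv.lock", "mkdocs/*", "notebooks/*"]
    offenders = excluded_but_shipped(shipped, excludes)
    if offenders:
        raise SystemExit(f"Files shipped without a license check: {offenders}")
```

A CI job would fail whenever the returned list is non-empty, which flags exactly the case described above.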

find /tmp/warehouse/
```

## Try it yourself with Jupyter Notebooks

Added to the frontpage, https://py.iceberg.apache.org/
Screenshot 2026-01-06 at 12 19 11 PM

export PYICEBERG_CATALOG__TEST_CATALOG__SECRET_ACCESS_KEY=password
```

## Notebooks for Experimentation

@kevinjqliu

Added some docs and a linter for the notebooks.


@geruh geruh left a comment

LGTM! This is awesome Kevin! Was able to get it up and running locally with no issues

@kevinjqliu kevinjqliu requested a review from Fokko January 7, 2026 03:02