Skip to content

feat: expose catalog metadata api for pypaimon_rust#306

Merged
JingsongLi merged 1 commit into
apache:mainfrom
SML0127:issue-284
May 24, 2026
Merged

feat: expose catalog metadata api for pypaimon_rust#306
JingsongLi merged 1 commit into
apache:mainfrom
SML0127:issue-284

Conversation

@SML0127
Copy link
Copy Markdown
Contributor

@SML0127 SML0127 commented May 3, 2026

Purpose

Linked issue: close #284

Brief change log

  • added list_databases, list_tables, and get_table methods to existing PaimonCatalog.
  • introduced new PyO3 wrapper classes for metadata inspection: Table, TableSchema, and DataField.

Tests

API and Format

  • Python-only API additions. No changes to Rust public APIs or storage formats.

Documentation

Copy link
Copy Markdown

@leaves12138 leaves12138 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. The runtime binding implementation looks generally good, and the existing CI is green. I also verified locally that cargo test -q -p pypaimon_rust --no-default-features passes after the first build.

One blocking issue: this package ships py.typed and bindings/python/python/pypaimon_rust/datafusion.pyi, but the stub file still only exposes PaimonCatalog.__init__ and __datafusion_catalog_provider__. The new public APIs added by this PR (list_databases, list_tables, get_table, plus the new Table, TableSchema, and DataField classes and their methods) are missing from the type stubs.

That means dynamic Python calls work, but type-aware users and IDEs will still see the old API. Please update bindings/python/python/pypaimon_rust/datafusion.pyi to match the new exported Python surface. It would also be helpful to cover this with a small stub/import check if the project already has a place for that.

Also, GitHub currently reports the PR as not mergeable (mergeable_state=dirty), so it will need a rebase/update before merge.

@SML0127
Copy link
Copy Markdown
Contributor Author

SML0127 commented May 22, 2026

@leaves12138 Thanks for the comment. I've updated and rebbased the branch.

Copy link
Copy Markdown
Contributor

@JingsongLi JingsongLi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution! The catalog metadata API additions look good overall.

Review comments:

  1. Type stubs updated — I see this PR now includes updates to datafusion.pyi with DataField, TableSchema, Table, and the new PaimonCatalog methods. This addresses the concern from the previous review. Good.

  2. get_table identifier parsing: The splitn(2, '.') approach works for simple cases but will fail for database names containing dots (e.g., my.db.my_table). While this is unlikely in practice, consider documenting that database names must not contain dots, or using the first dot as the separator (which is what you're already doing with splitn(2, '.')). The current implementation is fine.

  3. Missing observability APIs in this PR's Table class (vs PR #307): This PR exposes identifier(), location(), and schema() only. The observability methods (latest_snapshot, list_snapshots, etc.) are in the follow-up #307. This is a clean separation — metadata in #306, observability in #307.

  4. Test coverage: test_list_databases_and_tables covers the happy path well. Consider adding a test for get_table with an invalid identifier format (e.g., missing dot) to verify the PyValueError is raised correctly.

  5. Mergeable state: As the previous reviewer noted, this PR has merge conflicts (mergeable_state=dirty). Please rebase onto main before this can be merged.

  6. Architecture note: Storing both catalog: Arc<dyn Catalog> and provider: Arc<PaimonCatalogProvider> in PaimonCatalog is slightly redundant since PaimonCatalogProvider internally also holds the catalog. But it's a clean approach that avoids reaching through the provider, so it's fine.

Please rebase and this should be ready to merge.

Copy link
Copy Markdown
Contributor

@JingsongLi JingsongLi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Copy Markdown

@leaves12138 leaves12138 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates. The previous blocking concern about the Python type stubs has been addressed, and the PR now exposes the new catalog metadata APIs consistently in both the runtime binding and datafusion.pyi. I also checked the PR merge ref locally with cargo fmt --all --check, cargo check -p pypaimon_rust --no-default-features, and cargo test -p pypaimon_rust --no-default-features; all passed. GitHub CI is green as well. +1.

@JingsongLi JingsongLi merged commit ad40acc into apache:main May 24, 2026
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Expose catalog metadata API (list_databases, list_tables, schema) for pypaimon_rust

3 participants