feat: expose catalog metadata api for pypaimon_rust#306
Conversation
leaves12138
left a comment
There was a problem hiding this comment.
Thanks for the PR. The runtime binding implementation looks generally good, and the existing CI is green. I also verified locally that cargo test -q -p pypaimon_rust --no-default-features passes after the first build.
One blocking issue: this package ships py.typed and bindings/python/python/pypaimon_rust/datafusion.pyi, but the stub file still only exposes PaimonCatalog.__init__ and __datafusion_catalog_provider__. The new public APIs added by this PR (list_databases, list_tables, get_table, plus the new Table, TableSchema, and DataField classes and their methods) are missing from the type stubs.
That means dynamic Python calls work, but type-aware users and IDEs will still see the old API. Please update bindings/python/python/pypaimon_rust/datafusion.pyi to match the new exported Python surface. It would also be helpful to cover this with a small stub/import check if the project already has a place for that.
Also, GitHub currently reports the PR as not mergeable (mergeable_state=dirty), so it will need a rebase/update before merge.
|
@leaves12138 Thanks for the comment. I've updated and rebbased the branch. |
JingsongLi
left a comment
There was a problem hiding this comment.
Thanks for the contribution! The catalog metadata API additions look good overall.
Review comments:
-
Type stubs updated — I see this PR now includes updates to
datafusion.pyiwithDataField,TableSchema,Table, and the newPaimonCatalogmethods. This addresses the concern from the previous review. Good. -
get_tableidentifier parsing: Thesplitn(2, '.')approach works for simple cases but will fail for database names containing dots (e.g.,my.db.my_table). While this is unlikely in practice, consider documenting that database names must not contain dots, or using the first dot as the separator (which is what you're already doing withsplitn(2, '.')). The current implementation is fine. -
Missing observability APIs in this PR's
Tableclass (vs PR #307): This PR exposesidentifier(),location(), andschema()only. The observability methods (latest_snapshot,list_snapshots, etc.) are in the follow-up #307. This is a clean separation — metadata in #306, observability in #307. -
Test coverage:
test_list_databases_and_tablescovers the happy path well. Consider adding a test forget_tablewith an invalid identifier format (e.g., missing dot) to verify thePyValueErroris raised correctly. -
Mergeable state: As the previous reviewer noted, this PR has merge conflicts (
mergeable_state=dirty). Please rebase onto main before this can be merged. -
Architecture note: Storing both
catalog: Arc<dyn Catalog>andprovider: Arc<PaimonCatalogProvider>inPaimonCatalogis slightly redundant sincePaimonCatalogProviderinternally also holds the catalog. But it's a clean approach that avoids reaching through the provider, so it's fine.
Please rebase and this should be ready to merge.
leaves12138
left a comment
There was a problem hiding this comment.
Thanks for the updates. The previous blocking concern about the Python type stubs has been addressed, and the PR now exposes the new catalog metadata APIs consistently in both the runtime binding and datafusion.pyi. I also checked the PR merge ref locally with cargo fmt --all --check, cargo check -p pypaimon_rust --no-default-features, and cargo test -p pypaimon_rust --no-default-features; all passed. GitHub CI is green as well. +1.
Purpose
Linked issue: close #284
Brief change log
Tests
API and Format
Documentation