expose table observability API for pypaimon_rust#307
Conversation
|
I'll rebase this pr once #306 is merged |
JingsongLi
left a comment
There was a problem hiding this comment.
Thanks for the contribution! This extends the Python bindings with useful observability APIs (snapshots, tags, partitions).
A few review comments:
-
PR #306 dependency: This PR builds on #306 (catalog metadata API). Given that #306 still has requested changes outstanding (type stubs and rebase), this PR should wait for #306 to merge first. Consider rebasing once #306 is resolved.
-
partition_stat.rs— manifest entry scan approach: Theread_all_manifest_entriesimplementation reads ALL manifests from bothbase_manifest_listanddelta_manifest_list, then aggregates stats by partition key bytes. For tables with many manifests, this could be expensive. Consider:- Adding a note about expected performance characteristics.
- Or in a follow-up, exploring whether the
$partitionssystem table path in DataFusion could be reused instead of duplicating the scan logic.
-
partition_value_to_string— the fallback branch_ => row.get_string(pos_i)will silently misinterpret binary types (VARBINARY, BINARY) as UTF-8 strings. Consider returning a hex encoding or explicitly erroring for unsupported partition key types. -
Snapshot ordering:
list_snapshots()reverses the list (into_iter().rev()) so newest is first. The Python test assertssnaps[0].id() == snap.id()— which is correct. But this ordering convention should be documented in the Python type stubs or docstring. -
Missing type stubs: The new
Snapshot,Tag, andPartitionStatclasses need.pyientries indatafusion.pyifor IDE support. Same issue as raised on #306. -
Thread safety of
runtime().block_on(): Thelist_snapshots,latest_snapshot,list_tags, andpartition_statsmethods all callruntime().block_on(...)from Python. If called from within an async Python context (e.g., inside anasyncioevent loop), this will panic. Consider documenting this limitation.
Overall the implementation looks correct and the test coverage is reasonable. The main blocker is the dependency on #306 and the missing type stubs.
Purpose
Linked issue: close #285
Brief change log
Tests
API and Format
Documentation