Skip to content

expose table observability API for pypaimon_rust#307

Open
SML0127 wants to merge 4 commits into
apache:mainfrom
SML0127:issue-285
Open

expose table observability API for pypaimon_rust#307
SML0127 wants to merge 4 commits into
apache:mainfrom
SML0127:issue-285

Conversation

@SML0127
Copy link
Copy Markdown
Contributor

@SML0127 SML0127 commented May 3, 2026

Purpose

Linked issue: close #285

Brief change log

  • added PartitionStat and Table::partition_stats / list_partitions.
  • exposed latest_snapshot, list_snapshots, list_partitions, partition_stats, and list_tags

Tests

API and Format

Documentation

@SML0127
Copy link
Copy Markdown
Contributor Author

SML0127 commented May 3, 2026

I'll rebase this pr once #306 is merged

Copy link
Copy Markdown
Contributor

@JingsongLi JingsongLi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution! This extends the Python bindings with useful observability APIs (snapshots, tags, partitions).

A few review comments:

  1. PR #306 dependency: This PR builds on #306 (catalog metadata API). Given that #306 still has requested changes outstanding (type stubs and rebase), this PR should wait for #306 to merge first. Consider rebasing once #306 is resolved.

  2. partition_stat.rs — manifest entry scan approach: The read_all_manifest_entries implementation reads ALL manifests from both base_manifest_list and delta_manifest_list, then aggregates stats by partition key bytes. For tables with many manifests, this could be expensive. Consider:

    • Adding a note about expected performance characteristics.
    • Or in a follow-up, exploring whether the $partitions system table path in DataFusion could be reused instead of duplicating the scan logic.
  3. partition_value_to_string — the fallback branch _ => row.get_string(pos_i) will silently misinterpret binary types (VARBINARY, BINARY) as UTF-8 strings. Consider returning a hex encoding or explicitly erroring for unsupported partition key types.

  4. Snapshot ordering: list_snapshots() reverses the list (into_iter().rev()) so newest is first. The Python test asserts snaps[0].id() == snap.id() — which is correct. But this ordering convention should be documented in the Python type stubs or docstring.

  5. Missing type stubs: The new Snapshot, Tag, and PartitionStat classes need .pyi entries in datafusion.pyi for IDE support. Same issue as raised on #306.

  6. Thread safety of runtime().block_on(): The list_snapshots, latest_snapshot, list_tags, and partition_stats methods all call runtime().block_on(...) from Python. If called from within an async Python context (e.g., inside an asyncio event loop), this will panic. Consider documenting this limitation.

Overall the implementation looks correct and the test coverage is reasonable. The main blocker is the dependency on #306 and the missing type stubs.

SML0127 added a commit to SML0127/paimon-rust that referenced this pull request May 25, 2026
@SML0127 SML0127 marked this pull request as ready for review May 25, 2026 18:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Expose paimon table observability API (snapshots, partitions, tags) for pypaimon_rust

2 participants