Skip to content

Conversation

@aaron-ang
Copy link

@aaron-ang aaron-ang commented Dec 17, 2025

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Implemented auto-generated metrics documentation.

Macros

On a metric struct: #[metric_doc] and #[metric_doc(common)] (marks common/shared metrics).

On an exec struct: #[metrics_doc(MetricA, MetricB)] lists the metrics types it exposes.

The struct fields and docs are collected in metric_docs.rs and generated in print_metric_docs.rs.

Are these changes tested?

Yes, by running dev/update_metrics_docs.sh.

Are there any user-facing changes?

@github-actions github-actions bot added documentation Improvements or additions to documentation development-process Related to development process of DataFusion logical-expr Logical plan and expressions core Core DataFusion crate execution Related to the execution crate physical-plan Changes to the physical-plan crate labels Dec 17, 2025
@aaron-ang aaron-ang marked this pull request as ready for review December 17, 2025 01:47
@aaron-ang aaron-ang force-pushed the doc/gen-metric branch 3 times, most recently from 92919a7 to 7eb039b Compare December 17, 2025 07:04
@aaron-ang
Copy link
Author

@martin-g thanks for the review. I agree with your feedback.

Copy link
Contributor

@2010YOUY01 2010YOUY01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, looks good in general! (didn't check the macro part in details, but I guess it's good as long as it's working as expected), left some suggestions.

One additional issue is that some operator got metrics scattered in different sub-structs, and get mixed with non-metrics

peak_mem_used: metrics::Gauge,

I think we don't want to make the macro more complex to handle such cases, instead we can refactor them in the future, or just simply write this operator's metrics doc manually.

datafusion-execution = { workspace = true }
datafusion-expr = { workspace = true, default-features = false }
datafusion-expr-common = { workspace = true }
datafusion-doc = { workspace = true }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this new dependency and bin will be included in the release binary, can we guard them behind a feature?

- name: Check if metrics.md has been modified
run: |
# If you encounter an error, run './dev/update_metric_docs.sh' and commit
./dev/update_metric_docs.sh
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also include this step to ./dev/rust_lint.rs for local non-functional check? Ideally we can have a version that only do the checks, but won't update the files.

/// Helper attribute to register metrics structs or execs for documentation generation.
/// Usage:
/// - `#[metric_doc(common)]` on `*_Metrics` structs (derives `DocumentedMetrics`)
/// - `#[metric_doc(M1, M2)]` on `*_Exec` structs to link metrics (derives `DocumentedExec`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend to add more examples to explain how to use this macro.

Let's say we have an operator UserDefinedExec, and its inside has two struct for metrics MetricsGroup1 and MetricsGroup2

Is it the case we should add
#[metric_doc(MetricsGroup1, MetricsGroup2)] to UserDefinedExec, and add #[metric_doc] to both struct MetricsGroup1 and MetricsGroup2?

Would it still be working if MetricsGroup* lives in a different file than the UserDefinedExec?

Is it possible to also test for:

  • exec and metrics struct lives in different file
  • one exec has multiple metrics struct

@github-actions github-actions bot added the ffi Changes to the ffi crate label Dec 22, 2025
@aaron-ang aaron-ang requested a review from 2010YOUY01 December 22, 2025 22:12
Comment on lines +26 to 29
#[cfg(feature = "integration-tests")]
pub fn ctx_and_codec() -> (Arc<SessionContext>, FFI_LogicalExtensionCodec) {
let ctx = Arc::new(SessionContext::default());
let task_ctx_provider = Arc::clone(&ctx) as Arc<dyn TaskContextProvider>;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got a lint warning in rust-analyzer about this unused function, so I added the cfg flag

@github-actions github-actions bot added the datasource Changes to the datasource crate label Dec 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate datasource Changes to the datasource crate development-process Related to development process of DataFusion documentation Improvements or additions to documentation execution Related to the execution crate ffi Changes to the ffi crate logical-expr Logical plan and expressions physical-plan Changes to the physical-plan crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add a document page for all available metrics

3 participants