-
Notifications
You must be signed in to change notification settings - Fork 1.8k
docs: auto generate metrics documentation #19367
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
7960c90 to
be66f33
Compare
92919a7 to
7eb039b
Compare
|
@martin-g thanks for the review. I agree with your feedback. |
3d2a54e to
189904e
Compare
2010YOUY01
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, looks good in general! (didn't check the macro part in details, but I guess it's good as long as it's working as expected), left some suggestions.
One additional issue is that some operator got metrics scattered in different sub-structs, and get mixed with non-metrics
| peak_mem_used: metrics::Gauge, |
I think we don't want to make the macro more complex to handle such cases, instead we can refactor them in the future, or just simply write this operator's metrics doc manually.
datafusion/core/Cargo.toml
Outdated
| datafusion-execution = { workspace = true } | ||
| datafusion-expr = { workspace = true, default-features = false } | ||
| datafusion-expr-common = { workspace = true } | ||
| datafusion-doc = { workspace = true } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this new dependency and bin will be included in the release binary, can we guard them behind a feature?
| - name: Check if metrics.md has been modified | ||
| run: | | ||
| # If you encounter an error, run './dev/update_metric_docs.sh' and commit | ||
| ./dev/update_metric_docs.sh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also include this step to ./dev/rust_lint.rs for local non-functional check? Ideally we can have a version that only do the checks, but won't update the files.
datafusion/macros/src/lib.rs
Outdated
| /// Helper attribute to register metrics structs or execs for documentation generation. | ||
| /// Usage: | ||
| /// - `#[metric_doc(common)]` on `*_Metrics` structs (derives `DocumentedMetrics`) | ||
| /// - `#[metric_doc(M1, M2)]` on `*_Exec` structs to link metrics (derives `DocumentedExec`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend to add more examples to explain how to use this macro.
Let's say we have an operator UserDefinedExec, and its inside has two struct for metrics MetricsGroup1 and MetricsGroup2
Is it the case we should add
#[metric_doc(MetricsGroup1, MetricsGroup2)] to UserDefinedExec, and add #[metric_doc] to both struct MetricsGroup1 and MetricsGroup2?
Would it still be working if MetricsGroup* lives in a different file than the UserDefinedExec?
Is it possible to also test for:
- exec and metrics struct lives in different file
- one exec has multiple metrics struct
189904e to
31e2b0c
Compare
| #[cfg(feature = "integration-tests")] | ||
| pub fn ctx_and_codec() -> (Arc<SessionContext>, FFI_LogicalExtensionCodec) { | ||
| let ctx = Arc::new(SessionContext::default()); | ||
| let task_ctx_provider = Arc::clone(&ctx) as Arc<dyn TaskContextProvider>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got a lint warning in rust-analyzer about this unused function, so I added the cfg flag
Which issue does this PR close?
HashJoinExecin the metrics user-guide #19044, IncludeNestedLoopJoinExecin the metrics user-guide #19045.Rationale for this change
What changes are included in this PR?
Implemented auto-generated metrics documentation.
Macros
On a metric struct:
#[metric_doc]and#[metric_doc(common)](marks common/shared metrics).On an exec struct:
#[metrics_doc(MetricA, MetricB)]lists the metrics types it exposes.The struct fields and docs are collected in
metric_docs.rsand generated inprint_metric_docs.rs.Are these changes tested?
Yes, by running
dev/update_metrics_docs.sh.Are there any user-facing changes?