Add metric category filtering for EXPLAIN ANALYZE by adriangb · Pull Request #21160 · apache/datafusion

adriangb · 2026-03-25T22:14:11Z

Summary

Adds MetricCategory enum (Rows, Bytes, Timing) classifying metrics by what they measure and, critically, their determinism: rows/bytes are deterministic given the same plan+data; timing varies across runs.
Each Metric can now declare its category via MetricBuilder::with_category(). Well-known builder methods (output_rows, elapsed_compute, output_bytes, etc.) set the category automatically. Custom counters/gauges default to "always included".
New session config datafusion.explain.analyze_categories accepts all (default), none, or comma-separated rows, bytes, timing.
This is orthogonal to the existing analyze_level (summary/dev) which controls verbosity.
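The filtering semantics described above can be sketched in self-contained Rust. This is a hypothetical model, not DataFusion's code. `MetricCategory` mirrors the enum named in the PR. `CategoryFilter`, `parse_categories`, `includes`, and the `Metric` struct are illustrative names only. The sketch makes two assumptions. First, metrics that declare no category (custom counters/gauges that have not opted in) pass every filter, including `none`, following the "always included" default. Second, whitespace and letter case in the config value are ignored.

```rust
/// Hypothetical mirror of the PR's `MetricCategory` (not DataFusion's definition).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MetricCategory {
    /// Deterministic given the same plan and data.
    Rows,
    /// Deterministic given the same plan and data.
    Bytes,
    /// Non-deterministic: varies across runs.
    Timing,
}

/// Parsed form of an `analyze_categories` value (hypothetical helper).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum CategoryFilter {
    /// `all`: show every metric.
    All,
    /// An explicit list; an empty list corresponds to `none`.
    Only(Vec<MetricCategory>),
}

/// Parses `all`, `none`, or a comma-separated list such as `rows, bytes`.
pub fn parse_categories(s: &str) -> Result<CategoryFilter, String> {
    match s.trim().to_ascii_lowercase().as_str() {
        "all" => return Ok(CategoryFilter::All),
        "none" => return Ok(CategoryFilter::Only(Vec::new())),
        _ => {}
    }
    let mut cats = Vec::new();
    for part in s.split(',') {
        let cat = match part.trim().to_ascii_lowercase().as_str() {
            "rows" => MetricCategory::Rows,
            "bytes" => MetricCategory::Bytes,
            "timing" => MetricCategory::Timing,
            other => return Err(format!("unknown metric category: {other:?}")),
        };
        // Ignore duplicates such as "rows,rows".
        if !cats.contains(&cat) {
            cats.push(cat);
        }
    }
    Ok(CategoryFilter::Only(cats))
}

impl CategoryFilter {
    /// `category == None` means the metric declared no category.
    pub fn includes(&self, category: Option<MetricCategory>) -> bool {
        match (self, category) {
            (CategoryFilter::All, _) => true,
            // Assumed default: uncategorized metrics are always included.
            (CategoryFilter::Only(_), None) => true,
            (CategoryFilter::Only(cats), Some(c)) => cats.contains(&c),
        }
    }
}

struct Metric {
    name: &'static str,
    category: Option<MetricCategory>,
}

fn main() {
    let metrics = [
        Metric { name: "output_rows", category: Some(MetricCategory::Rows) },
        Metric { name: "output_bytes", category: Some(MetricCategory::Bytes) },
        Metric { name: "elapsed_compute", category: Some(MetricCategory::Timing) },
        // Hypothetical custom counter that never opted in to a category.
        Metric { name: "my_custom_counter", category: None },
    ];
    let filter = parse_categories("rows").unwrap();
    let shown: Vec<&str> = metrics
        .iter()
        .filter(|m| filter.includes(m.category))
        .map(|m| m.name)
        .collect();
    println!("{shown:?}"); // ["output_rows", "my_custom_counter"]
}
```

Treating `None` as "always included" keeps metrics from operators that have not opted in visible by default. The trade-off is that such metrics can still leak non-deterministic values into otherwise filtered output until they declare a category.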

Motivation

Running EXPLAIN ANALYZE in .slt tests currently requires liberal use of <slt:ignore> for every non-deterministic timing metric. With this change, a test can simply:

SET datafusion.explain.analyze_categories = 'rows';
EXPLAIN ANALYZE SELECT ...;
-- output contains only row-count metrics — fully deterministic, no <slt:ignore> needed

In particular, for dynamic filters we have relatively complex integration tests that exist mostly to assert the plan shapes and the state of the dynamic filters after the plan has executed (for example, #21059). With this change, I think most of those can be moved to SLT tests. I've also wanted to, e.g., make assertions about pruning effectiveness without timing information included.

Test plan

New Rust integration test explain_analyze_categories covering all combos (rows, none, all, rows+bytes)
New .slt tests in explain_analyze.slt for rows, none, rows,bytes, and rows with dev level
Existing explain_analyze integration tests pass (24/24)
Proto roundtrip test updated and passing
information_schema slt updated for new config entry
Full core_integration suite passes (918 tests)

🤖 Generated with Claude Code

Introduces `MetricCategory` (Rows, Bytes, Timing) so that EXPLAIN ANALYZE output can be narrowed to only deterministic metrics, which is especially useful in sqllogictest (.slt) files where timing values would otherwise require `<slt:ignore>` markers everywhere. Each `Metric` now optionally declares a category via `MetricBuilder::with_category()`. Well-known builder methods (`output_rows`, `elapsed_compute`, …) set the category automatically; custom counters/gauges default to "always included". A new session config `datafusion.explain.analyze_categories` accepts `all` (default), `none`, or a comma-separated list of `rows`, `bytes`, `timing` to control which categories appear. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Use a parquet table with multiple row groups and a TopK ORDER BY LIMIT query that triggers DynamicFilter pushdown. This makes the slt examples much more realistic — they show pruning metrics, row group statistics, and the resolved DynamicFilter predicate. Add a 'timing' category example that shows only elapsed_compute and metadata_load_time (with <slt:ignore> since they are non-deterministic). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Opt in every custom counter/gauge created by DataFusion's own operators (parquet, file_stream, joins, aggregates, topk, unnest, buffer) so that category filtering works cleanly out of the box. For example `bytes_scanned` → Bytes, `pushdown_rows_pruned` → Rows, `peak_mem_used` → Bytes, `row_replacements` → Rows, etc. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
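The builder behavior described in these commits works in two ways. Well-known methods set a category automatically, and custom counters opt in with `with_category()`. It might look roughly like the sketch below. This is a heavily simplified, hypothetical stand-in. DataFusion's real `MetricBuilder` registers metrics in a metrics set and tracks partitions, and its method signatures differ from these. The names `known` and `my_custom_counter` are illustrative only.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MetricCategory {
    Rows,
    Bytes,
    Timing,
}

#[derive(Debug)]
struct Metric {
    name: String,
    /// `None` means the metric never declared a category.
    category: Option<MetricCategory>,
}

/// Hypothetical, simplified stand-in for a metric builder.
#[derive(Default)]
struct MetricBuilder {
    category: Option<MetricCategory>,
}

impl MetricBuilder {
    fn new() -> Self {
        Self::default()
    }

    /// Opt-in hook for custom metrics.
    fn with_category(mut self, category: MetricCategory) -> Self {
        self.category = Some(category);
        self
    }

    // Well-known metrics always carry a fixed category.
    fn output_rows(self) -> Metric {
        Self::known("output_rows", MetricCategory::Rows)
    }
    fn output_bytes(self) -> Metric {
        Self::known("output_bytes", MetricCategory::Bytes)
    }
    fn elapsed_compute(self) -> Metric {
        Self::known("elapsed_compute", MetricCategory::Timing)
    }

    /// Custom counters keep whatever `with_category` set, or `None`.
    fn counter(self, name: &str) -> Metric {
        Metric { name: name.to_string(), category: self.category }
    }

    fn known(name: &str, category: MetricCategory) -> Metric {
        Metric { name: name.to_string(), category: Some(category) }
    }
}

fn main() {
    let metrics = [
        MetricBuilder::new().output_rows(),
        MetricBuilder::new().output_bytes(),
        MetricBuilder::new().elapsed_compute(),
        // Operator-specific counters opted in, as the commit describes.
        MetricBuilder::new().with_category(MetricCategory::Bytes).counter("bytes_scanned"),
        MetricBuilder::new().with_category(MetricCategory::Rows).counter("pushdown_rows_pruned"),
        // A counter that never opted in stays uncategorized.
        MetricBuilder::new().counter("my_custom_counter"),
    ];
    for m in &metrics {
        println!("{:<22} {:?}", m.name, m.category);
    }
}
```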

2010YOUY01 · 2026-03-26T01:25:33Z

datafusion/common/src/format.rs

+    ///
+    /// **Non-deterministic** — varies across runs even on the same hardware.
+    Timing,
+}


Perhaps we can add another variant for all uncategorized metrics, so we can do

set datafusion.explain.analyze_categories = 'rows, bytes, uncategorized' -- Exclude only the `Timing` category
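The reviewer's proposal can be sketched as follows. The sketch assumes a hypothetical `Uncategorized` variant that doubles as a filter token. Metrics that never declared a category are mapped to `Uncategorized`, so listing every token except `timing` hides only timing metrics. Neither the variant nor the `uncategorized` token exists in the PR as described. `parse` and `included` are illustrative helpers, and tokens are assumed to be lowercase.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MetricCategory {
    Rows,
    Bytes,
    Timing,
    /// Hypothetical: stands in for "no category declared".
    Uncategorized,
}

/// Parses a comma-separated list of lowercase category tokens.
fn parse(s: &str) -> Result<Vec<MetricCategory>, String> {
    s.split(',')
        .map(|p| match p.trim() {
            "rows" => Ok(MetricCategory::Rows),
            "bytes" => Ok(MetricCategory::Bytes),
            "timing" => Ok(MetricCategory::Timing),
            "uncategorized" => Ok(MetricCategory::Uncategorized),
            other => Err(format!("unknown metric category: {other:?}")),
        })
        .collect()
}

/// A metric without a declared category is treated as `Uncategorized`.
fn included(selected: &[MetricCategory], category: Option<MetricCategory>) -> bool {
    selected.contains(&category.unwrap_or(MetricCategory::Uncategorized))
}

fn main() {
    let selected = parse("rows, bytes, uncategorized").unwrap();
    let metrics = [
        ("output_rows", Some(MetricCategory::Rows)),
        ("elapsed_compute", Some(MetricCategory::Timing)),
        ("custom_gauge", None),
    ];
    for (name, cat) in metrics {
        let state = if included(&selected, cat) { "shown" } else { "hidden" };
        println!("{name}: {state}");
    }
    // Prints: output_rows: shown, elapsed_compute: hidden, custom_gauge: shown
}
```

One consequence of this design is that uncategorized metrics would no longer be "always included". Omitting the `uncategorized` token would hide them.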

github-actions bot added the physical-expr, core, sqllogictest, common, proto, and physical-plan labels Mar 25, 2026

adriangb and others added 2 commits March 25, 2026 17:17

github-actions bot added the datasource Changes to the datasource crate label Mar 25, 2026

fix

90963d4

2010YOUY01 reviewed Mar 26, 2026

adriangb wants to merge 4 commits into apache:main from pydantic:explain-analyze-metric-categories


2 participants
