Tracking issue: sample-based NDV for large ANALYZE jobs #67449

Description

@0xPoe

Summary

Track the implementation of sample-based NDV (number of distinct values) collection for large-table ANALYZE.

Goal

Reduce TiKV-side NDV collection cost for very large analyze jobs while keeping NDV accuracy acceptable.

Scope

  • implement sample-based NDV collection for Analyze V2
  • define how it is enabled (explicit knob and/or auto-trigger for very large / slow analyze jobs)
  • benchmark both NDV accuracy and resource usage on representative datasets
  • document behavior and limitations clearly
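
To make the idea concrete, here is a minimal sketch of estimating NDV from a row sample and scoring the estimate against a known ground truth. It is only an illustration: the issue does not specify which estimator will be used, so the GEE estimator (sqrt(n/r) * f1 + number of values seen more than once) is an assumption, and the function names `estimate_ndv_gee` and `q_error` are hypothetical, not TiDB or TiKV APIs.

```python
import math
from collections import Counter


def estimate_ndv_gee(sample, total_rows):
    """Estimate a column's NDV from a uniform row sample (GEE estimator).

    Assumption: `sample` holds column values from rows drawn uniformly
    at random, and `total_rows` is the table's row count.
    """
    r = len(sample)
    if r == 0:
        return 0.0
    if total_rows < r:
        raise ValueError("total_rows must be >= sample size")

    freq = Counter(sample)
    # f1: values seen exactly once in the sample; these are scaled up,
    # because rare values are the ones the sample mostly misses.
    f1 = sum(1 for count in freq.values() if count == 1)
    repeated = len(freq) - f1

    estimate = math.sqrt(total_rows / r) * f1 + repeated
    # NDV cannot be below what the sample already shows,
    # nor above the number of rows in the table.
    return min(float(total_rows), max(float(len(freq)), estimate))


def q_error(estimate, actual):
    """Symmetric relative error: 1.0 is perfect, larger is worse."""
    if estimate <= 0 or actual <= 0:
        raise ValueError("q_error needs positive inputs")
    return max(estimate / actual, actual / estimate)


if __name__ == "__main__":
    est = estimate_ndv_gee([1, 1, 2, 3], total_rows=16)
    print(est, q_error(est, actual=10))
```

A benchmark along the lines of the third scope item could run such an estimator at several sampling rates and report `q_error` next to the measured TiKV CPU cost, so accuracy and resource usage are compared on the same datasets.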

Out of scope

The first step does not promise any reduction in full-scan work or IOPS. The initial implementation mainly targets TiKV CPU cost.