
[Enhancement] Optimize correlated join row count estimation to avoid repeated Statistics.buildFrom #67773

Merged
kangkaisen merged 1 commit into StarRocks:main from stephen-shelby:reduce_build
Jan 13, 2026

@stephen-shelby commented Jan 12, 2026

Contributor

Why I'm doing:

Correlated join equality predicates are common in multi-join queries. The current statistics estimation path applies each auxiliary predicate by rebuilding the Statistics object, so a single join can trigger several Statistics.buildFrom(...).build() calls. This adds unnecessary CPU overhead and object allocation in a hot optimizer path.

What I'm doing:

  • Optimize correlated inner join statistics estimation by collapsing the repeated auxiliary row-count updates into a single scaling by coef^k, avoiding O(n²) Statistics.buildFrom(...) calls.
  • Remove the queue rotation and repeated rebuilding in the “choose driving predicate” loop while keeping the estimation semantics unchanged: auxiliary predicates only dampen the row count, and column statistics are left as they are.
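
The first bullet relies on the fact that multiplying a row count by the same coefficient k times is the same as multiplying it once by coef^k. A minimal sketch of that equivalence follows. `RowStats`, `AuxiliaryScaling`, and the value 0.9 are hypothetical stand-ins chosen for illustration, not the StarRocks `Statistics` class or its real constant.

```java
/** Hypothetical stand-in for an immutable statistics object; not the StarRocks Statistics class. */
final class RowStats {
    final double rowCount;

    RowStats(double rowCount) {
        this.rowCount = rowCount;
    }

    /** Returns a new object with a different row count, mimicking one rebuild. */
    RowStats withRowCount(double newRowCount) {
        return new RowStats(newRowCount);
    }
}

final class AuxiliaryScaling {
    // Assumed damping factor for illustration; the real constant's value is not given in this PR.
    static final double COEF = 0.9;

    /** Old shape: one rebuild per auxiliary predicate, k rebuilds in total. */
    static RowStats scaleRepeatedly(RowStats base, int k) {
        RowStats current = base;
        for (int i = 0; i < k; i++) {
            current = current.withRowCount(current.rowCount * COEF);
        }
        return current;
    }

    /** New shape: a single rebuild scaled by COEF^k. */
    static RowStats scaleOnce(RowStats base, int k) {
        return base.withRowCount(base.rowCount * Math.pow(COEF, k));
    }
}
```

Both methods produce the same row count up to floating-point rounding, but `scaleOnce` allocates one object regardless of k.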

Fixes #issue

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This PR needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
    • This PR needs auto-generated documentation
  • This is a backport PR

Bugfix cherry-pick branch check:

  • I have checked the version labels which the PR will be auto-backported to the target branch
    • 4.1
    • 4.0
    • 3.5
    • 3.4

Note

Speeds up correlated inner-join cardinality estimation without changing semantics.

  • Replaces the queue-based driving-predicate loop with a single-pass selection and applies UNKNOWN_AUXILIARY_FILTER_COEFFICIENT^(k-1) to scale the row count for the auxiliary predicates
  • Adds a fast path for a single equality predicate and returns the best driving statistics with the adjusted row count
  • Removes estimateByEqOnPredicates/estimateByAuxiliaryPredicates and unused imports
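
The control flow summarized in this note can be sketched roughly as below. This is an illustration under stated assumptions, not the merged code: `DrivingPredicateSketch`, `AUX_COEF`, and its value 0.9 are hypothetical, and treating the smallest estimated row count as the "best" driving predicate is an assumption the PR text does not confirm.

```java
import java.util.List;

final class DrivingPredicateSketch {
    // Assumed stand-in for UNKNOWN_AUXILIARY_FILTER_COEFFICIENT; the real value is not given here.
    static final double AUX_COEF = 0.9;

    /**
     * @param rowCountPerPredicate estimated join row count when each equality
     *                             predicate alone is used as the driving predicate
     * @return the row count of the best driving predicate, damped once for the
     *         remaining k-1 auxiliary predicates
     */
    static double estimate(List<Double> rowCountPerPredicate) {
        int k = rowCountPerPredicate.size();
        if (k == 0) {
            throw new IllegalArgumentException("at least one equality predicate is required");
        }
        if (k == 1) {
            // Fast path: a single predicate has no auxiliary predicates to apply.
            return rowCountPerPredicate.get(0);
        }
        // Single pass: keep the smallest estimate (assumed meaning of "best").
        double best = Double.MAX_VALUE;
        for (double rowCount : rowCountPerPredicate) {
            if (rowCount < best) {
                best = rowCount;
            }
        }
        // One scaling step replaces k-1 separate per-predicate updates.
        return best * Math.pow(AUX_COEF, k - 1);
    }
}
```

With three predicates estimated at 800, 200, and 400 rows, the sketch picks 200 and scales it by 0.9², giving about 162 rows.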

Written by Cursor Bugbot for commit 34e8eda.

…repeated Statistics.buildFrom

Signed-off-by: stephen <stephen5217@163.com>
@sonarqubecloud

Quality Gate failed

Failed conditions
B Reliability Rating on New Code (required ≥ A)

@github-actions

Contributor

[Java-Extensions Incremental Coverage Report]

pass : 0 / 0 (0%)

@github-actions

Contributor

[FE Incremental Coverage Report]

pass : 15 / 16 (93.75%)

file detail

| path | covered_line | new_line | coverage | not_covered_line_detail |
|---|---|---|---|---|
| 🔵 com/starrocks/sql/optimizer/statistics/StatisticsCalculator.java | 15 | 16 | 93.75% | [1651] |

@github-actions

Contributor

[BE Incremental Coverage Report]

pass : 0 / 0 (0%)

@alvin-phoenix-ai

Contributor

@cursor review

@cursor cursor Bot left a comment

✅ Bugbot reviewed your changes and found no bugs!

@kangkaisen kangkaisen merged commit 4e4b0f5 into StarRocks:main Jan 13, 2026
82 of 95 checks passed
@github-actions

Contributor

@Mergifyio backport branch-4.0

@github-actions

Contributor

@Mergifyio backport branch-3.5

@github-actions

Contributor

@Mergifyio backport branch-4.1

@mergify

mergify Bot commented Jan 13, 2026

Contributor

backport branch-4.0

✅ Backports have been created

Details

@mergify

mergify Bot commented Jan 13, 2026

Contributor

backport branch-3.5

✅ Backports have been created

Details

@mergify

mergify Bot commented Jan 13, 2026

Contributor

backport branch-4.1

❌ No backport has been created

Details
  • Backport to branch branch-4.1 failed

GitHub error: Branch not found

mergify Bot pushed a commit that referenced this pull request Jan 13, 2026
…repeated Statistics.buildFrom (#67773)

Signed-off-by: stephen <stephen5217@163.com>
(cherry picked from commit 4e4b0f5)
mergify Bot pushed a commit that referenced this pull request Jan 13, 2026
…repeated Statistics.buildFrom (#67773)

Signed-off-by: stephen <stephen5217@163.com>
(cherry picked from commit 4e4b0f5)
wanpengfei-git pushed a commit that referenced this pull request Jan 13, 2026
…repeated Statistics.buildFrom (backport #67773) (#67823)

Signed-off-by: stephen <stephen5217@163.com>
Co-authored-by: stephen <91597003+stephen-shelby@users.noreply.github.com>
farhad-celo pushed a commit to farhad-celo/starrocks that referenced this pull request Jan 20, 2026
…repeated Statistics.buildFrom (StarRocks#67773)

Signed-off-by: stephen <stephen5217@163.com>
Signed-off-by: Farhad Shahmohammadi <f.shahmohammadi@celonis.com>