executor: disable runtime filter on join key type mismatch (#10698)#10740
Conversation
📝 WalkthroughWalkthroughAdds a conservative gating in PhysicalJoin that skips runtime-filter generation when left/right join key protobuf field types are incompatible (including basic type, signed/unsigned, or decimal precision/scale mismatches). Adds tests that verify runtime filters are disabled on such type mismatches. Changes
Sequence Diagram(s)Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
dbms/src/Flash/tests/gtest_runtime_filter_disable_on_type_mismatch.cpp (2)
25-28: Consider removing the empty anonymous namespace.The anonymous namespace is currently empty and serves no purpose. You can safely remove it unless you plan to add file-local helpers later.
🧹 Proposed cleanup
-namespace -{ - -} // namespace -🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@dbms/src/Flash/tests/gtest_runtime_filter_disable_on_type_mismatch.cpp` around lines 25 - 28, Remove the empty anonymous namespace block from the file (the unnamed "namespace { }" near the top of dbms/src/Flash/tests/gtest_runtime_filter_disable_on_type_mismatch.cpp) since it has no contents or purpose; simply delete the entire anonymous namespace declaration to clean up unused code, but leave the rest of the file and any includes/tests intact in case you add file-local helpers later.
49-55: Consider using the standardWRAP_FOR_TEST_BEGIN/ENDmacros.The codebase has established
WRAP_FOR_TEST_BEGINandWRAP_FOR_TEST_ENDmacros for ensuring test coverage across both DataStream and Pipeline execution engines. The custom wrapper macros here achieve similar goals but diverge from the standard pattern.If the standard macros don't fit this use case (e.g., different iteration needs), the custom approach is acceptable, but consistency with existing patterns is preferred when possible.
Based on learnings: "Use
WRAP_FOR_TEST_BEGINandWRAP_FOR_TEST_ENDmacros to ensure test coverage for both DataStream and Pipeline execution engines."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@dbms/src/Flash/tests/gtest_runtime_filter_disable_on_type_mismatch.cpp` around lines 49 - 55, Replace the custom WrapForRuntimeFilterTestBegin / WrapForRuntimeFilterTestEnd macros with the project-standard WRAP_FOR_TEST_BEGIN and WRAP_FOR_TEST_END macros so the test runs under both DataStream and Pipeline execution engines consistently; locate the usage of WrapForRuntimeFilterTestBegin and WrapForRuntimeFilterTestEnd in the test file and swap them to WRAP_FOR_TEST_BEGIN and WRAP_FOR_TEST_END respectively (only keep a custom loop if WRAP_FOR_TEST_* cannot express the needed iteration, otherwise remove the custom macros to follow the established pattern).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@dbms/src/Flash/tests/gtest_runtime_filter_disable_on_type_mismatch.cpp`:
- Around line 25-28: Remove the empty anonymous namespace block from the file
(the unnamed "namespace { }" near the top of
dbms/src/Flash/tests/gtest_runtime_filter_disable_on_type_mismatch.cpp) since it
has no contents or purpose; simply delete the entire anonymous namespace
declaration to clean up unused code, but leave the rest of the file and any
includes/tests intact in case you add file-local helpers later.
- Around line 49-55: Replace the custom WrapForRuntimeFilterTestBegin /
WrapForRuntimeFilterTestEnd macros with the project-standard WRAP_FOR_TEST_BEGIN
and WRAP_FOR_TEST_END macros so the test runs under both DataStream and Pipeline
execution engines consistently; locate the usage of
WrapForRuntimeFilterTestBegin and WrapForRuntimeFilterTestEnd in the test file
and swap them to WRAP_FOR_TEST_BEGIN and WRAP_FOR_TEST_END respectively (only
keep a custom loop if WRAP_FOR_TEST_* cannot express the needed iteration,
otherwise remove the custom macros to follow the established pattern).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 3e4f3350-423d-4860-8e45-ecc7022cf71f
📒 Files selected for processing (2)
dbms/src/Flash/Planner/Plans/PhysicalJoin.cppdbms/src/Flash/tests/gtest_runtime_filter_disable_on_type_mismatch.cpp
7f57187 to
081981e
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@dbms/src/Flash/tests/gtest_runtime_filter_disable_on_type_mismatch.cpp`:
- Around line 93-127: Replace the disabled row check for "table_scan_0" with an
explicit row-count assertion so the test actually verifies RF was not applied;
in the two blocks creating mock::MockRuntimeFilter and calling
testForExecutionSummary(request, expect) swap the Expect entry for
"table_scan_0" from {not_check_rows, not_check_concurrency} to the same concrete
row-check used by the sibling runtime-filter test (i.e. assert the expected
number of rows for table_scan_0) so testForExecutionSummary and Expect validate
rows rather than skipping them.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 30939ec0-d317-4243-acb7-388cf918e75b
📒 Files selected for processing (2)
dbms/src/Flash/Planner/Plans/PhysicalJoin.cppdbms/src/Flash/tests/gtest_runtime_filter_disable_on_type_mismatch.cpp
| Expect expect{ | ||
| {"table_scan_0", {not_check_rows, not_check_concurrency}}, | ||
| {"exchange_receiver_1", {4, concurrency}}, | ||
| {"Join_2", {3, concurrency}}}; | ||
| testForExecutionSummary(request, expect); | ||
| } | ||
|
|
||
| { | ||
| // With runtime filter requested, but type mismatch => runtime filter should be disabled. | ||
| mock::MockRuntimeFilter rf(1, col("k1"), col("k1"), "exchange_receiver_1", "table_scan_0"); | ||
| auto request | ||
| = context.scan("test_db", "left_table", std::vector<int>{1}) | ||
| .join(context.receive("right_exchange_table_i64"), tipb::JoinType::TypeInnerJoin, {col("k1")}, rf) | ||
| .build(context); | ||
| // Expect no RF pruning, same as baseline. | ||
| Expect expect{ | ||
| {"table_scan_0", {not_check_rows, not_check_concurrency}}, | ||
| {"exchange_receiver_1", {4, concurrency}}, | ||
| {"Join_2", {3, concurrency}}}; | ||
| testForExecutionSummary(request, expect); | ||
| } | ||
|
|
||
| { | ||
| // With runtime filter requested, but signed/unsigned mismatch => runtime filter should be disabled. | ||
| mock::MockRuntimeFilter rf(1, col("k1"), col("k1"), "exchange_receiver_1", "table_scan_0"); | ||
| auto request | ||
| = context.scan("test_db", "left_table", std::vector<int>{1}) | ||
| .join(context.receive("right_exchange_table_u32"), tipb::JoinType::TypeInnerJoin, {col("k1")}, rf) | ||
| .build(context); | ||
| // Expect no RF pruning, same as baseline. | ||
| Expect expect{ | ||
| {"table_scan_0", {not_check_rows, not_check_concurrency}}, | ||
| {"exchange_receiver_1", {4, concurrency}}, | ||
| {"Join_2", {3, concurrency}}}; | ||
| testForExecutionSummary(request, expect); |
There was a problem hiding this comment.
Assert table_scan_0 rows instead of skipping them.
not_check_rows disables validation in ExecutorTest::testForExecutionSummary(). With this dataset, Join_2 can still emit 3 rows even if RF incorrectly prunes the unmatched probe row k1 = 1, so these expectations do not actually prove that the mismatch guard disabled RF. The sibling runtime-filter test checks table_scan_0 rows for exactly this reason.
🧪 Tighten the execution-summary assertions
Expect expect{
- {"table_scan_0", {not_check_rows, not_check_concurrency}},
+ {"table_scan_0",
+ {3, context.context->getSettingsRef().enable_resource_control ? concurrency : 1}},
{"exchange_receiver_1", {4, concurrency}},
{"Join_2", {3, concurrency}}};
@@
Expect expect{
- {"table_scan_0", {not_check_rows, not_check_concurrency}},
+ {"table_scan_0",
+ {3, context.context->getSettingsRef().enable_resource_control ? concurrency : 1}},
{"exchange_receiver_1", {4, concurrency}},
{"Join_2", {3, concurrency}}};
@@
Expect expect{
- {"table_scan_0", {not_check_rows, not_check_concurrency}},
+ {"table_scan_0",
+ {3, context.context->getSettingsRef().enable_resource_control ? concurrency : 1}},
{"exchange_receiver_1", {4, concurrency}},
{"Join_2", {3, concurrency}}};🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@dbms/src/Flash/tests/gtest_runtime_filter_disable_on_type_mismatch.cpp`
around lines 93 - 127, Replace the disabled row check for "table_scan_0" with an
explicit row-count assertion so the test actually verifies RF was not applied;
in the two blocks creating mock::MockRuntimeFilter and calling
testForExecutionSummary(request, expect) swap the Expect entry for
"table_scan_0" from {not_check_rows, not_check_concurrency} to the same concrete
row-check used by the sibling runtime-filter test (i.e. assert the expected
number of rows for table_scan_0) so testForExecutionSummary and Expect validate
rows rather than skipping them.
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: windtalker, wshwsh12 The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
[LGTM Timeline notifier]Timeline:
|
This is an automated cherry-pick of #10698
What problem does this PR solve?
Issue Number: close #10699
Problem Summary:
What is changed and how it works?
Check List
Tests
Side effects
Documentation
Release note
Summary by CodeRabbit
Bug Fixes
Tests