[DNM] support fullouter join and nulleq join#10783
[DNM] support fullouter join and nulleq join#10783windtalker wants to merge 27 commits intopingcap:feature/release-8.5-materialized-viewfrom
Conversation
Signed-off-by: xufei <xufeixw@mail.ustc.edu.cn>
Signed-off-by: xufei <xufeixw@mail.ustc.edu.cn>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
📝 WalkthroughWalkthroughThis PR adds support for full outer join and NULL-safe equality ( Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~70 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
dbms/src/Debug/MockExecutor/JoinBinder.cpp (1)
172-191:⚠️ Potential issue | 🟡 MinorFail fast on
is_null_eq/ join-key count mismatches even with assertions off.
assert(...)disappears in release builds. If a caller passes a flag vector with the wrong length, this path will quietly serialize a malformedtipb::Joininstead of failing at the source. Please turn this into a runtime check before populatingjoin->is_null_eq.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@dbms/src/Debug/MockExecutor/JoinBinder.cpp` around lines 172 - 191, Replace the assert(...) with a runtime validation that fails immediately if is_null_eq is non-empty and its size doesn't equal join_cols.size(); e.g., before the loops that call fillJoinKeyAndFieldType and before adding is_null_eq entries, add a check like if (!is_null_eq.empty() && is_null_eq.size() != join_cols.size()) and throw or return an error (std::invalid_argument or similar) so callers cannot produce a malformed tipb::Join when assertions are disabled; keep the rest of the logic (fillJoinKeyAndFieldType and join->add_is_null_eq) unchanged.dbms/src/Interpreters/JoinPartition.cpp (1)
1546-1551:⚠️ Potential issue | 🔴 CriticalPreserve filtered probe rows in FULL joins.
In this branch
row_flagged_mapalways callsaddNotFound(), which emits zero rows. ForASTTableJoin::Kind::Full, rows skipped byrow_filter_mapare still on the preserved side of the join (for example ordinary=keys containingNULL, or probe-side ON filters), so they must produce one NULL-extended output row. The lateraddNotFoundForFull()only fixes true hash misses, so full joins with other conditions still drop these probe rows.💡 Suggested fix
if (has_row_filter_map && (*row_filter_map)[i]) { if constexpr (row_flagged_map) { - block_full = RowFlaggedHashMapAdder<Map>::addNotFound(i, current_offset, offsets_to_replicate.get()); + if constexpr (KIND == ASTTableJoin::Kind::Full) + { + block_full = RowFlaggedHashMapAdder<Map>::addNotFoundForFull( + num_columns_to_add, + added_columns, + i, + current_offset, + offsets_to_replicate.get(), + probe_process_info); + } + else + { + block_full + = RowFlaggedHashMapAdder<Map>::addNotFound(i, current_offset, offsets_to_replicate.get()); + } } /// RightSemi/RightAnti without other conditions, just ignore not matched probe rows else if constexpr (KIND != ASTTableJoin::Kind::RightSemi && KIND != ASTTableJoin::Kind::RightAnti)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@dbms/src/Interpreters/JoinPartition.cpp` around lines 1546 - 1551, The branch handling has_row_filter_map currently calls RowFlaggedHashMapAdder<Map>::addNotFound(), which emits zero rows and thus drops probe rows that must be preserved for ASTTableJoin::Kind::Full; change the call in JoinPartition.cpp (the branch using row_flagged_map, current_offset, offsets_to_replicate.get(), i) to call RowFlaggedHashMapAdder<Map>::addNotFoundForFull(...) when the join kind is ASTTableJoin::Kind::Full (or otherwise ensure the Full-join path uses addNotFoundForFull), so filtered probe rows produce a single NULL-extended output row matching how true hash misses are handled; keep existing parameters and flow for non-Full joins.
🧹 Nitpick comments (2)
dbms/src/Flash/tests/gtest_join_executor.cpp (1)
4264-4458: Run these new executor tests under both DataStream and Pipeline modes.These additions are currently not wrapped by the join test mode macro, so they may miss regressions that are engine-specific.
♻️ Suggested pattern
TEST_F(JoinExecutorTestRunner, FullOuterJoinWithOtherCondition) try { + WRAP_FOR_JOIN_TEST_BEGIN ... + WRAP_FOR_JOIN_TEST_END } CATCHApply the same wrapping to the other two new tests in this block.
Based on learnings: Wrap executor tests in
WRAP_FOR_TEST_BEGINandWRAP_FOR_TEST_ENDmacros to ensure test coverage for both DataStream and Pipeline execution engines.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@dbms/src/Flash/tests/gtest_join_executor.cpp` around lines 4264 - 4458, The new full-outer-join tests (TEST_F(JoinExecutorTestRunner, FullOuterJoinWithOtherCondition), TEST_F(JoinExecutorTestRunner, FullOuterJoinWithoutOtherConditionAndNullKey), and TEST_F(JoinExecutorTestRunner, FullOuterJoinWithLeftAndRightConditions)) must be executed under both DataStream and Pipeline modes; wrap each TEST_F block with the engine-test macros by inserting WRAP_FOR_TEST_BEGIN before the test body and WRAP_FOR_TEST_END after the CATCH so each test is enclosed by WRAP_FOR_TEST_BEGIN(...) / WRAP_FOR_TEST_END(...) (using the same pattern used by other join tests) to ensure both execution engines run these cases.dbms/src/Interpreters/tests/gtest_join_null_eq.cpp (1)
323-410: Prefer the shared test column builders instead of reimplementing them here.This block recreates nullable number/string/fixed-string builders plus simple numeric columns. Reusing
createColumn<T>-style helpers would cut a lot of boilerplate and keep null / fixed-string encoding consistent with the rest of the test suite.Based on learnings, "Use
createColumn<T>helpers fromFunctionTestUtils.hto build test data instead of manual construction".🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@dbms/src/Interpreters/tests/gtest_join_null_eq.cpp` around lines 323 - 410, The test reimplements column builders (makeNullableNumberColumn, makeNullableInt32Column, makeNullableInt64Column, makeNullableStringColumn, makeNullableFixedStringColumn, makeUInt8Column) instead of using the shared helpers; replace these manual builders with the createColumn<T>/createNullableColumn<T>/createFixedStringColumn helpers from FunctionTestUtils.h (import that header), constructing the same values via those utilities so null maps and fixed-string padding follow the shared test-suite conventions, and remove the duplicated manual implementations.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@dbms/src/Flash/Coprocessor/JoinInterpreterHelper.h`:
- Around line 122-130: The current makeLeftJoinSideNullable and
makeRightJoinSideNullable compute nullability from tipb::JoinType which ignores
the normalized preserved/nullable sides returned by
getJoinKindAndBuildSideIndex; update these helpers to accept and use the
resolved join kind (e.g., ASTTableJoin::Kind) or the full tipb::Join (including
inner_idx) and determine nullability from that normalized result so the
preserved side logic is correct when inner_idx swaps sides; locate and change
the signatures of makeLeftJoinSideNullable/makeRightJoinSideNullable and any
callers to pass the normalized kind (or tipb::Join) and implement the boolean
checks against ASTTableJoin::Kind (or use inner_idx-aware logic) instead of raw
tipb::JoinType.
In `@dbms/src/Functions/FunctionsJson.h`:
- Around line 1108-1116: Replace the bare throws in the json_object path with
DB::Exception using the file's existing ErrorCodes and fmt-style constructor:
where you currently throw Exception("JSON documents may not contain NULL member
names.") (around key_nullmap / sources[col]) throw
DB::Exception(ErrorCodes::BAD_ARGUMENTS, "JSON documents may not contain NULL
member names."); and where you throw Exception("TiDB/TiFlash does not yet
support JSON objects with the key length >= 65536") (around key_from.size check)
throw DB::Exception(ErrorCodes::TOO_LARGE_ARRAY_OR_STRING, "TiDB/TiFlash does
not yet support JSON objects with the key length >= 65536"); ensure you include
DB::Exception and ErrorCodes in the same style used elsewhere in FunctionsJson.h
and keep the message text intact while passing the error code first.
In `@dbms/src/Interpreters/tests/gtest_join_null_eq.cpp`:
- Around line 58-70: ensureFunctionsRegistered currently swallows all
DB::Exception inside the lambda passed to std::call_once which incorrectly marks
the once flag even for real failures; change the code so the lambda only calls
registerFunctions() (no catch) and move the try/catch to wrap the std::call_once
call itself, catching DB::Exception and only suppressing the specific
duplicate-registration error (identify it via its error code or message) while
rethrowing any other DB::Exception; reference ensureFunctionsRegistered,
std::call_once, std::once_flag, and registerFunctions when making this change.
---
Outside diff comments:
In `@dbms/src/Debug/MockExecutor/JoinBinder.cpp`:
- Around line 172-191: Replace the assert(...) with a runtime validation that
fails immediately if is_null_eq is non-empty and its size doesn't equal
join_cols.size(); e.g., before the loops that call fillJoinKeyAndFieldType and
before adding is_null_eq entries, add a check like if (!is_null_eq.empty() &&
is_null_eq.size() != join_cols.size()) and throw or return an error
(std::invalid_argument or similar) so callers cannot produce a malformed
tipb::Join when assertions are disabled; keep the rest of the logic
(fillJoinKeyAndFieldType and join->add_is_null_eq) unchanged.
In `@dbms/src/Interpreters/JoinPartition.cpp`:
- Around line 1546-1551: The branch handling has_row_filter_map currently calls
RowFlaggedHashMapAdder<Map>::addNotFound(), which emits zero rows and thus drops
probe rows that must be preserved for ASTTableJoin::Kind::Full; change the call
in JoinPartition.cpp (the branch using row_flagged_map, current_offset,
offsets_to_replicate.get(), i) to call
RowFlaggedHashMapAdder<Map>::addNotFoundForFull(...) when the join kind is
ASTTableJoin::Kind::Full (or otherwise ensure the Full-join path uses
addNotFoundForFull), so filtered probe rows produce a single NULL-extended
output row matching how true hash misses are handled; keep existing parameters
and flow for non-Full joins.
---
Nitpick comments:
In `@dbms/src/Flash/tests/gtest_join_executor.cpp`:
- Around line 4264-4458: The new full-outer-join tests
(TEST_F(JoinExecutorTestRunner, FullOuterJoinWithOtherCondition),
TEST_F(JoinExecutorTestRunner, FullOuterJoinWithoutOtherConditionAndNullKey),
and TEST_F(JoinExecutorTestRunner, FullOuterJoinWithLeftAndRightConditions))
must be executed under both DataStream and Pipeline modes; wrap each TEST_F
block with the engine-test macros by inserting WRAP_FOR_TEST_BEGIN before the
test body and WRAP_FOR_TEST_END after the CATCH so each test is enclosed by
WRAP_FOR_TEST_BEGIN(...) / WRAP_FOR_TEST_END(...) (using the same pattern used
by other join tests) to ensure both execution engines run these cases.
In `@dbms/src/Interpreters/tests/gtest_join_null_eq.cpp`:
- Around line 323-410: The test reimplements column builders
(makeNullableNumberColumn, makeNullableInt32Column, makeNullableInt64Column,
makeNullableStringColumn, makeNullableFixedStringColumn, makeUInt8Column)
instead of using the shared helpers; replace these manual builders with the
createColumn<T>/createNullableColumn<T>/createFixedStringColumn helpers from
FunctionTestUtils.h (import that header), constructing the same values via those
utilities so null maps and fixed-string padding follow the shared test-suite
conventions, and remove the duplicated manual implementations.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: de687f35-0578-486e-9fb4-269fc33b74f3
📒 Files selected for processing (37)
AGENTS.mddbms/src/DataStreams/ScanHashMapAfterProbeBlockInputStream.cppdbms/src/Debug/MockExecutor/JoinBinder.cppdbms/src/Debug/MockExecutor/JoinBinder.hdbms/src/Flash/Coprocessor/DAGUtils.cppdbms/src/Flash/Coprocessor/JoinInterpreterHelper.cppdbms/src/Flash/Coprocessor/JoinInterpreterHelper.hdbms/src/Flash/Coprocessor/collectOutputFieldTypes.cppdbms/src/Flash/Coprocessor/tests/gtest_join_get_kind_and_build_index.cppdbms/src/Flash/Planner/Plans/PhysicalJoin.cppdbms/src/Flash/tests/gtest_join_executor.cppdbms/src/Flash/tests/gtest_spill_join.cppdbms/src/Functions/FunctionsJson.cppdbms/src/Functions/FunctionsJson.hdbms/src/Functions/tests/gtest_json_object.cppdbms/src/Interpreters/CrossJoinProbeHelper.cppdbms/src/Interpreters/Join.cppdbms/src/Interpreters/Join.hdbms/src/Interpreters/JoinHashMap.cppdbms/src/Interpreters/JoinHashMap.hdbms/src/Interpreters/JoinPartition.cppdbms/src/Interpreters/JoinPartition.hdbms/src/Interpreters/JoinUtils.cppdbms/src/Interpreters/JoinUtils.hdbms/src/Interpreters/NullAwareSemiJoinHelper.hdbms/src/Interpreters/ProbeProcessInfo.cppdbms/src/Interpreters/ProbeProcessInfo.hdbms/src/Interpreters/tests/gtest_join_null_eq.cppdbms/src/TestUtils/ColumnsToTiPBExpr.hdbms/src/TestUtils/mockExecutor.cppdbms/src/TestUtils/mockExecutor.hdbms/src/TestUtils/tests/gtest_mock_executors.cppdbms/src/TiDB/Decode/JsonBinary.cppdbms/src/TiDB/Decode/JsonBinary.hdocs/note/fullouter_join.mddocs/note/nulleq_join.mdtests/fullstack-test/expr/json_object.test
| constexpr bool makeLeftJoinSideNullable(tipb::JoinType join_type) | ||
| { | ||
| return join_type == tipb::JoinType::TypeRightOuterJoin || join_type == tipb::JoinType::TypeFullOuterJoin; | ||
| } | ||
|
|
||
| constexpr bool makeRightJoinSideNullable(tipb::JoinType join_type) | ||
| { | ||
| return join_type == tipb::JoinType::TypeLeftOuterJoin || join_type == tipb::JoinType::TypeFullOuterJoin; | ||
| } |
There was a problem hiding this comment.
These nullability helpers need the resolved join kind, not just tipb::JoinType.
getJoinKindAndBuildSideIndex() already flips TypeLeftOuterJoin / TypeRightOuterJoin when inner_idx swaps the preserved side. makeLeftJoinSideNullable() / makeRightJoinSideNullable() ignore that, so callers will mark the wrong side nullable for cases like TypeLeftOuterJoin + inner_idx == 0 and TypeRightOuterJoin + inner_idx == 1. Please derive side nullability from the normalized ASTTableJoin::Kind (or from the full tipb::Join including inner_idx) instead of the raw protobuf join type.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@dbms/src/Flash/Coprocessor/JoinInterpreterHelper.h` around lines 122 - 130,
The current makeLeftJoinSideNullable and makeRightJoinSideNullable compute
nullability from tipb::JoinType which ignores the normalized preserved/nullable
sides returned by getJoinKindAndBuildSideIndex; update these helpers to accept
and use the resolved join kind (e.g., ASTTableJoin::Kind) or the full tipb::Join
(including inner_idx) and determine nullability from that normalized result so
the preserved side logic is correct when inner_idx swaps sides; locate and
change the signatures of makeLeftJoinSideNullable/makeRightJoinSideNullable and
any callers to pass the normalized kind (or tipb::Join) and implement the
boolean checks against ASTTableJoin::Kind (or use inner_idx-aware logic) instead
of raw tipb::JoinType.
| const auto * key_nullmap = nullmaps[col]; | ||
| if (!sources[col] || (key_nullmap && (*key_nullmap)[i])) | ||
| throw Exception("JSON documents may not contain NULL member names."); | ||
| } | ||
|
|
||
| assert(sources[col]); | ||
| const auto & key_from = sources[col]->getWhole(); | ||
| if (unlikely(key_from.size > std::numeric_limits<UInt16>::max())) | ||
| throw Exception("TiDB/TiFlash does not yet support JSON objects with the key length >= 65536"); |
There was a problem hiding this comment.
Attach explicit error codes to the new json_object failure paths.
These branches currently throw bare Exception instances, so NULL-member-name and oversized-key failures lose their error classification at the SQL layer. Please keep them aligned with the rest of this file and throw typed DB::Exceptions instead.
Suggested fix
- throw Exception("JSON documents may not contain NULL member names.");
+ throw Exception(
+ ErrorCodes::ILLEGAL_COLUMN,
+ "JSON documents may not contain NULL member names.");
...
- throw Exception("TiDB/TiFlash does not yet support JSON objects with the key length >= 65536");
+ throw Exception(
+ ErrorCodes::ILLEGAL_COLUMN,
+ "TiDB/TiFlash does not yet support JSON objects with the key length >= 65536");🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@dbms/src/Functions/FunctionsJson.h` around lines 1108 - 1116, Replace the
bare throws in the json_object path with DB::Exception using the file's existing
ErrorCodes and fmt-style constructor: where you currently throw Exception("JSON
documents may not contain NULL member names.") (around key_nullmap /
sources[col]) throw DB::Exception(ErrorCodes::BAD_ARGUMENTS, "JSON documents may
not contain NULL member names."); and where you throw Exception("TiDB/TiFlash
does not yet support JSON objects with the key length >= 65536") (around
key_from.size check) throw DB::Exception(ErrorCodes::TOO_LARGE_ARRAY_OR_STRING,
"TiDB/TiFlash does not yet support JSON objects with the key length >= 65536");
ensure you include DB::Exception and ErrorCodes in the same style used elsewhere
in FunctionsJson.h and keep the message text intact while passing the error code
first.
| void ensureFunctionsRegistered() | ||
| { | ||
| static std::once_flag once; | ||
| std::call_once(once, [] { | ||
| try | ||
| { | ||
| registerFunctions(); | ||
| } | ||
| catch (DB::Exception &) | ||
| { | ||
| // Another test suite may have already registered the functions. | ||
| } | ||
| }); |
There was a problem hiding this comment.
Don’t swallow every registerFunctions() failure inside call_once.
Because the catch lives inside std::call_once, any real registerFunctions() failure also marks the once-flag as satisfied. That hides the setup error for the rest of the process and later tests fail in much less obvious places. Please only ignore the expected duplicate-registration case and rethrow everything else.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@dbms/src/Interpreters/tests/gtest_join_null_eq.cpp` around lines 58 - 70,
ensureFunctionsRegistered currently swallows all DB::Exception inside the lambda
passed to std::call_once which incorrectly marks the once flag even for real
failures; change the code so the lambda only calls registerFunctions() (no
catch) and move the try/catch to wrap the std::call_once call itself, catching
DB::Exception and only suppressing the specific duplicate-registration error
(identify it via its error code or message) while rethrowing any other
DB::Exception; reference ensureFunctionsRegistered, std::call_once,
std::once_flag, and registerFunctions when making this change.
a0f963c to
8e48e45
Compare
There was a problem hiding this comment.
Actionable comments posted: 4
♻️ Duplicate comments (1)
dbms/src/Interpreters/tests/gtest_join_null_eq.cpp (1)
58-71:⚠️ Potential issue | 🟡 MinorException handling inside
std::call_oncecan hide real failures.The
try/catchinside thestd::call_oncelambda marks the once-flag as satisfied even for unexpected failures. Only expected duplicate-registration exceptions should be suppressed. Consider restructuring to rethrow non-duplicate-registration exceptions.🛠️ Proposed fix
void ensureFunctionsRegistered() { static std::once_flag once; - std::call_once(once, [] { - try - { - registerFunctions(); - } - catch (DB::Exception &) - { - // Another test suite may have already registered the functions. - } - }); + try + { + std::call_once(once, [] { registerFunctions(); }); + } + catch (DB::Exception & e) + { + // Suppress expected duplicate registration; rethrow anything else + if (e.message().find("already registered") == String::npos) + throw; + } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@dbms/src/Interpreters/tests/gtest_join_null_eq.cpp` around lines 58 - 71, The current ensureFunctionsRegistered uses std::call_once with a lambda that swallows all DB::Exception, which marks the once flag as satisfied even for unexpected failures; modify the lambda (or move the try/catch outside the call_once) so you only suppress the known duplicate-registration error: catch DB::Exception &e and check its identity (e.g. compare e.code() to the duplicate-registration error code from ErrorCodes or check the exception message text that indicates "already registered"), ignore only in that case and rethrow for any other exception; reference ensureFunctionsRegistered, registerFunctions, std::call_once, the once flag, and DB::Exception when making this change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@dbms/src/Flash/tests/gtest_join_executor.cpp`:
- Around line 4264-4354: The new FullOuterJoinWithOtherCondition test cases
(TEST_F JoinExecutorTestRunner::FullOuterJoinWithOtherCondition and the adjacent
cases at lines ~4356-4410 and ~4412-4466) were added without the file's
WRAP_FOR_TEST_BEGIN/WRAP_FOR_TEST_END test harness, so they only run one
execution engine; wrap each test block (the TEST_F body that builds requests,
calls executeAndAssertColumnsEqual and ASSERT_COLUMNS_EQ_UR) with the
WRAP_FOR_TEST_BEGIN and WRAP_FOR_TEST_END macros (the same way existing join
tests do) so the cases run under both DataStream and Pipeline engines,
preserving the existing test structure around
context.scan(...).join(...).build(context), executeAndAssertColumnsEqual, and
the aggregation/ASSERT_COLUMNS_EQ_UR assertions.
In `@dbms/src/Functions/tests/gtest_json_object.cpp`:
- Line 15: The include <Columns/ColumnNullable.h> in
dbms/src/Functions/tests/gtest_json_object.cpp is unresolved and not used;
remove that include line (or if a symbol from ColumnNullable is actually needed,
replace it with the correct header path that defines that symbol). Update
gtest_json_object.cpp by deleting the `#include` <Columns/ColumnNullable.h> (or
swap to the correct header) so the compilation error from the missing header is
resolved.
In `@dbms/src/TestUtils/tests/gtest_mock_executors.cpp`:
- Around line 344-372: The test currently calls
collectOutputFieldTypes(*request) on the raw join built by
context.scan(...).join(...).build(context), so it never exercises column
pruning; modify the test to wrap the join request in a parent projection (or
equivalent pruning operator) that selects a subset of output columns (e.g., only
the join outputs you want to keep) before calling build(context), then call
collectOutputFieldTypes on that pruned request to assert the join executor’s
field types post-prune (keep using MockDAGRequestTest, collectOutputFieldTypes,
context.scan, join, and build to locate the code and reuse expected_size
assertions).
- Line 15: The test includes a missing header for collectOutputFieldTypes
causing compilation failure; update gtest_mock_executors.cpp to include the
correct public header that declares collectOutputFieldTypes (or adjust the
include path) so the symbol is resolved — locate usages of
collectOutputFieldTypes in this test and replace or add the proper include (the
header that actually exposes the collectOutputFieldTypes declaration) rather
than the non-existent Flash/Coprocessor/collectOutputFieldTypes.h.
---
Duplicate comments:
In `@dbms/src/Interpreters/tests/gtest_join_null_eq.cpp`:
- Around line 58-71: The current ensureFunctionsRegistered uses std::call_once
with a lambda that swallows all DB::Exception, which marks the once flag as
satisfied even for unexpected failures; modify the lambda (or move the try/catch
outside the call_once) so you only suppress the known duplicate-registration
error: catch DB::Exception &e and check its identity (e.g. compare e.code() to
the duplicate-registration error code from ErrorCodes or check the exception
message text that indicates "already registered"), ignore only in that case and
rethrow for any other exception; reference ensureFunctionsRegistered,
registerFunctions, std::call_once, the once flag, and DB::Exception when making
this change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: a757c79f-4497-47c9-8eaa-8ae661954a82
📒 Files selected for processing (33)
dbms/src/Debug/MockExecutor/JoinBinder.cppdbms/src/Debug/MockExecutor/JoinBinder.hdbms/src/Flash/Coprocessor/DAGUtils.cppdbms/src/Flash/Coprocessor/JoinInterpreterHelper.cppdbms/src/Flash/Coprocessor/JoinInterpreterHelper.hdbms/src/Flash/Coprocessor/tests/gtest_join_get_kind_and_build_index.cppdbms/src/Flash/Planner/Plans/PhysicalJoin.cppdbms/src/Flash/tests/gtest_join_executor.cppdbms/src/Flash/tests/gtest_spill_join.cppdbms/src/Functions/FunctionsJson.cppdbms/src/Functions/FunctionsJson.hdbms/src/Functions/tests/gtest_json_object.cppdbms/src/Interpreters/CrossJoinProbeHelper.cppdbms/src/Interpreters/Join.cppdbms/src/Interpreters/Join.hdbms/src/Interpreters/JoinHashMap.cppdbms/src/Interpreters/JoinHashMap.hdbms/src/Interpreters/JoinPartition.cppdbms/src/Interpreters/JoinPartition.hdbms/src/Interpreters/JoinUtils.cppdbms/src/Interpreters/JoinUtils.hdbms/src/Interpreters/NullAwareSemiJoinHelper.hdbms/src/Interpreters/ProbeProcessInfo.cppdbms/src/Interpreters/ProbeProcessInfo.hdbms/src/Interpreters/tests/gtest_join_null_eq.cppdbms/src/TestUtils/ColumnsToTiPBExpr.hdbms/src/TestUtils/mockExecutor.cppdbms/src/TestUtils/mockExecutor.hdbms/src/TestUtils/tests/gtest_mock_executors.cppdbms/src/TiDB/Decode/JsonBinary.cppdbms/src/TiDB/Decode/JsonBinary.hdocs/note/nulleq_join.mdtests/fullstack-test/expr/json_object.test
✅ Files skipped from review due to trivial changes (5)
- dbms/src/Functions/FunctionsJson.cpp
- dbms/src/TestUtils/ColumnsToTiPBExpr.h
- tests/fullstack-test/expr/json_object.test
- dbms/src/Interpreters/ProbeProcessInfo.h
- dbms/src/Functions/FunctionsJson.h
🚧 Files skipped from review as they are similar to previous changes (15)
- dbms/src/Flash/Coprocessor/DAGUtils.cpp
- dbms/src/TestUtils/mockExecutor.h
- dbms/src/TiDB/Decode/JsonBinary.h
- dbms/src/TestUtils/mockExecutor.cpp
- dbms/src/Interpreters/JoinPartition.h
- dbms/src/Debug/MockExecutor/JoinBinder.h
- dbms/src/Flash/tests/gtest_spill_join.cpp
- dbms/src/TiDB/Decode/JsonBinary.cpp
- dbms/src/Interpreters/NullAwareSemiJoinHelper.h
- dbms/src/Interpreters/JoinUtils.cpp
- dbms/src/Interpreters/JoinUtils.h
- dbms/src/Flash/Coprocessor/JoinInterpreterHelper.h
- dbms/src/Flash/Coprocessor/JoinInterpreterHelper.cpp
- dbms/src/Interpreters/Join.cpp
- dbms/src/Interpreters/JoinPartition.cpp
| TEST_F(JoinExecutorTestRunner, FullOuterJoinWithOtherCondition) | ||
| try | ||
| { | ||
| using tipb::JoinType; | ||
|
|
||
| const std::vector<std::tuple<ColumnsWithTypeAndName, ColumnsWithTypeAndName, ColumnsWithTypeAndName, String>> | ||
| test_cases | ||
| = {{ | ||
| {toNullableVec<Int32>("a", {1}), toNullableVec<Int32>("c", {10})}, | ||
| {toNullableVec<Int32>("a", {1}), toNullableVec<Int32>("c", {5})}, | ||
| {toNullableVec<Int32>("a", {1, {}}), | ||
| toNullableVec<Int32>("c", {10, {}}), | ||
| toNullableVec<Int32>("a", {{}, 1}), | ||
| toNullableVec<Int32>("c", {{}, 5})}, | ||
| "key matched but other condition failed for all rows", | ||
| }, | ||
| { | ||
| {toNullableVec<Int32>("a", {2}), toNullableVec<Int32>("c", {20})}, | ||
| {toNullableVec<Int32>("a", {1}), toNullableVec<Int32>("c", {5})}, | ||
| {toNullableVec<Int32>("a", {2, {}}), | ||
| toNullableVec<Int32>("c", {20, {}}), | ||
| toNullableVec<Int32>("a", {{}, 1}), | ||
| toNullableVec<Int32>("c", {{}, 5})}, | ||
| "key not matched", | ||
| }, | ||
| { | ||
| {toNullableVec<Int32>("a", {{}}), toNullableVec<Int32>("c", {10})}, | ||
| {toNullableVec<Int32>("a", {1}), toNullableVec<Int32>("c", {5})}, | ||
| {toNullableVec<Int32>("a", {{}, {}}), | ||
| toNullableVec<Int32>("c", {10, {}}), | ||
| toNullableVec<Int32>("a", {{}, 1}), | ||
| toNullableVec<Int32>("c", {{}, 5})}, | ||
| "probe-side null key should still be emitted as unmatched row", | ||
| }, | ||
| { | ||
| {toNullableVec<Int32>("a", {1}), toNullableVec<Int32>("c", {3})}, | ||
| {toNullableVec<Int32>("a", {1, 1}), toNullableVec<Int32>("c", {2, 4})}, | ||
| {toNullableVec<Int32>("a", {1, {}}), | ||
| toNullableVec<Int32>("c", {3, {}}), | ||
| toNullableVec<Int32>("a", {1, 1}), | ||
| toNullableVec<Int32>("c", {4, 2})}, | ||
| "only rows that pass other condition should be marked used", | ||
| }}; | ||
|
|
||
| for (const auto & [left, right, res, test_case_name] : test_cases) | ||
| { | ||
| SCOPED_TRACE(test_case_name); | ||
| context.addMockTable( | ||
| "full_outer_other_condition", | ||
| "t", | ||
| {{"a", TiDB::TP::TypeLong}, {"c", TiDB::TP::TypeLong}}, | ||
| left); | ||
| context.addMockTable( | ||
| "full_outer_other_condition", | ||
| "s", | ||
| {{"a", TiDB::TP::TypeLong}, {"c", TiDB::TP::TypeLong}}, | ||
| right); | ||
|
|
||
| auto request = context.scan("full_outer_other_condition", "t") | ||
| .join( | ||
| context.scan("full_outer_other_condition", "s"), | ||
| JoinType::TypeFullOuterJoin, | ||
| {col("a")}, | ||
| {}, | ||
| {}, | ||
| {lt(col("t.c"), col("s.c"))}, | ||
| {}, | ||
| 0, | ||
| false, | ||
| 1) | ||
| .build(context); | ||
| executeAndAssertColumnsEqual(request, res); | ||
|
|
||
| auto request_column_prune = context.scan("full_outer_other_condition", "t") | ||
| .join( | ||
| context.scan("full_outer_other_condition", "s"), | ||
| JoinType::TypeFullOuterJoin, | ||
| {col("a")}, | ||
| {}, | ||
| {}, | ||
| {lt(col("t.c"), col("s.c"))}, | ||
| {}, | ||
| 0, | ||
| false, | ||
| 1) | ||
| .aggregation({Count(lit(static_cast<UInt64>(1)))}, {}) | ||
| .build(context); | ||
| ASSERT_COLUMNS_EQ_UR(genScalarCountResults(res), executeStreams(request_column_prune, 2)); | ||
| } | ||
| } | ||
| CATCH |
There was a problem hiding this comment.
Run these full-outer executor cases under both execution modes.
These additions bypass the file’s existing WRAP_FOR_JOIN_TEST_BEGIN/END helper, so they only validate one engine. Since this PR is changing join execution semantics, that leaves the other engine untested for exactly the scenarios this suite is meant to lock down. Based on learnings: Wrap executor tests in WRAP_FOR_TEST_BEGIN and WRAP_FOR_TEST_END macros to ensure test coverage for both DataStream and Pipeline execution engines.
Also applies to: 4356-4410, 4412-4466
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@dbms/src/Flash/tests/gtest_join_executor.cpp` around lines 4264 - 4354, The
new FullOuterJoinWithOtherCondition test cases (TEST_F
JoinExecutorTestRunner::FullOuterJoinWithOtherCondition and the adjacent cases
at lines ~4356-4410 and ~4412-4466) were added without the file's
WRAP_FOR_TEST_BEGIN/WRAP_FOR_TEST_END test harness, so they only run one
execution engine; wrap each test block (the TEST_F body that builds requests,
calls executeAndAssertColumnsEqual and ASSERT_COLUMNS_EQ_UR) with the
WRAP_FOR_TEST_BEGIN and WRAP_FOR_TEST_END macros (the same way existing join
tests do) so the cases run under both DataStream and Pipeline engines,
preserving the existing test structure around
context.scan(...).join(...).build(context), executeAndAssertColumnsEqual, and
the aggregation/ASSERT_COLUMNS_EQ_UR assertions.
| // See the License for the specific language governing permissions and | ||
| // limitations under the License. | ||
|
|
||
| #include <Columns/ColumnNullable.h> |
There was a problem hiding this comment.
Remove or fix unresolved header include to avoid build break
<Columns/ColumnNullable.h> is reported as not found by clang, and this file does not directly use symbols from it. This can fail compilation in CI; drop the include (or replace with the correct header path if genuinely required).
Proposed fix
-#include <Columns/ColumnNullable.h>
`#include` <TestUtils/FunctionTestUtils.h>
`#include` <TestUtils/TiFlashTestBasic.h>
`#include` <TiDB/Schema/TiDBTypes.h>
`#include` <gtest/gtest.h>🧰 Tools
🪛 Clang (14.0.6)
[error] 15-15: 'Columns/ColumnNullable.h' file not found
(clang-diagnostic-error)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@dbms/src/Functions/tests/gtest_json_object.cpp` at line 15, The include
<Columns/ColumnNullable.h> in dbms/src/Functions/tests/gtest_json_object.cpp is
unresolved and not used; remove that include line (or if a symbol from
ColumnNullable is actually needed, replace it with the correct header path that
defines that symbol). Update gtest_json_object.cpp by deleting the `#include`
<Columns/ColumnNullable.h> (or swap to the correct header) so the compilation
error from the missing header is resolved.
| // See the License for the specific language governing permissions and | ||
| // limitations under the License. | ||
|
|
||
| #include <Flash/Coprocessor/collectOutputFieldTypes.h> |
There was a problem hiding this comment.
Fix the unresolved collectOutputFieldTypes include.
Clang is already reporting Flash/Coprocessor/collectOutputFieldTypes.h as missing, so this test file will not compile until the declaration is exposed through an existing public header or the include path is corrected.
🧰 Tools
🪛 Clang (14.0.6)
[error] 15-15: 'Flash/Coprocessor/collectOutputFieldTypes.h' file not found
(clang-diagnostic-error)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@dbms/src/TestUtils/tests/gtest_mock_executors.cpp` at line 15, The test
includes a missing header for collectOutputFieldTypes causing compilation
failure; update gtest_mock_executors.cpp to include the correct public header
that declares collectOutputFieldTypes (or adjust the include path) so the symbol
is resolved — locate usages of collectOutputFieldTypes in this test and replace
or add the proper include (the header that actually exposes the
collectOutputFieldTypes declaration) rather than the non-existent
Flash/Coprocessor/collectOutputFieldTypes.h.
| TEST_F(MockDAGRequestTest, SemiJoinColumnPruneKeepsJoinOutputSchema) | ||
| try | ||
| { | ||
| const std::vector<std::pair<tipb::JoinType, size_t>> test_cases = { | ||
| {tipb::JoinType::TypeSemiJoin, 3}, | ||
| {tipb::JoinType::TypeAntiSemiJoin, 3}, | ||
| {tipb::JoinType::TypeLeftOuterSemiJoin, 4}, | ||
| {tipb::JoinType::TypeAntiLeftOuterSemiJoin, 4}, | ||
| }; | ||
|
|
||
| for (const auto & [join_type, expected_size] : test_cases) | ||
| { | ||
| auto request = context.scan("test_db", "l_table") | ||
| .join(context.scan("test_db", "r_table"), join_type, {col("join_c")}) | ||
| .build(context); | ||
|
|
||
| auto output_field_types = collectOutputFieldTypes(*request); | ||
| ASSERT_EQ(output_field_types.size(), expected_size) << fmt::underlying(join_type); | ||
| ASSERT_EQ(output_field_types[0].tp(), TiDB::TypeLong); | ||
| ASSERT_EQ(output_field_types[1].tp(), TiDB::TypeString); | ||
| ASSERT_EQ(output_field_types[2].tp(), TiDB::TypeString); | ||
|
|
||
| if (expected_size == 4) | ||
| { | ||
| ASSERT_EQ(output_field_types[3].tp(), TiDB::TypeTiny); | ||
| ASSERT_EQ(output_field_types[3].flag() & TiDB::ColumnFlagNotNull, 0); | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
This test never exercises column pruning.
collectOutputFieldTypes(*request) is being called on a bare join, so it only checks the default join output schema. To cover the regression implied by the test name, add a parent that actually prunes join outputs and then assert the join executor’s field types after that prune step.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@dbms/src/TestUtils/tests/gtest_mock_executors.cpp` around lines 344 - 372,
The test currently calls collectOutputFieldTypes(*request) on the raw join built
by context.scan(...).join(...).build(context), so it never exercises column
pruning; modify the test to wrap the join request in a parent projection (or
equivalent pruning operator) that selects a subset of output columns (e.g., only
the join outputs you want to keep) before calling build(context), then call
collectOutputFieldTypes on that pruned request to assert the join executor’s
field types post-prune (keep using MockDAGRequestTest, collectOutputFieldTypes,
context.scan, join, and build to locate the code and reuse expected_size
assertions).
What problem does this PR solve?
Issue Number: close #xxx
Problem Summary:
What is changed and how it works?
Check List
Tests
Side effects
Documentation
Release note
Summary by CodeRabbit
New Features
FULL OUTER JOINexecution with proper output schema nullability and non-equality conditions.<=>) operator support for join keys, allowing NULL values to participate in key matching.json_object()SQL function for constructing JSON objects.Bug Fixes