[DNM] support fullouter join and nulleq join by windtalker · Pull Request #10783 · pingcap/tiflash

windtalker · 2026-04-01T01:12:38Z

What problem does this PR solve?

Issue Number: close #xxx

Problem Summary:

What is changed and how it works?

Check List

Tests

Unit test
Integration test
Manual test (add detailed scripts or steps below)
No code

Side effects

Performance regression: Consumes more CPU
Performance regression: Consumes more Memory
Breaking backward compatibility

Documentation

Release note

None

Summary by CodeRabbit

New Features
- Added support for FULL OUTER JOIN execution with proper output schema nullability and non-equality conditions.
- Implemented NULL-safe equality (<=>) operator support for join keys, allowing NULL values to participate in key matching.
- Added json_object() SQL function for constructing JSON objects.
Bug Fixes
- Fixed join execution with nullable key columns across hash map selection and probe phases.
- Improved runtime filter handling for joins with null-aware key semantics.

Signed-off-by: xufei <xufeixw@mail.ustc.edu.cn>

ti-chi-bot · 2026-04-01T01:12:43Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign ywqzzy for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai · 2026-04-01T01:13:01Z

📝 Walkthrough

Walkthrough

This PR adds support for full outer join and NULL-safe equality (NullEQ) in TiFlash's hash join implementation. It threads an is_null_eq flag vector through join planning and execution, introduces nullable key hash map methods, refactors null/filter map handling, implements the json_object() function with binary JSON object building, and extends the test infrastructure with comprehensive coverage for new join semantics.

Changes

Cohort / File(s)	Summary
Core Join Execution Infrastructure `dbms/src/Interpreters/Join.cpp`, `dbms/src/Interpreters/Join.h`	Extended `Join` constructor to accept `is_null_eq` vector; updated key column preparation to preserve `ColumnNullable` wrappers for `NullEQ` keys; refactored null-map extraction to produce unified `row_filter_map` distinguishing ordinary `=` key NULLs from `NullEQ` key NULLs; added special `Full` join handling with row-flagged hash map logic and kept-row tracking.
Hash Map Methods & Selection `dbms/src/Interpreters/JoinHashMap.cpp`, `dbms/src/Interpreters/JoinHashMap.h`	Added `nullable_keys128` and `nullable_keys256` join map methods; updated `chooseJoinMapMethod` to accept `is_null_eq` and select nullable-specific methods when nullable `NullEQ` keys exist; unwraps nullable columns for type-dispatch logic.
Partition-Level Join Operations `dbms/src/Interpreters/JoinPartition.cpp`, `dbms/src/Interpreters/JoinPartition.h`	Renamed `null_map` → `row_filter_map` throughout partition building/probing; added `KeyGetterForTypeImpl` specializations for nullable key methods; extended `RowFlaggedHashMapAdder` with `addNotFoundForFull` for full-join specific materialization; updated dispatch logic for full-join row-flagged map handling.
Probe Phase Filtering `dbms/src/Interpreters/ProbeProcessInfo.cpp`, `dbms/src/Interpreters/ProbeProcessInfo.h`, `dbms/src/Interpreters/JoinUtils.cpp`, `dbms/src/Interpreters/JoinUtils.h`	Added `extractJoinKeyColumnsAndFilterNullMap` to derive per-key null-equality aware filter maps; refactored `ProbeProcessInfo` to use `row_filter_map` instead of `null_map`; extended `prepareForHashProbe` signature with `is_null_eq` parameter; updated `isNecessaryKindToUseRowFlaggedHashMap` to include `Full` joins.
Join Type & Build-Side Mapping `dbms/src/Flash/Coprocessor/JoinInterpreterHelper.cpp`, `dbms/src/Flash/Coprocessor/JoinInterpreterHelper.h`	Added `getJoinKeyNullEqFlags` to extract per-key null-eq flags from `tipb::Join`; introduced `makeLeftJoinSideNullable` / `makeRightJoinSideNullable` helpers; added `alignNullEqKeyTypes` to reconcile nullable key types between probe/build sides; extended validation to reject `TypeFullOuterJoin` without join keys and enforce null-eq/null-aware-semi mutual exclusion; added `shouldDisableRuntimeFilter` to gate runtime filters based on nullable `NullEQ` keys.
Cross-Join & Null-Aware Semi-Join Plumbing `dbms/src/Interpreters/CrossJoinProbeHelper.cpp`, `dbms/src/Interpreters/NullAwareSemiJoinHelper.h`	Replaced `null_map`-based filtering with `row_filter_map` in cross-join probe paths; renamed `NALeftSideInfo` constructor parameters `null_map_` → `key_null_map_` and `filter_map_` → `row_filter_map_` to clarify semantics.
Physical Plan Building `dbms/src/Flash/Planner/Plans/PhysicalJoin.cpp`, `dbms/src/Flash/Coprocessor/collectOutputFieldTypes.cpp`	Integrated `alignNullEqKeyTypes` call after key preparation; added conditional runtime-filter construction via `shouldDisableRuntimeFilter`; updated output nullability logic to use `makeLeftJoinSideNullable` / `makeRightJoinSideNullable` helpers.
DAG Compilation & Mock Executor `dbms/src/Flash/Coprocessor/DAGUtils.cpp`, `dbms/src/Debug/MockExecutor/JoinBinder.cpp`, `dbms/src/Debug/MockExecutor/JoinBinder.h`, `dbms/src/TestUtils/mockExecutor.cpp`, `dbms/src/TestUtils/mockExecutor.h`	Enabled `JsonObjectSig` mapping in scalar function table; added `getJoinTypeName` support for `TypeFullOuterJoin`; extended `compileJoin` signature with `is_null_eq` vector; updated `DAGRequestBuilder::join` to thread `is_null_eq` through mock compilation.
Nullable Key Type Changes `dbms/src/DataStreams/ScanHashMapAfterProbeBlockInputStream.cpp`	Added handling for `Full` join kind in dispatch loop, selecting `fillColumnsUsingCurrentPartition` variants based on `has_other_condition` flag.
JSON Object Function `dbms/src/Functions/FunctionsJson.h`, `dbms/src/Functions/FunctionsJson.cpp`	Implemented `FunctionJsonObject` supporting variadic key/value pairs, nullable value handling, and validation that keys are non-NULL strings; registered function in factory.
JSON Binary Encoding `dbms/src/TiDB/Decode/JsonBinary.cpp`, `dbms/src/TiDB/Decode/JsonBinary.h`	Added `buildBinaryJsonObjectInBuffer` to construct binary-encoded JSON objects from key/value vectors, including size validation, offset computation, and depth tracking.
Test Utilities & Headers `dbms/src/TestUtils/ColumnsToTiPBExpr.h`	Added `#include <tipb/expression.pb.h>` to support direct `tipb::Expr` references.
Core Join Semantics Tests `dbms/src/Interpreters/tests/gtest_join_null_eq.cpp`	Comprehensive test suite for NULL-equality join semantics: map-method selection with nullable keys, build-side NULL insertion, per-key null-eq behavior across inner/left/right/full/semi/anti join kinds, mixed-key scenarios, and full/right-outer with other conditions.
High-Level Join Tests `dbms/src/Flash/Coprocessor/tests/gtest_join_get_kind_and_build_index.cpp`, `dbms/src/Flash/tests/gtest_join_executor.cpp`, `dbms/src/Flash/tests/gtest_spill_join.cpp`, `dbms/src/TestUtils/tests/gtest_mock_executors.cpp`	Extended existing test fixtures with full-outer-join cases (with/without other conditions), null-eq key alignment, runtime-filter disabling behavior, full join Cartesian rejection, output/filter nullability validation, and column-pruning correctness; added JSON object serialization and full-outer schema nullability tests; introduced spill variants with null-eq mock sources and fine-grained shuffle join tests.
JSON Object Function Tests `dbms/src/Functions/tests/gtest_json_object.cpp`	Unit test suite for `json_object` covering empty arguments, key/value assembly, duplicate keys, NULL values as JSON nulls, and NULL key rejection.
SQL Fullstack Test `tests/fullstack-test/expr/json_object.test`	End-to-end SQL test for `json_object()` function across MPP mode with various input patterns and NULL handling.
Design Documentation `docs/note/fullouter_join.md`, `docs/note/nulleq_join.md`	Comprehensive design documents defining full-outer-join scope, correctness issues, and implementation steps; detailed NullEQ semantics, implementation assumptions, and test milestones (in English and Chinese respectively).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~70 minutes

Possibly related PRs

Support json_object pushdown #10744: Parallel json_object implementation (function class, JSON binary builder, DAGUtils mapping, tests) present in this PR.
support nulleq in tiflash #10726: Adds tidbNullEQ scalar/function mappings and tests; this PR integrates null-eq flags across join planning/execution and updates corresponding DAG infrastructure.
executor: disable runtime filter on join key type mismatch (#10698) #10740: Changes runtime-filter logic in PhysicalJoin.cpp; this PR adds type-mismatch-based disabling via shouldDisableRuntimeFilter on nullable NullEQ keys.

Suggested labels

approved, lgtm

Suggested reviewers

gengliqi
JinheLin

Poem

🐰 Whiskers twitching with delight,
Full joins now dance left-outer-right!
NullEQ keys in harmony blend,
JSON objects build {} end-to-end—
Hops of logic, perfectly tight! 🌿

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Description check	⚠️ Warning	The PR description is largely incomplete, containing only placeholder template text with no actual problem statement, implementation details, test documentation, or release notes filled in.	Complete the PR description by filling in the actual problem statement, explaining the changes made, documenting which tests were added or run, and providing a meaningful release note.
Docstring Coverage	⚠️ Warning	Docstring coverage is 6.01% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title '[DNM] support fullouter join and nulleq join' clearly summarizes the main changes: adding support for full outer joins and null-equal semantics in join operations.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

dbms/src/Debug/MockExecutor/JoinBinder.cpp (1)

172-191: ⚠️ Potential issue | 🟡 Minor

Fail fast on is_null_eq / join-key count mismatches even with assertions off.

assert(...) disappears in release builds. If a caller passes a flag vector with the wrong length, this path will quietly serialize a malformed tipb::Join instead of failing at the source. Please turn this into a runtime check before populating join->is_null_eq.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dbms/src/Debug/MockExecutor/JoinBinder.cpp` around lines 172 - 191, Replace
the assert(...) with a runtime validation that fails immediately if is_null_eq
is non-empty and its size doesn't equal join_cols.size(); e.g., before the loops
that call fillJoinKeyAndFieldType and before adding is_null_eq entries, add a
check like if (!is_null_eq.empty() && is_null_eq.size() != join_cols.size()) and
throw or return an error (std::invalid_argument or similar) so callers cannot
produce a malformed tipb::Join when assertions are disabled; keep the rest of
the logic (fillJoinKeyAndFieldType and join->add_is_null_eq) unchanged.

dbms/src/Interpreters/JoinPartition.cpp (1)

1546-1551: ⚠️ Potential issue | 🔴 Critical

Preserve filtered probe rows in FULL joins.

In this branch row_flagged_map always calls addNotFound(), which emits zero rows. For ASTTableJoin::Kind::Full, rows skipped by row_filter_map are still on the preserved side of the join (for example ordinary = keys containing NULL, or probe-side ON filters), so they must produce one NULL-extended output row. The later addNotFoundForFull() only fixes true hash misses, so full joins with other conditions still drop these probe rows.

💡 Suggested fix

         if (has_row_filter_map && (*row_filter_map)[i])
         {
             if constexpr (row_flagged_map)
             {
-                block_full = RowFlaggedHashMapAdder<Map>::addNotFound(i, current_offset, offsets_to_replicate.get());
+                if constexpr (KIND == ASTTableJoin::Kind::Full)
+                {
+                    block_full = RowFlaggedHashMapAdder<Map>::addNotFoundForFull(
+                        num_columns_to_add,
+                        added_columns,
+                        i,
+                        current_offset,
+                        offsets_to_replicate.get(),
+                        probe_process_info);
+                }
+                else
+                {
+                    block_full
+                        = RowFlaggedHashMapAdder<Map>::addNotFound(i, current_offset, offsets_to_replicate.get());
+                }
             }
             /// RightSemi/RightAnti without other conditions, just ignore not matched probe rows
             else if constexpr (KIND != ASTTableJoin::Kind::RightSemi && KIND != ASTTableJoin::Kind::RightAnti)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@dbms/src/Interpreters/JoinPartition.cpp` around lines 1546 - 1551, The branch
handling has_row_filter_map currently calls
RowFlaggedHashMapAdder<Map>::addNotFound(), which emits zero rows and thus drops
probe rows that must be preserved for ASTTableJoin::Kind::Full; change the call
in JoinPartition.cpp (the branch using row_flagged_map, current_offset,
offsets_to_replicate.get(), i) to call
RowFlaggedHashMapAdder<Map>::addNotFoundForFull(...) when the join kind is
ASTTableJoin::Kind::Full (or otherwise ensure the Full-join path uses
addNotFoundForFull), so filtered probe rows produce a single NULL-extended
output row matching how true hash misses are handled; keep existing parameters
and flow for non-Full joins.

🧹 Nitpick comments (2)

dbms/src/Flash/tests/gtest_join_executor.cpp (1)
4264-4458: Run these new executor tests under both DataStream and Pipeline modes.

These additions are currently not wrapped by the join test mode macro, so they may miss regressions that are engine-specific.
♻️ Suggested pattern
 TEST_F(JoinExecutorTestRunner, FullOuterJoinWithOtherCondition)
 try
 {
+    WRAP_FOR_JOIN_TEST_BEGIN
     ...
+    WRAP_FOR_JOIN_TEST_END
 }
 CATCH
Apply the same wrapping to the other two new tests in this block.
Based on learnings: Wrap executor tests in WRAP_FOR_TEST_BEGIN and WRAP_FOR_TEST_END macros to ensure test coverage for both DataStream and Pipeline execution engines.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dbms/src/Flash/tests/gtest_join_executor.cpp` around lines 4264 - 4458, The
new full-outer-join tests (TEST_F(JoinExecutorTestRunner,
FullOuterJoinWithOtherCondition), TEST_F(JoinExecutorTestRunner,
FullOuterJoinWithoutOtherConditionAndNullKey), and
TEST_F(JoinExecutorTestRunner, FullOuterJoinWithLeftAndRightConditions)) must be
executed under both DataStream and Pipeline modes; wrap each TEST_F block with
the engine-test macros by inserting WRAP_FOR_TEST_BEGIN before the test body and
WRAP_FOR_TEST_END after the CATCH so each test is enclosed by
WRAP_FOR_TEST_BEGIN(...) / WRAP_FOR_TEST_END(...) (using the same pattern used
by other join tests) to ensure both execution engines run these cases.
dbms/src/Interpreters/tests/gtest_join_null_eq.cpp (1)
323-410: Prefer the shared test column builders instead of reimplementing them here.

This block recreates nullable number/string/fixed-string builders plus simple numeric columns. Reusing createColumn<T>-style helpers would cut a lot of boilerplate and keep null / fixed-string encoding consistent with the rest of the test suite.

Based on learnings, "Use createColumn<T> helpers from FunctionTestUtils.h to build test data instead of manual construction".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dbms/src/Interpreters/tests/gtest_join_null_eq.cpp` around lines 323 - 410,
The test reimplements column builders (makeNullableNumberColumn,
makeNullableInt32Column, makeNullableInt64Column, makeNullableStringColumn,
makeNullableFixedStringColumn, makeUInt8Column) instead of using the shared
helpers; replace these manual builders with the
createColumn<T>/createNullableColumn<T>/createFixedStringColumn helpers from
FunctionTestUtils.h (import that header), constructing the same values via those
utilities so null maps and fixed-string padding follow the shared test-suite
conventions, and remove the duplicated manual implementations.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dbms/src/Flash/Coprocessor/JoinInterpreterHelper.h`:
- Around line 122-130: The current makeLeftJoinSideNullable and
makeRightJoinSideNullable compute nullability from tipb::JoinType which ignores
the normalized preserved/nullable sides returned by
getJoinKindAndBuildSideIndex; update these helpers to accept and use the
resolved join kind (e.g., ASTTableJoin::Kind) or the full tipb::Join (including
inner_idx) and determine nullability from that normalized result so the
preserved side logic is correct when inner_idx swaps sides; locate and change
the signatures of makeLeftJoinSideNullable/makeRightJoinSideNullable and any
callers to pass the normalized kind (or tipb::Join) and implement the boolean
checks against ASTTableJoin::Kind (or use inner_idx-aware logic) instead of raw
tipb::JoinType.

In `@dbms/src/Functions/FunctionsJson.h`:
- Around line 1108-1116: Replace the bare throws in the json_object path with
DB::Exception using the file's existing ErrorCodes and fmt-style constructor:
where you currently throw Exception("JSON documents may not contain NULL member
names.") (around key_nullmap / sources[col]) throw
DB::Exception(ErrorCodes::BAD_ARGUMENTS, "JSON documents may not contain NULL
member names."); and where you throw Exception("TiDB/TiFlash does not yet
support JSON objects with the key length >= 65536") (around key_from.size check)
throw DB::Exception(ErrorCodes::TOO_LARGE_ARRAY_OR_STRING, "TiDB/TiFlash does
not yet support JSON objects with the key length >= 65536"); ensure you include
DB::Exception and ErrorCodes in the same style used elsewhere in FunctionsJson.h
and keep the message text intact while passing the error code first.

In `@dbms/src/Interpreters/tests/gtest_join_null_eq.cpp`:
- Around line 58-70: ensureFunctionsRegistered currently swallows all
DB::Exception inside the lambda passed to std::call_once which incorrectly marks
the once flag even for real failures; change the code so the lambda only calls
registerFunctions() (no catch) and move the try/catch to wrap the std::call_once
call itself, catching DB::Exception and only suppressing the specific
duplicate-registration error (identify it via its error code or message) while
rethrowing any other DB::Exception; reference ensureFunctionsRegistered,
std::call_once, std::once_flag, and registerFunctions when making this change.

---

Outside diff comments:
In `@dbms/src/Debug/MockExecutor/JoinBinder.cpp`:
- Around line 172-191: Replace the assert(...) with a runtime validation that
fails immediately if is_null_eq is non-empty and its size doesn't equal
join_cols.size(); e.g., before the loops that call fillJoinKeyAndFieldType and
before adding is_null_eq entries, add a check like if (!is_null_eq.empty() &&
is_null_eq.size() != join_cols.size()) and throw or return an error
(std::invalid_argument or similar) so callers cannot produce a malformed
tipb::Join when assertions are disabled; keep the rest of the logic
(fillJoinKeyAndFieldType and join->add_is_null_eq) unchanged.

In `@dbms/src/Interpreters/JoinPartition.cpp`:
- Around line 1546-1551: The branch handling has_row_filter_map currently calls
RowFlaggedHashMapAdder<Map>::addNotFound(), which emits zero rows and thus drops
probe rows that must be preserved for ASTTableJoin::Kind::Full; change the call
in JoinPartition.cpp (the branch using row_flagged_map, current_offset,
offsets_to_replicate.get(), i) to call
RowFlaggedHashMapAdder<Map>::addNotFoundForFull(...) when the join kind is
ASTTableJoin::Kind::Full (or otherwise ensure the Full-join path uses
addNotFoundForFull), so filtered probe rows produce a single NULL-extended
output row matching how true hash misses are handled; keep existing parameters
and flow for non-Full joins.

---

Nitpick comments:
In `@dbms/src/Flash/tests/gtest_join_executor.cpp`:
- Around line 4264-4458: The new full-outer-join tests
(TEST_F(JoinExecutorTestRunner, FullOuterJoinWithOtherCondition),
TEST_F(JoinExecutorTestRunner, FullOuterJoinWithoutOtherConditionAndNullKey),
and TEST_F(JoinExecutorTestRunner, FullOuterJoinWithLeftAndRightConditions))
must be executed under both DataStream and Pipeline modes; wrap each TEST_F
block with the engine-test macros by inserting WRAP_FOR_TEST_BEGIN before the
test body and WRAP_FOR_TEST_END after the CATCH so each test is enclosed by
WRAP_FOR_TEST_BEGIN(...) / WRAP_FOR_TEST_END(...) (using the same pattern used
by other join tests) to ensure both execution engines run these cases.

In `@dbms/src/Interpreters/tests/gtest_join_null_eq.cpp`:
- Around line 323-410: The test reimplements column builders
(makeNullableNumberColumn, makeNullableInt32Column, makeNullableInt64Column,
makeNullableStringColumn, makeNullableFixedStringColumn, makeUInt8Column)
instead of using the shared helpers; replace these manual builders with the
createColumn<T>/createNullableColumn<T>/createFixedStringColumn helpers from
FunctionTestUtils.h (import that header), constructing the same values via those
utilities so null maps and fixed-string padding follow the shared test-suite
conventions, and remove the duplicated manual implementations.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: de687f35-0578-486e-9fb4-269fc33b74f3

📥 Commits

Reviewing files that changed from the base of the PR and between 1c34cbe and a0f963c.

📒 Files selected for processing (37)

AGENTS.md
dbms/src/DataStreams/ScanHashMapAfterProbeBlockInputStream.cpp
dbms/src/Debug/MockExecutor/JoinBinder.cpp
dbms/src/Debug/MockExecutor/JoinBinder.h
dbms/src/Flash/Coprocessor/DAGUtils.cpp
dbms/src/Flash/Coprocessor/JoinInterpreterHelper.cpp
dbms/src/Flash/Coprocessor/JoinInterpreterHelper.h
dbms/src/Flash/Coprocessor/collectOutputFieldTypes.cpp
dbms/src/Flash/Coprocessor/tests/gtest_join_get_kind_and_build_index.cpp
dbms/src/Flash/Planner/Plans/PhysicalJoin.cpp
dbms/src/Flash/tests/gtest_join_executor.cpp
dbms/src/Flash/tests/gtest_spill_join.cpp
dbms/src/Functions/FunctionsJson.cpp
dbms/src/Functions/FunctionsJson.h
dbms/src/Functions/tests/gtest_json_object.cpp
dbms/src/Interpreters/CrossJoinProbeHelper.cpp
dbms/src/Interpreters/Join.cpp
dbms/src/Interpreters/Join.h
dbms/src/Interpreters/JoinHashMap.cpp
dbms/src/Interpreters/JoinHashMap.h
dbms/src/Interpreters/JoinPartition.cpp
dbms/src/Interpreters/JoinPartition.h
dbms/src/Interpreters/JoinUtils.cpp
dbms/src/Interpreters/JoinUtils.h
dbms/src/Interpreters/NullAwareSemiJoinHelper.h
dbms/src/Interpreters/ProbeProcessInfo.cpp
dbms/src/Interpreters/ProbeProcessInfo.h
dbms/src/Interpreters/tests/gtest_join_null_eq.cpp
dbms/src/TestUtils/ColumnsToTiPBExpr.h
dbms/src/TestUtils/mockExecutor.cpp
dbms/src/TestUtils/mockExecutor.h
dbms/src/TestUtils/tests/gtest_mock_executors.cpp
dbms/src/TiDB/Decode/JsonBinary.cpp
dbms/src/TiDB/Decode/JsonBinary.h
docs/note/fullouter_join.md
docs/note/nulleq_join.md
tests/fullstack-test/expr/json_object.test

coderabbitai · 2026-04-01T01:29:46Z

dbms/src/Flash/Coprocessor/JoinInterpreterHelper.h

+constexpr bool makeLeftJoinSideNullable(tipb::JoinType join_type)
+{
+    return join_type == tipb::JoinType::TypeRightOuterJoin || join_type == tipb::JoinType::TypeFullOuterJoin;
+}
+
+constexpr bool makeRightJoinSideNullable(tipb::JoinType join_type)
+{
+    return join_type == tipb::JoinType::TypeLeftOuterJoin || join_type == tipb::JoinType::TypeFullOuterJoin;
+}


⚠️ Potential issue | 🟠 Major

These nullability helpers need the resolved join kind, not just tipb::JoinType.

getJoinKindAndBuildSideIndex() already flips TypeLeftOuterJoin / TypeRightOuterJoin when inner_idx swaps the preserved side. makeLeftJoinSideNullable() / makeRightJoinSideNullable() ignore that, so callers will mark the wrong side nullable for cases like TypeLeftOuterJoin + inner_idx == 0 and TypeRightOuterJoin + inner_idx == 1. Please derive side nullability from the normalized ASTTableJoin::Kind (or from the full tipb::Join including inner_idx) instead of the raw protobuf join type.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@dbms/src/Flash/Coprocessor/JoinInterpreterHelper.h` around lines 122 - 130, The current makeLeftJoinSideNullable and makeRightJoinSideNullable compute nullability from tipb::JoinType which ignores the normalized preserved/nullable sides returned by getJoinKindAndBuildSideIndex; update these helpers to accept and use the resolved join kind (e.g., ASTTableJoin::Kind) or the full tipb::Join (including inner_idx) and determine nullability from that normalized result so the preserved side logic is correct when inner_idx swaps sides; locate and change the signatures of makeLeftJoinSideNullable/makeRightJoinSideNullable and any callers to pass the normalized kind (or tipb::Join) and implement the boolean checks against ASTTableJoin::Kind (or use inner_idx-aware logic) instead of raw tipb::JoinType.

coderabbitai · 2026-04-01T01:29:46Z

dbms/src/Functions/FunctionsJson.h

+                    const auto * key_nullmap = nullmaps[col];
+                    if (!sources[col] || (key_nullmap && (*key_nullmap)[i]))
+                        throw Exception("JSON documents may not contain NULL member names.");
+                }
+
+                assert(sources[col]);
+                const auto & key_from = sources[col]->getWhole();
+                if (unlikely(key_from.size > std::numeric_limits<UInt16>::max()))
+                    throw Exception("TiDB/TiFlash does not yet support JSON objects with the key length >= 65536");


⚠️ Potential issue | 🟡 Minor

Attach explicit error codes to the new json_object failure paths.

These branches currently throw bare Exception instances, so NULL-member-name and oversized-key failures lose their error classification at the SQL layer. Please keep them aligned with the rest of this file and throw typed DB::Exceptions instead.

Suggested fix

- throw Exception("JSON documents may not contain NULL member names."); + throw Exception( + ErrorCodes::ILLEGAL_COLUMN, + "JSON documents may not contain NULL member names."); ... - throw Exception("TiDB/TiFlash does not yet support JSON objects with the key length >= 65536"); + throw Exception( + ErrorCodes::ILLEGAL_COLUMN, + "TiDB/TiFlash does not yet support JSON objects with the key length >= 65536");

As per coding guidelines, "Use `DB::Exception` for error handling in C++" and "Prefer the fmt-style constructor for `DB::Exception` with error code first: `throw Exception(ErrorCodes::SOME_CODE, "Message with {}", arg);`"

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@dbms/src/Functions/FunctionsJson.h` around lines 1108 - 1116, Replace the bare throws in the json_object path with DB::Exception using the file's existing ErrorCodes and fmt-style constructor: where you currently throw Exception("JSON documents may not contain NULL member names.") (around key_nullmap / sources[col]) throw DB::Exception(ErrorCodes::BAD_ARGUMENTS, "JSON documents may not contain NULL member names."); and where you throw Exception("TiDB/TiFlash does not yet support JSON objects with the key length >= 65536") (around key_from.size check) throw DB::Exception(ErrorCodes::TOO_LARGE_ARRAY_OR_STRING, "TiDB/TiFlash does not yet support JSON objects with the key length >= 65536"); ensure you include DB::Exception and ErrorCodes in the same style used elsewhere in FunctionsJson.h and keep the message text intact while passing the error code first.

coderabbitai · 2026-04-01T01:29:46Z

dbms/src/Interpreters/tests/gtest_join_null_eq.cpp

+void ensureFunctionsRegistered()
+{
+    static std::once_flag once;
+    std::call_once(once, [] {
+        try
+        {
+            registerFunctions();
+        }
+        catch (DB::Exception &)
+        {
+            // Another test suite may have already registered the functions.
+        }
+    });


⚠️ Potential issue | 🟡 Minor

Don’t swallow every registerFunctions() failure inside call_once.

Because the catch lives inside std::call_once, any real registerFunctions() failure also marks the once-flag as satisfied. That hides the setup error for the rest of the process and later tests fail in much less obvious places. Please only ignore the expected duplicate-registration case and rethrow everything else.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@dbms/src/Interpreters/tests/gtest_join_null_eq.cpp` around lines 58 - 70, ensureFunctionsRegistered currently swallows all DB::Exception inside the lambda passed to std::call_once which incorrectly marks the once flag even for real failures; change the code so the lambda only calls registerFunctions() (no catch) and move the try/catch to wrap the std::call_once call itself, catching DB::Exception and only suppressing the specific duplicate-registration error (identify it via its error code or message) while rethrowing any other DB::Exception; reference ensureFunctionsRegistered, std::call_once, std::once_flag, and registerFunctions when making this change.

coderabbitai

Actionable comments posted: 4

♻️ Duplicate comments (1)

dbms/src/Interpreters/tests/gtest_join_null_eq.cpp (1)

58-71: ⚠️ Potential issue | 🟡 Minor

Exception handling inside std::call_once can hide real failures.

The try/catch inside the std::call_once lambda marks the once-flag as satisfied even for unexpected failures. Only expected duplicate-registration exceptions should be suppressed. Consider restructuring to rethrow non-duplicate-registration exceptions.

🛠️ Proposed fix

 void ensureFunctionsRegistered()
 {
     static std::once_flag once;
-    std::call_once(once, [] {
-        try
-        {
-            registerFunctions();
-        }
-        catch (DB::Exception &)
-        {
-            // Another test suite may have already registered the functions.
-        }
-    });
+    try
+    {
+        std::call_once(once, [] { registerFunctions(); });
+    }
+    catch (DB::Exception & e)
+    {
+        // Suppress expected duplicate registration; rethrow anything else
+        if (e.message().find("already registered") == String::npos)
+            throw;
+    }
 }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@dbms/src/Interpreters/tests/gtest_join_null_eq.cpp` around lines 58 - 71, The
current ensureFunctionsRegistered uses std::call_once with a lambda that
swallows all DB::Exception, which marks the once flag as satisfied even for
unexpected failures; modify the lambda (or move the try/catch outside the
call_once) so you only suppress the known duplicate-registration error: catch
DB::Exception &e and check its identity (e.g. compare e.code() to the
duplicate-registration error code from ErrorCodes or check the exception message
text that indicates "already registered"), ignore only in that case and rethrow
for any other exception; reference ensureFunctionsRegistered, registerFunctions,
std::call_once, the once flag, and DB::Exception when making this change.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dbms/src/Flash/tests/gtest_join_executor.cpp`:
- Around line 4264-4354: The new FullOuterJoinWithOtherCondition test cases
(TEST_F JoinExecutorTestRunner::FullOuterJoinWithOtherCondition and the adjacent
cases at lines ~4356-4410 and ~4412-4466) were added without the file's
WRAP_FOR_TEST_BEGIN/WRAP_FOR_TEST_END test harness, so they only run one
execution engine; wrap each test block (the TEST_F body that builds requests,
calls executeAndAssertColumnsEqual and ASSERT_COLUMNS_EQ_UR) with the
WRAP_FOR_TEST_BEGIN and WRAP_FOR_TEST_END macros (the same way existing join
tests do) so the cases run under both DataStream and Pipeline engines,
preserving the existing test structure around
context.scan(...).join(...).build(context), executeAndAssertColumnsEqual, and
the aggregation/ASSERT_COLUMNS_EQ_UR assertions.

In `@dbms/src/Functions/tests/gtest_json_object.cpp`:
- Line 15: The include <Columns/ColumnNullable.h> in
dbms/src/Functions/tests/gtest_json_object.cpp is unresolved and not used;
remove that include line (or if a symbol from ColumnNullable is actually needed,
replace it with the correct header path that defines that symbol). Update
gtest_json_object.cpp by deleting the `#include` <Columns/ColumnNullable.h> (or
swap to the correct header) so the compilation error from the missing header is
resolved.

In `@dbms/src/TestUtils/tests/gtest_mock_executors.cpp`:
- Around line 344-372: The test currently calls
collectOutputFieldTypes(*request) on the raw join built by
context.scan(...).join(...).build(context), so it never exercises column
pruning; modify the test to wrap the join request in a parent projection (or
equivalent pruning operator) that selects a subset of output columns (e.g., only
the join outputs you want to keep) before calling build(context), then call
collectOutputFieldTypes on that pruned request to assert the join executor’s
field types post-prune (keep using MockDAGRequestTest, collectOutputFieldTypes,
context.scan, join, and build to locate the code and reuse expected_size
assertions).
- Line 15: The test includes a missing header for collectOutputFieldTypes
causing compilation failure; update gtest_mock_executors.cpp to include the
correct public header that declares collectOutputFieldTypes (or adjust the
include path) so the symbol is resolved — locate usages of
collectOutputFieldTypes in this test and replace or add the proper include (the
header that actually exposes the collectOutputFieldTypes declaration) rather
than the non-existent Flash/Coprocessor/collectOutputFieldTypes.h.

---

Duplicate comments:
In `@dbms/src/Interpreters/tests/gtest_join_null_eq.cpp`:
- Around line 58-71: The current ensureFunctionsRegistered uses std::call_once
with a lambda that swallows all DB::Exception, which marks the once flag as
satisfied even for unexpected failures; modify the lambda (or move the try/catch
outside the call_once) so you only suppress the known duplicate-registration
error: catch DB::Exception &e and check its identity (e.g. compare e.code() to
the duplicate-registration error code from ErrorCodes or check the exception
message text that indicates "already registered"), ignore only in that case and
rethrow for any other exception; reference ensureFunctionsRegistered,
registerFunctions, std::call_once, the once flag, and DB::Exception when making
this change.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: a757c79f-4497-47c9-8eaa-8ae661954a82

📥 Commits

Reviewing files that changed from the base of the PR and between a0f963c and 8e48e45.

📒 Files selected for processing (33)

dbms/src/Debug/MockExecutor/JoinBinder.cpp
dbms/src/Debug/MockExecutor/JoinBinder.h
dbms/src/Flash/Coprocessor/DAGUtils.cpp
dbms/src/Flash/Coprocessor/JoinInterpreterHelper.cpp
dbms/src/Flash/Coprocessor/JoinInterpreterHelper.h
dbms/src/Flash/Coprocessor/tests/gtest_join_get_kind_and_build_index.cpp
dbms/src/Flash/Planner/Plans/PhysicalJoin.cpp
dbms/src/Flash/tests/gtest_join_executor.cpp
dbms/src/Flash/tests/gtest_spill_join.cpp
dbms/src/Functions/FunctionsJson.cpp
dbms/src/Functions/FunctionsJson.h
dbms/src/Functions/tests/gtest_json_object.cpp
dbms/src/Interpreters/CrossJoinProbeHelper.cpp
dbms/src/Interpreters/Join.cpp
dbms/src/Interpreters/Join.h
dbms/src/Interpreters/JoinHashMap.cpp
dbms/src/Interpreters/JoinHashMap.h
dbms/src/Interpreters/JoinPartition.cpp
dbms/src/Interpreters/JoinPartition.h
dbms/src/Interpreters/JoinUtils.cpp
dbms/src/Interpreters/JoinUtils.h
dbms/src/Interpreters/NullAwareSemiJoinHelper.h
dbms/src/Interpreters/ProbeProcessInfo.cpp
dbms/src/Interpreters/ProbeProcessInfo.h
dbms/src/Interpreters/tests/gtest_join_null_eq.cpp
dbms/src/TestUtils/ColumnsToTiPBExpr.h
dbms/src/TestUtils/mockExecutor.cpp
dbms/src/TestUtils/mockExecutor.h
dbms/src/TestUtils/tests/gtest_mock_executors.cpp
dbms/src/TiDB/Decode/JsonBinary.cpp
dbms/src/TiDB/Decode/JsonBinary.h
docs/note/nulleq_join.md
tests/fullstack-test/expr/json_object.test

✅ Files skipped from review due to trivial changes (5)

dbms/src/Functions/FunctionsJson.cpp
dbms/src/TestUtils/ColumnsToTiPBExpr.h
tests/fullstack-test/expr/json_object.test
dbms/src/Interpreters/ProbeProcessInfo.h
dbms/src/Functions/FunctionsJson.h

🚧 Files skipped from review as they are similar to previous changes (15)

dbms/src/Flash/Coprocessor/DAGUtils.cpp
dbms/src/TestUtils/mockExecutor.h
dbms/src/TiDB/Decode/JsonBinary.h
dbms/src/TestUtils/mockExecutor.cpp
dbms/src/Interpreters/JoinPartition.h
dbms/src/Debug/MockExecutor/JoinBinder.h
dbms/src/Flash/tests/gtest_spill_join.cpp
dbms/src/TiDB/Decode/JsonBinary.cpp
dbms/src/Interpreters/NullAwareSemiJoinHelper.h
dbms/src/Interpreters/JoinUtils.cpp
dbms/src/Interpreters/JoinUtils.h
dbms/src/Flash/Coprocessor/JoinInterpreterHelper.h
dbms/src/Flash/Coprocessor/JoinInterpreterHelper.cpp
dbms/src/Interpreters/Join.cpp
dbms/src/Interpreters/JoinPartition.cpp

coderabbitai · 2026-04-01T02:59:31Z

dbms/src/Flash/tests/gtest_join_executor.cpp

+TEST_F(JoinExecutorTestRunner, FullOuterJoinWithOtherCondition)
+try
+{
+    using tipb::JoinType;
+
+    const std::vector<std::tuple<ColumnsWithTypeAndName, ColumnsWithTypeAndName, ColumnsWithTypeAndName, String>>
+        test_cases
+        = {{
+               {toNullableVec<Int32>("a", {1}), toNullableVec<Int32>("c", {10})},
+               {toNullableVec<Int32>("a", {1}), toNullableVec<Int32>("c", {5})},
+               {toNullableVec<Int32>("a", {1, {}}),
+                toNullableVec<Int32>("c", {10, {}}),
+                toNullableVec<Int32>("a", {{}, 1}),
+                toNullableVec<Int32>("c", {{}, 5})},
+               "key matched but other condition failed for all rows",
+           },
+           {
+               {toNullableVec<Int32>("a", {2}), toNullableVec<Int32>("c", {20})},
+               {toNullableVec<Int32>("a", {1}), toNullableVec<Int32>("c", {5})},
+               {toNullableVec<Int32>("a", {2, {}}),
+                toNullableVec<Int32>("c", {20, {}}),
+                toNullableVec<Int32>("a", {{}, 1}),
+                toNullableVec<Int32>("c", {{}, 5})},
+               "key not matched",
+           },
+           {
+               {toNullableVec<Int32>("a", {{}}), toNullableVec<Int32>("c", {10})},
+               {toNullableVec<Int32>("a", {1}), toNullableVec<Int32>("c", {5})},
+               {toNullableVec<Int32>("a", {{}, {}}),
+                toNullableVec<Int32>("c", {10, {}}),
+                toNullableVec<Int32>("a", {{}, 1}),
+                toNullableVec<Int32>("c", {{}, 5})},
+               "probe-side null key should still be emitted as unmatched row",
+           },
+           {
+               {toNullableVec<Int32>("a", {1}), toNullableVec<Int32>("c", {3})},
+               {toNullableVec<Int32>("a", {1, 1}), toNullableVec<Int32>("c", {2, 4})},
+               {toNullableVec<Int32>("a", {1, {}}),
+                toNullableVec<Int32>("c", {3, {}}),
+                toNullableVec<Int32>("a", {1, 1}),
+                toNullableVec<Int32>("c", {4, 2})},
+               "only rows that pass other condition should be marked used",
+           }};
+
+    for (const auto & [left, right, res, test_case_name] : test_cases)
+    {
+        SCOPED_TRACE(test_case_name);
+        context.addMockTable(
+            "full_outer_other_condition",
+            "t",
+            {{"a", TiDB::TP::TypeLong}, {"c", TiDB::TP::TypeLong}},
+            left);
+        context.addMockTable(
+            "full_outer_other_condition",
+            "s",
+            {{"a", TiDB::TP::TypeLong}, {"c", TiDB::TP::TypeLong}},
+            right);
+
+        auto request = context.scan("full_outer_other_condition", "t")
+                           .join(
+                               context.scan("full_outer_other_condition", "s"),
+                               JoinType::TypeFullOuterJoin,
+                               {col("a")},
+                               {},
+                               {},
+                               {lt(col("t.c"), col("s.c"))},
+                               {},
+                               0,
+                               false,
+                               1)
+                           .build(context);
+        executeAndAssertColumnsEqual(request, res);
+
+        auto request_column_prune = context.scan("full_outer_other_condition", "t")
+                                        .join(
+                                            context.scan("full_outer_other_condition", "s"),
+                                            JoinType::TypeFullOuterJoin,
+                                            {col("a")},
+                                            {},
+                                            {},
+                                            {lt(col("t.c"), col("s.c"))},
+                                            {},
+                                            0,
+                                            false,
+                                            1)
+                                        .aggregation({Count(lit(static_cast<UInt64>(1)))}, {})
+                                        .build(context);
+        ASSERT_COLUMNS_EQ_UR(genScalarCountResults(res), executeStreams(request_column_prune, 2));
+    }
+}
+CATCH


⚠️ Potential issue | 🟠 Major

Run these full-outer executor cases under both execution modes.

These additions bypass the file’s existing WRAP_FOR_JOIN_TEST_BEGIN/END helper, so they only validate one engine. Since this PR is changing join execution semantics, that leaves the other engine untested for exactly the scenarios this suite is meant to lock down. Based on learnings: Wrap executor tests in WRAP_FOR_TEST_BEGIN and WRAP_FOR_TEST_END macros to ensure test coverage for both DataStream and Pipeline execution engines.

Also applies to: 4356-4410, 4412-4466

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@dbms/src/Flash/tests/gtest_join_executor.cpp` around lines 4264 - 4354, The new FullOuterJoinWithOtherCondition test cases (TEST_F JoinExecutorTestRunner::FullOuterJoinWithOtherCondition and the adjacent cases at lines ~4356-4410 and ~4412-4466) were added without the file's WRAP_FOR_TEST_BEGIN/WRAP_FOR_TEST_END test harness, so they only run one execution engine; wrap each test block (the TEST_F body that builds requests, calls executeAndAssertColumnsEqual and ASSERT_COLUMNS_EQ_UR) with the WRAP_FOR_TEST_BEGIN and WRAP_FOR_TEST_END macros (the same way existing join tests do) so the cases run under both DataStream and Pipeline engines, preserving the existing test structure around context.scan(...).join(...).build(context), executeAndAssertColumnsEqual, and the aggregation/ASSERT_COLUMNS_EQ_UR assertions.

coderabbitai · 2026-04-01T02:59:31Z

dbms/src/Functions/tests/gtest_json_object.cpp

+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#include <Columns/ColumnNullable.h>


⚠️ Potential issue | 🔴 Critical

Remove or fix unresolved header include to avoid build break

<Columns/ColumnNullable.h> is reported as not found by clang, and this file does not directly use symbols from it. This can fail compilation in CI; drop the include (or replace with the correct header path if genuinely required).

Proposed fix

-#include <Columns/ColumnNullable.h> `#include` <TestUtils/FunctionTestUtils.h> `#include` <TestUtils/TiFlashTestBasic.h> `#include` <TiDB/Schema/TiDBTypes.h> `#include` <gtest/gtest.h>

🧰 Tools

🪛 Clang (14.0.6)

[error] 15-15: 'Columns/ColumnNullable.h' file not found

(clang-diagnostic-error)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@dbms/src/Functions/tests/gtest_json_object.cpp` at line 15, The include <Columns/ColumnNullable.h> in dbms/src/Functions/tests/gtest_json_object.cpp is unresolved and not used; remove that include line (or if a symbol from ColumnNullable is actually needed, replace it with the correct header path that defines that symbol). Update gtest_json_object.cpp by deleting the `#include` <Columns/ColumnNullable.h> (or swap to the correct header) so the compilation error from the missing header is resolved.

coderabbitai · 2026-04-01T02:59:31Z

dbms/src/TestUtils/tests/gtest_mock_executors.cpp

 // See the License for the specific language governing permissions and
 // limitations under the License.

+#include <Flash/Coprocessor/collectOutputFieldTypes.h>


⚠️ Potential issue | 🔴 Critical

Fix the unresolved collectOutputFieldTypes include.

Clang is already reporting Flash/Coprocessor/collectOutputFieldTypes.h as missing, so this test file will not compile until the declaration is exposed through an existing public header or the include path is corrected.

🧰 Tools

🪛 Clang (14.0.6)

[error] 15-15: 'Flash/Coprocessor/collectOutputFieldTypes.h' file not found

(clang-diagnostic-error)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@dbms/src/TestUtils/tests/gtest_mock_executors.cpp` at line 15, The test includes a missing header for collectOutputFieldTypes causing compilation failure; update gtest_mock_executors.cpp to include the correct public header that declares collectOutputFieldTypes (or adjust the include path) so the symbol is resolved — locate usages of collectOutputFieldTypes in this test and replace or add the proper include (the header that actually exposes the collectOutputFieldTypes declaration) rather than the non-existent Flash/Coprocessor/collectOutputFieldTypes.h.

coderabbitai · 2026-04-01T02:59:31Z

dbms/src/TestUtils/tests/gtest_mock_executors.cpp

+TEST_F(MockDAGRequestTest, SemiJoinColumnPruneKeepsJoinOutputSchema)
+try
+{
+    const std::vector<std::pair<tipb::JoinType, size_t>> test_cases = {
+        {tipb::JoinType::TypeSemiJoin, 3},
+        {tipb::JoinType::TypeAntiSemiJoin, 3},
+        {tipb::JoinType::TypeLeftOuterSemiJoin, 4},
+        {tipb::JoinType::TypeAntiLeftOuterSemiJoin, 4},
+    };
+
+    for (const auto & [join_type, expected_size] : test_cases)
+    {
+        auto request = context.scan("test_db", "l_table")
+                           .join(context.scan("test_db", "r_table"), join_type, {col("join_c")})
+                           .build(context);
+
+        auto output_field_types = collectOutputFieldTypes(*request);
+        ASSERT_EQ(output_field_types.size(), expected_size) << fmt::underlying(join_type);
+        ASSERT_EQ(output_field_types[0].tp(), TiDB::TypeLong);
+        ASSERT_EQ(output_field_types[1].tp(), TiDB::TypeString);
+        ASSERT_EQ(output_field_types[2].tp(), TiDB::TypeString);
+
+        if (expected_size == 4)
+        {
+            ASSERT_EQ(output_field_types[3].tp(), TiDB::TypeTiny);
+            ASSERT_EQ(output_field_types[3].flag() & TiDB::ColumnFlagNotNull, 0);
+        }
+    }
+}


⚠️ Potential issue | 🟠 Major

This test never exercises column pruning.

collectOutputFieldTypes(*request) is being called on a bare join, so it only checks the default join output schema. To cover the regression implied by the test name, add a parent that actually prunes join outputs and then assert the join executor’s field types after that prune step.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@dbms/src/TestUtils/tests/gtest_mock_executors.cpp` around lines 344 - 372, The test currently calls collectOutputFieldTypes(*request) on the raw join built by context.scan(...).join(...).build(context), so it never exercises column pruning; modify the test to wrap the join request in a parent projection (or equivalent pruning operator) that selects a subset of output columns (e.g., only the join outputs you want to keep) before calling build(context), then call collectOutputFieldTypes on that pruned request to assert the join executor’s field types post-prune (keep using MockDAGRequestTest, collectOutputFieldTypes, context.scan, join, and build to locate the code and reuse expected_size assertions).

windtalker added 12 commits March 27, 2026 14:55

protocal support for full outer join

84ead13

Signed-off-by: xufei <xufeixw@mail.ustc.edu.cn>

add doc

577d5c0

Signed-off-by: xufei <xufeixw@mail.ustc.edu.cn>

guard cartesian full outer join

7a2a79c

add local gtests build notes

15b7ab1

make full join schemas nullable

93954c2

allow full join non-equal conditions

06dbc7b

clarify full join other-condition rollout

6581648

fix full join other-condition execution

f7c2093

test full join targeted coverage

7e5e0d0

Fix full outer join review comments

cde4613

update tipb

1590cf3

Signed-off-by: xufei <xufeixw@mail.ustc.edu.cn>

Remove AGENTS.md from full outer join PR

1f30115

ti-chi-bot bot added the release-note-none Denotes a PR that doesn't merit a release note. label Apr 1, 2026

ti-chi-bot bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Apr 1, 2026

coderabbitai bot reviewed Apr 1, 2026

View reviewed changes

windtalker added 13 commits April 1, 2026 10:37

docs add nulleq join design note

bbe82d7

plumb join is_null_eq into DB::Join

d58fde6

support json_object pushdown

f82ad38

disable json_object explain test temporarily

8b5499c

fix json_object code format

71b80dd

support nullable nulleq join keys in hash join

3eb2eb1

refine row filter handling for nulleq join

1316d57

add outer join tests for nulleq scan-after-probe

239bfef

test: cover full join other condition for nulleq

64df247

fix: disable runtime filter for nullable nulleq keys

074a670

test: extend nulleq join coverage

877b54a

test: add null-eq join regression coverage

43ace95

docs: update null-eq join progress

6bccb3c

windtalker added 2 commits April 1, 2026 10:37

feat: optimize null-eq join fixed keys

ba30d61

test: cover null-eq join key paths

8e48e45

windtalker force-pushed the support_fullouter_join branch from a0f963c to 8e48e45 Compare April 1, 2026 02:40

coderabbitai bot reviewed Apr 1, 2026

View reviewed changes

Conversation

windtalker commented Apr 1, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What problem does this PR solve?

What is changed and how it works?

Check List

Release note

Summary by CodeRabbit

Uh oh!

ti-chi-bot bot commented Apr 1, 2026

Uh oh!

coderabbitai bot commented Apr 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

❌ Failed checks (2 warnings)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

windtalker commented Apr 1, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Apr 1, 2026 •

edited

Loading