Skip to content

[DNM] support fullouter join and nulleq join#10783

Open
windtalker wants to merge 27 commits intopingcap:feature/release-8.5-materialized-viewfrom
windtalker:support_fullouter_join
Open

[DNM] support fullouter join and nulleq join#10783
windtalker wants to merge 27 commits intopingcap:feature/release-8.5-materialized-viewfrom
windtalker:support_fullouter_join

Conversation

@windtalker
Copy link
Copy Markdown
Contributor

@windtalker windtalker commented Apr 1, 2026

What problem does this PR solve?

Issue Number: close #xxx

Problem Summary:

What is changed and how it works?


Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

Summary by CodeRabbit

  • New Features

    • Added support for FULL OUTER JOIN execution with proper output schema nullability and non-equality conditions.
    • Implemented NULL-safe equality (<=>) operator support for join keys, allowing NULL values to participate in key matching.
    • Added json_object() SQL function for constructing JSON objects.
  • Bug Fixes

    • Fixed join execution with nullable key columns across hash map selection and probe phases.
    • Improved runtime filter handling for joins with null-aware key semantics.

@ti-chi-bot ti-chi-bot bot added the release-note-none Denotes a PR that doesn't merit a release note. label Apr 1, 2026
@ti-chi-bot
Copy link
Copy Markdown
Contributor

ti-chi-bot bot commented Apr 1, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign ywqzzy for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Apr 1, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 1, 2026

📝 Walkthrough

Walkthrough

This PR adds support for full outer join and NULL-safe equality (NullEQ) in TiFlash's hash join implementation. It threads an is_null_eq flag vector through join planning and execution, introduces nullable key hash map methods, refactors null/filter map handling, implements the json_object() function with binary JSON object building, and extends the test infrastructure with comprehensive coverage for new join semantics.

Changes

Cohort / File(s) Summary
Core Join Execution Infrastructure
dbms/src/Interpreters/Join.cpp, dbms/src/Interpreters/Join.h
Extended Join constructor to accept is_null_eq vector; updated key column preparation to preserve ColumnNullable wrappers for NullEQ keys; refactored null-map extraction to produce unified row_filter_map distinguishing ordinary = key NULLs from NullEQ key NULLs; added special Full join handling with row-flagged hash map logic and kept-row tracking.
Hash Map Methods & Selection
dbms/src/Interpreters/JoinHashMap.cpp, dbms/src/Interpreters/JoinHashMap.h
Added nullable_keys128 and nullable_keys256 join map methods; updated chooseJoinMapMethod to accept is_null_eq and select nullable-specific methods when nullable NullEQ keys exist; unwraps nullable columns for type-dispatch logic.
Partition-Level Join Operations
dbms/src/Interpreters/JoinPartition.cpp, dbms/src/Interpreters/JoinPartition.h
Renamed null_maprow_filter_map throughout partition building/probing; added KeyGetterForTypeImpl specializations for nullable key methods; extended RowFlaggedHashMapAdder with addNotFoundForFull for full-join specific materialization; updated dispatch logic for full-join row-flagged map handling.
Probe Phase Filtering
dbms/src/Interpreters/ProbeProcessInfo.cpp, dbms/src/Interpreters/ProbeProcessInfo.h, dbms/src/Interpreters/JoinUtils.cpp, dbms/src/Interpreters/JoinUtils.h
Added extractJoinKeyColumnsAndFilterNullMap to derive per-key null-equality aware filter maps; refactored ProbeProcessInfo to use row_filter_map instead of null_map; extended prepareForHashProbe signature with is_null_eq parameter; updated isNecessaryKindToUseRowFlaggedHashMap to include Full joins.
Join Type & Build-Side Mapping
dbms/src/Flash/Coprocessor/JoinInterpreterHelper.cpp, dbms/src/Flash/Coprocessor/JoinInterpreterHelper.h
Added getJoinKeyNullEqFlags to extract per-key null-eq flags from tipb::Join; introduced makeLeftJoinSideNullable / makeRightJoinSideNullable helpers; added alignNullEqKeyTypes to reconcile nullable key types between probe/build sides; extended validation to reject TypeFullOuterJoin without join keys and enforce null-eq/null-aware-semi mutual exclusion; added shouldDisableRuntimeFilter to gate runtime filters based on nullable NullEQ keys.
Cross-Join & Null-Aware Semi-Join Plumbing
dbms/src/Interpreters/CrossJoinProbeHelper.cpp, dbms/src/Interpreters/NullAwareSemiJoinHelper.h
Replaced null_map-based filtering with row_filter_map in cross-join probe paths; renamed NALeftSideInfo constructor parameters null_map_key_null_map_ and filter_map_row_filter_map_ to clarify semantics.
Physical Plan Building
dbms/src/Flash/Planner/Plans/PhysicalJoin.cpp, dbms/src/Flash/Coprocessor/collectOutputFieldTypes.cpp
Integrated alignNullEqKeyTypes call after key preparation; added conditional runtime-filter construction via shouldDisableRuntimeFilter; updated output nullability logic to use makeLeftJoinSideNullable / makeRightJoinSideNullable helpers.
DAG Compilation & Mock Executor
dbms/src/Flash/Coprocessor/DAGUtils.cpp, dbms/src/Debug/MockExecutor/JoinBinder.cpp, dbms/src/Debug/MockExecutor/JoinBinder.h, dbms/src/TestUtils/mockExecutor.cpp, dbms/src/TestUtils/mockExecutor.h
Enabled JsonObjectSig mapping in scalar function table; added getJoinTypeName support for TypeFullOuterJoin; extended compileJoin signature with is_null_eq vector; updated DAGRequestBuilder::join to thread is_null_eq through mock compilation.
Nullable Key Type Changes
dbms/src/DataStreams/ScanHashMapAfterProbeBlockInputStream.cpp
Added handling for Full join kind in dispatch loop, selecting fillColumnsUsingCurrentPartition variants based on has_other_condition flag.
JSON Object Function
dbms/src/Functions/FunctionsJson.h, dbms/src/Functions/FunctionsJson.cpp
Implemented FunctionJsonObject supporting variadic key/value pairs, nullable value handling, and validation that keys are non-NULL strings; registered function in factory.
JSON Binary Encoding
dbms/src/TiDB/Decode/JsonBinary.cpp, dbms/src/TiDB/Decode/JsonBinary.h
Added buildBinaryJsonObjectInBuffer to construct binary-encoded JSON objects from key/value vectors, including size validation, offset computation, and depth tracking.
Test Utilities & Headers
dbms/src/TestUtils/ColumnsToTiPBExpr.h
Added #include <tipb/expression.pb.h> to support direct tipb::Expr references.
Core Join Semantics Tests
dbms/src/Interpreters/tests/gtest_join_null_eq.cpp
Comprehensive test suite for NULL-equality join semantics: map-method selection with nullable keys, build-side NULL insertion, per-key null-eq behavior across inner/left/right/full/semi/anti join kinds, mixed-key scenarios, and full/right-outer with other conditions.
High-Level Join Tests
dbms/src/Flash/Coprocessor/tests/gtest_join_get_kind_and_build_index.cpp, dbms/src/Flash/tests/gtest_join_executor.cpp, dbms/src/Flash/tests/gtest_spill_join.cpp, dbms/src/TestUtils/tests/gtest_mock_executors.cpp
Extended existing test fixtures with full-outer-join cases (with/without other conditions), null-eq key alignment, runtime-filter disabling behavior, full join Cartesian rejection, output/filter nullability validation, and column-pruning correctness; added JSON object serialization and full-outer schema nullability tests; introduced spill variants with null-eq mock sources and fine-grained shuffle join tests.
JSON Object Function Tests
dbms/src/Functions/tests/gtest_json_object.cpp
Unit test suite for json_object covering empty arguments, key/value assembly, duplicate keys, NULL values as JSON nulls, and NULL key rejection.
SQL Fullstack Test
tests/fullstack-test/expr/json_object.test
End-to-end SQL test for json_object() function across MPP mode with various input patterns and NULL handling.
Design Documentation
docs/note/fullouter_join.md, docs/note/nulleq_join.md
Comprehensive design documents defining full-outer-join scope, correctness issues, and implementation steps; detailed NullEQ semantics, implementation assumptions, and test milestones (in English and Chinese respectively).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~70 minutes

Possibly related PRs

Suggested labels

approved, lgtm

Suggested reviewers

  • gengliqi
  • JinheLin

Poem

🐰 Whiskers twitching with delight,
Full joins now dance left-outer-right!
NullEQ keys in harmony blend,
JSON objects build {} end-to-end—
Hops of logic, perfectly tight! 🌿

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description is largely incomplete, containing only placeholder template text with no actual problem statement, implementation details, test documentation, or release notes filled in. Complete the PR description by filling in the actual problem statement, explaining the changes made, documenting which tests were added or run, and providing a meaningful release note.
Docstring Coverage ⚠️ Warning Docstring coverage is 6.01% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title '[DNM] support fullouter join and nulleq join' clearly summarizes the main changes: adding support for full outer joins and null-equal semantics in join operations.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
dbms/src/Debug/MockExecutor/JoinBinder.cpp (1)

172-191: ⚠️ Potential issue | 🟡 Minor

Fail fast on is_null_eq / join-key count mismatches even with assertions off.

assert(...) disappears in release builds. If a caller passes a flag vector with the wrong length, this path will quietly serialize a malformed tipb::Join instead of failing at the source. Please turn this into a runtime check before populating join->is_null_eq.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dbms/src/Debug/MockExecutor/JoinBinder.cpp` around lines 172 - 191, Replace
the assert(...) with a runtime validation that fails immediately if is_null_eq
is non-empty and its size doesn't equal join_cols.size(); e.g., before the loops
that call fillJoinKeyAndFieldType and before adding is_null_eq entries, add a
check like if (!is_null_eq.empty() && is_null_eq.size() != join_cols.size()) and
throw or return an error (std::invalid_argument or similar) so callers cannot
produce a malformed tipb::Join when assertions are disabled; keep the rest of
the logic (fillJoinKeyAndFieldType and join->add_is_null_eq) unchanged.
dbms/src/Interpreters/JoinPartition.cpp (1)

1546-1551: ⚠️ Potential issue | 🔴 Critical

Preserve filtered probe rows in FULL joins.

In this branch row_flagged_map always calls addNotFound(), which emits zero rows. For ASTTableJoin::Kind::Full, rows skipped by row_filter_map are still on the preserved side of the join (for example ordinary = keys containing NULL, or probe-side ON filters), so they must produce one NULL-extended output row. The later addNotFoundForFull() only fixes true hash misses, so full joins with other conditions still drop these probe rows.

💡 Suggested fix
         if (has_row_filter_map && (*row_filter_map)[i])
         {
             if constexpr (row_flagged_map)
             {
-                block_full = RowFlaggedHashMapAdder<Map>::addNotFound(i, current_offset, offsets_to_replicate.get());
+                if constexpr (KIND == ASTTableJoin::Kind::Full)
+                {
+                    block_full = RowFlaggedHashMapAdder<Map>::addNotFoundForFull(
+                        num_columns_to_add,
+                        added_columns,
+                        i,
+                        current_offset,
+                        offsets_to_replicate.get(),
+                        probe_process_info);
+                }
+                else
+                {
+                    block_full
+                        = RowFlaggedHashMapAdder<Map>::addNotFound(i, current_offset, offsets_to_replicate.get());
+                }
             }
             /// RightSemi/RightAnti without other conditions, just ignore not matched probe rows
             else if constexpr (KIND != ASTTableJoin::Kind::RightSemi && KIND != ASTTableJoin::Kind::RightAnti)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dbms/src/Interpreters/JoinPartition.cpp` around lines 1546 - 1551, The branch
handling has_row_filter_map currently calls
RowFlaggedHashMapAdder<Map>::addNotFound(), which emits zero rows and thus drops
probe rows that must be preserved for ASTTableJoin::Kind::Full; change the call
in JoinPartition.cpp (the branch using row_flagged_map, current_offset,
offsets_to_replicate.get(), i) to call
RowFlaggedHashMapAdder<Map>::addNotFoundForFull(...) when the join kind is
ASTTableJoin::Kind::Full (or otherwise ensure the Full-join path uses
addNotFoundForFull), so filtered probe rows produce a single NULL-extended
output row matching how true hash misses are handled; keep existing parameters
and flow for non-Full joins.
🧹 Nitpick comments (2)
dbms/src/Flash/tests/gtest_join_executor.cpp (1)

4264-4458: Run these new executor tests under both DataStream and Pipeline modes.

These additions are currently not wrapped by the join test mode macro, so they may miss regressions that are engine-specific.

♻️ Suggested pattern
 TEST_F(JoinExecutorTestRunner, FullOuterJoinWithOtherCondition)
 try
 {
+    WRAP_FOR_JOIN_TEST_BEGIN
     ...
+    WRAP_FOR_JOIN_TEST_END
 }
 CATCH

Apply the same wrapping to the other two new tests in this block.

Based on learnings: Wrap executor tests in WRAP_FOR_TEST_BEGIN and WRAP_FOR_TEST_END macros to ensure test coverage for both DataStream and Pipeline execution engines.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dbms/src/Flash/tests/gtest_join_executor.cpp` around lines 4264 - 4458, The
new full-outer-join tests (TEST_F(JoinExecutorTestRunner,
FullOuterJoinWithOtherCondition), TEST_F(JoinExecutorTestRunner,
FullOuterJoinWithoutOtherConditionAndNullKey), and
TEST_F(JoinExecutorTestRunner, FullOuterJoinWithLeftAndRightConditions)) must be
executed under both DataStream and Pipeline modes; wrap each TEST_F block with
the engine-test macros by inserting WRAP_FOR_TEST_BEGIN before the test body and
WRAP_FOR_TEST_END after the CATCH so each test is enclosed by
WRAP_FOR_TEST_BEGIN(...) / WRAP_FOR_TEST_END(...) (using the same pattern used
by other join tests) to ensure both execution engines run these cases.
dbms/src/Interpreters/tests/gtest_join_null_eq.cpp (1)

323-410: Prefer the shared test column builders instead of reimplementing them here.

This block recreates nullable number/string/fixed-string builders plus simple numeric columns. Reusing createColumn<T>-style helpers would cut a lot of boilerplate and keep null / fixed-string encoding consistent with the rest of the test suite.

Based on learnings, "Use createColumn<T> helpers from FunctionTestUtils.h to build test data instead of manual construction".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dbms/src/Interpreters/tests/gtest_join_null_eq.cpp` around lines 323 - 410,
The test reimplements column builders (makeNullableNumberColumn,
makeNullableInt32Column, makeNullableInt64Column, makeNullableStringColumn,
makeNullableFixedStringColumn, makeUInt8Column) instead of using the shared
helpers; replace these manual builders with the
createColumn<T>/createNullableColumn<T>/createFixedStringColumn helpers from
FunctionTestUtils.h (import that header), constructing the same values via those
utilities so null maps and fixed-string padding follow the shared test-suite
conventions, and remove the duplicated manual implementations.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dbms/src/Flash/Coprocessor/JoinInterpreterHelper.h`:
- Around line 122-130: The current makeLeftJoinSideNullable and
makeRightJoinSideNullable compute nullability from tipb::JoinType which ignores
the normalized preserved/nullable sides returned by
getJoinKindAndBuildSideIndex; update these helpers to accept and use the
resolved join kind (e.g., ASTTableJoin::Kind) or the full tipb::Join (including
inner_idx) and determine nullability from that normalized result so the
preserved side logic is correct when inner_idx swaps sides; locate and change
the signatures of makeLeftJoinSideNullable/makeRightJoinSideNullable and any
callers to pass the normalized kind (or tipb::Join) and implement the boolean
checks against ASTTableJoin::Kind (or use inner_idx-aware logic) instead of raw
tipb::JoinType.

In `@dbms/src/Functions/FunctionsJson.h`:
- Around line 1108-1116: Replace the bare throws in the json_object path with
DB::Exception using the file's existing ErrorCodes and fmt-style constructor:
where you currently throw Exception("JSON documents may not contain NULL member
names.") (around key_nullmap / sources[col]) throw
DB::Exception(ErrorCodes::BAD_ARGUMENTS, "JSON documents may not contain NULL
member names."); and where you throw Exception("TiDB/TiFlash does not yet
support JSON objects with the key length >= 65536") (around key_from.size check)
throw DB::Exception(ErrorCodes::TOO_LARGE_ARRAY_OR_STRING, "TiDB/TiFlash does
not yet support JSON objects with the key length >= 65536"); ensure you include
DB::Exception and ErrorCodes in the same style used elsewhere in FunctionsJson.h
and keep the message text intact while passing the error code first.

In `@dbms/src/Interpreters/tests/gtest_join_null_eq.cpp`:
- Around line 58-70: ensureFunctionsRegistered currently swallows all
DB::Exception inside the lambda passed to std::call_once which incorrectly marks
the once flag even for real failures; change the code so the lambda only calls
registerFunctions() (no catch) and move the try/catch to wrap the std::call_once
call itself, catching DB::Exception and only suppressing the specific
duplicate-registration error (identify it via its error code or message) while
rethrowing any other DB::Exception; reference ensureFunctionsRegistered,
std::call_once, std::once_flag, and registerFunctions when making this change.

---

Outside diff comments:
In `@dbms/src/Debug/MockExecutor/JoinBinder.cpp`:
- Around line 172-191: Replace the assert(...) with a runtime validation that
fails immediately if is_null_eq is non-empty and its size doesn't equal
join_cols.size(); e.g., before the loops that call fillJoinKeyAndFieldType and
before adding is_null_eq entries, add a check like if (!is_null_eq.empty() &&
is_null_eq.size() != join_cols.size()) and throw or return an error
(std::invalid_argument or similar) so callers cannot produce a malformed
tipb::Join when assertions are disabled; keep the rest of the logic
(fillJoinKeyAndFieldType and join->add_is_null_eq) unchanged.

In `@dbms/src/Interpreters/JoinPartition.cpp`:
- Around line 1546-1551: The branch handling has_row_filter_map currently calls
RowFlaggedHashMapAdder<Map>::addNotFound(), which emits zero rows and thus drops
probe rows that must be preserved for ASTTableJoin::Kind::Full; change the call
in JoinPartition.cpp (the branch using row_flagged_map, current_offset,
offsets_to_replicate.get(), i) to call
RowFlaggedHashMapAdder<Map>::addNotFoundForFull(...) when the join kind is
ASTTableJoin::Kind::Full (or otherwise ensure the Full-join path uses
addNotFoundForFull), so filtered probe rows produce a single NULL-extended
output row matching how true hash misses are handled; keep existing parameters
and flow for non-Full joins.

---

Nitpick comments:
In `@dbms/src/Flash/tests/gtest_join_executor.cpp`:
- Around line 4264-4458: The new full-outer-join tests
(TEST_F(JoinExecutorTestRunner, FullOuterJoinWithOtherCondition),
TEST_F(JoinExecutorTestRunner, FullOuterJoinWithoutOtherConditionAndNullKey),
and TEST_F(JoinExecutorTestRunner, FullOuterJoinWithLeftAndRightConditions))
must be executed under both DataStream and Pipeline modes; wrap each TEST_F
block with the engine-test macros by inserting WRAP_FOR_TEST_BEGIN before the
test body and WRAP_FOR_TEST_END after the CATCH so each test is enclosed by
WRAP_FOR_TEST_BEGIN(...) / WRAP_FOR_TEST_END(...) (using the same pattern used
by other join tests) to ensure both execution engines run these cases.

In `@dbms/src/Interpreters/tests/gtest_join_null_eq.cpp`:
- Around line 323-410: The test reimplements column builders
(makeNullableNumberColumn, makeNullableInt32Column, makeNullableInt64Column,
makeNullableStringColumn, makeNullableFixedStringColumn, makeUInt8Column)
instead of using the shared helpers; replace these manual builders with the
createColumn<T>/createNullableColumn<T>/createFixedStringColumn helpers from
FunctionTestUtils.h (import that header), constructing the same values via those
utilities so null maps and fixed-string padding follow the shared test-suite
conventions, and remove the duplicated manual implementations.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: de687f35-0578-486e-9fb4-269fc33b74f3

📥 Commits

Reviewing files that changed from the base of the PR and between 1c34cbe and a0f963c.

📒 Files selected for processing (37)
  • AGENTS.md
  • dbms/src/DataStreams/ScanHashMapAfterProbeBlockInputStream.cpp
  • dbms/src/Debug/MockExecutor/JoinBinder.cpp
  • dbms/src/Debug/MockExecutor/JoinBinder.h
  • dbms/src/Flash/Coprocessor/DAGUtils.cpp
  • dbms/src/Flash/Coprocessor/JoinInterpreterHelper.cpp
  • dbms/src/Flash/Coprocessor/JoinInterpreterHelper.h
  • dbms/src/Flash/Coprocessor/collectOutputFieldTypes.cpp
  • dbms/src/Flash/Coprocessor/tests/gtest_join_get_kind_and_build_index.cpp
  • dbms/src/Flash/Planner/Plans/PhysicalJoin.cpp
  • dbms/src/Flash/tests/gtest_join_executor.cpp
  • dbms/src/Flash/tests/gtest_spill_join.cpp
  • dbms/src/Functions/FunctionsJson.cpp
  • dbms/src/Functions/FunctionsJson.h
  • dbms/src/Functions/tests/gtest_json_object.cpp
  • dbms/src/Interpreters/CrossJoinProbeHelper.cpp
  • dbms/src/Interpreters/Join.cpp
  • dbms/src/Interpreters/Join.h
  • dbms/src/Interpreters/JoinHashMap.cpp
  • dbms/src/Interpreters/JoinHashMap.h
  • dbms/src/Interpreters/JoinPartition.cpp
  • dbms/src/Interpreters/JoinPartition.h
  • dbms/src/Interpreters/JoinUtils.cpp
  • dbms/src/Interpreters/JoinUtils.h
  • dbms/src/Interpreters/NullAwareSemiJoinHelper.h
  • dbms/src/Interpreters/ProbeProcessInfo.cpp
  • dbms/src/Interpreters/ProbeProcessInfo.h
  • dbms/src/Interpreters/tests/gtest_join_null_eq.cpp
  • dbms/src/TestUtils/ColumnsToTiPBExpr.h
  • dbms/src/TestUtils/mockExecutor.cpp
  • dbms/src/TestUtils/mockExecutor.h
  • dbms/src/TestUtils/tests/gtest_mock_executors.cpp
  • dbms/src/TiDB/Decode/JsonBinary.cpp
  • dbms/src/TiDB/Decode/JsonBinary.h
  • docs/note/fullouter_join.md
  • docs/note/nulleq_join.md
  • tests/fullstack-test/expr/json_object.test

Comment on lines +122 to +130
constexpr bool makeLeftJoinSideNullable(tipb::JoinType join_type)
{
return join_type == tipb::JoinType::TypeRightOuterJoin || join_type == tipb::JoinType::TypeFullOuterJoin;
}

constexpr bool makeRightJoinSideNullable(tipb::JoinType join_type)
{
return join_type == tipb::JoinType::TypeLeftOuterJoin || join_type == tipb::JoinType::TypeFullOuterJoin;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

These nullability helpers need the resolved join kind, not just tipb::JoinType.

getJoinKindAndBuildSideIndex() already flips TypeLeftOuterJoin / TypeRightOuterJoin when inner_idx swaps the preserved side. makeLeftJoinSideNullable() / makeRightJoinSideNullable() ignore that, so callers will mark the wrong side nullable for cases like TypeLeftOuterJoin + inner_idx == 0 and TypeRightOuterJoin + inner_idx == 1. Please derive side nullability from the normalized ASTTableJoin::Kind (or from the full tipb::Join including inner_idx) instead of the raw protobuf join type.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dbms/src/Flash/Coprocessor/JoinInterpreterHelper.h` around lines 122 - 130,
The current makeLeftJoinSideNullable and makeRightJoinSideNullable compute
nullability from tipb::JoinType which ignores the normalized preserved/nullable
sides returned by getJoinKindAndBuildSideIndex; update these helpers to accept
and use the resolved join kind (e.g., ASTTableJoin::Kind) or the full tipb::Join
(including inner_idx) and determine nullability from that normalized result so
the preserved side logic is correct when inner_idx swaps sides; locate and
change the signatures of makeLeftJoinSideNullable/makeRightJoinSideNullable and
any callers to pass the normalized kind (or tipb::Join) and implement the
boolean checks against ASTTableJoin::Kind (or use inner_idx-aware logic) instead
of raw tipb::JoinType.

Comment on lines +1108 to +1116
const auto * key_nullmap = nullmaps[col];
if (!sources[col] || (key_nullmap && (*key_nullmap)[i]))
throw Exception("JSON documents may not contain NULL member names.");
}

assert(sources[col]);
const auto & key_from = sources[col]->getWhole();
if (unlikely(key_from.size > std::numeric_limits<UInt16>::max()))
throw Exception("TiDB/TiFlash does not yet support JSON objects with the key length >= 65536");
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Attach explicit error codes to the new json_object failure paths.

These branches currently throw bare Exception instances, so NULL-member-name and oversized-key failures lose their error classification at the SQL layer. Please keep them aligned with the rest of this file and throw typed DB::Exceptions instead.

Suggested fix
-                        throw Exception("JSON documents may not contain NULL member names.");
+                        throw Exception(
+                            ErrorCodes::ILLEGAL_COLUMN,
+                            "JSON documents may not contain NULL member names.");
...
-                    throw Exception("TiDB/TiFlash does not yet support JSON objects with the key length >= 65536");
+                    throw Exception(
+                        ErrorCodes::ILLEGAL_COLUMN,
+                        "TiDB/TiFlash does not yet support JSON objects with the key length >= 65536");
As per coding guidelines, "Use `DB::Exception` for error handling in C++" and "Prefer the fmt-style constructor for `DB::Exception` with error code first: `throw Exception(ErrorCodes::SOME_CODE, "Message with {}", arg);`"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dbms/src/Functions/FunctionsJson.h` around lines 1108 - 1116, Replace the
bare throws in the json_object path with DB::Exception using the file's existing
ErrorCodes and fmt-style constructor: where you currently throw Exception("JSON
documents may not contain NULL member names.") (around key_nullmap /
sources[col]) throw DB::Exception(ErrorCodes::BAD_ARGUMENTS, "JSON documents may
not contain NULL member names."); and where you throw Exception("TiDB/TiFlash
does not yet support JSON objects with the key length >= 65536") (around
key_from.size check) throw DB::Exception(ErrorCodes::TOO_LARGE_ARRAY_OR_STRING,
"TiDB/TiFlash does not yet support JSON objects with the key length >= 65536");
ensure you include DB::Exception and ErrorCodes in the same style used elsewhere
in FunctionsJson.h and keep the message text intact while passing the error code
first.

Comment on lines +58 to +70
void ensureFunctionsRegistered()
{
static std::once_flag once;
std::call_once(once, [] {
try
{
registerFunctions();
}
catch (DB::Exception &)
{
// Another test suite may have already registered the functions.
}
});
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Don’t swallow every registerFunctions() failure inside call_once.

Because the catch lives inside std::call_once, any real registerFunctions() failure also marks the once-flag as satisfied. That hides the setup error for the rest of the process and later tests fail in much less obvious places. Please only ignore the expected duplicate-registration case and rethrow everything else.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dbms/src/Interpreters/tests/gtest_join_null_eq.cpp` around lines 58 - 70,
ensureFunctionsRegistered currently swallows all DB::Exception inside the lambda
passed to std::call_once which incorrectly marks the once flag even for real
failures; change the code so the lambda only calls registerFunctions() (no
catch) and move the try/catch to wrap the std::call_once call itself, catching
DB::Exception and only suppressing the specific duplicate-registration error
(identify it via its error code or message) while rethrowing any other
DB::Exception; reference ensureFunctionsRegistered, std::call_once,
std::once_flag, and registerFunctions when making this change.

@windtalker windtalker force-pushed the support_fullouter_join branch from a0f963c to 8e48e45 Compare April 1, 2026 02:40
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (1)
dbms/src/Interpreters/tests/gtest_join_null_eq.cpp (1)

58-71: ⚠️ Potential issue | 🟡 Minor

Exception handling inside std::call_once can hide real failures.

The try/catch inside the std::call_once lambda marks the once-flag as satisfied even for unexpected failures. Only expected duplicate-registration exceptions should be suppressed. Consider restructuring to rethrow non-duplicate-registration exceptions.

🛠️ Proposed fix
 void ensureFunctionsRegistered()
 {
     static std::once_flag once;
-    std::call_once(once, [] {
-        try
-        {
-            registerFunctions();
-        }
-        catch (DB::Exception &)
-        {
-            // Another test suite may have already registered the functions.
-        }
-    });
+    try
+    {
+        std::call_once(once, [] { registerFunctions(); });
+    }
+    catch (DB::Exception & e)
+    {
+        // Suppress expected duplicate registration; rethrow anything else
+        if (e.message().find("already registered") == String::npos)
+            throw;
+    }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dbms/src/Interpreters/tests/gtest_join_null_eq.cpp` around lines 58 - 71, The
current ensureFunctionsRegistered uses std::call_once with a lambda that
swallows all DB::Exception, which marks the once flag as satisfied even for
unexpected failures; modify the lambda (or move the try/catch outside the
call_once) so you only suppress the known duplicate-registration error: catch
DB::Exception &e and check its identity (e.g. compare e.code() to the
duplicate-registration error code from ErrorCodes or check the exception message
text that indicates "already registered"), ignore only in that case and rethrow
for any other exception; reference ensureFunctionsRegistered, registerFunctions,
std::call_once, the once flag, and DB::Exception when making this change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dbms/src/Flash/tests/gtest_join_executor.cpp`:
- Around line 4264-4354: The new FullOuterJoinWithOtherCondition test cases
(TEST_F JoinExecutorTestRunner::FullOuterJoinWithOtherCondition and the adjacent
cases at lines ~4356-4410 and ~4412-4466) were added without the file's
WRAP_FOR_TEST_BEGIN/WRAP_FOR_TEST_END test harness, so they only run one
execution engine; wrap each test block (the TEST_F body that builds requests,
calls executeAndAssertColumnsEqual and ASSERT_COLUMNS_EQ_UR) with the
WRAP_FOR_TEST_BEGIN and WRAP_FOR_TEST_END macros (the same way existing join
tests do) so the cases run under both DataStream and Pipeline engines,
preserving the existing test structure around
context.scan(...).join(...).build(context), executeAndAssertColumnsEqual, and
the aggregation/ASSERT_COLUMNS_EQ_UR assertions.

In `@dbms/src/Functions/tests/gtest_json_object.cpp`:
- Line 15: The include <Columns/ColumnNullable.h> in
dbms/src/Functions/tests/gtest_json_object.cpp is unresolved and not used;
remove that include line (or if a symbol from ColumnNullable is actually needed,
replace it with the correct header path that defines that symbol). Update
gtest_json_object.cpp by deleting the `#include` <Columns/ColumnNullable.h> (or
swap to the correct header) so the compilation error from the missing header is
resolved.

In `@dbms/src/TestUtils/tests/gtest_mock_executors.cpp`:
- Around line 344-372: The test currently calls
collectOutputFieldTypes(*request) on the raw join built by
context.scan(...).join(...).build(context), so it never exercises column
pruning; modify the test to wrap the join request in a parent projection (or
equivalent pruning operator) that selects a subset of output columns (e.g., only
the join outputs you want to keep) before calling build(context), then call
collectOutputFieldTypes on that pruned request to assert the join executor’s
field types post-prune (keep using MockDAGRequestTest, collectOutputFieldTypes,
context.scan, join, and build to locate the code and reuse expected_size
assertions).
- Line 15: The test includes a missing header for collectOutputFieldTypes
causing compilation failure; update gtest_mock_executors.cpp to include the
correct public header that declares collectOutputFieldTypes (or adjust the
include path) so the symbol is resolved — locate usages of
collectOutputFieldTypes in this test and replace or add the proper include (the
header that actually exposes the collectOutputFieldTypes declaration) rather
than the non-existent Flash/Coprocessor/collectOutputFieldTypes.h.

---

Duplicate comments:
In `@dbms/src/Interpreters/tests/gtest_join_null_eq.cpp`:
- Around line 58-71: The current ensureFunctionsRegistered uses std::call_once
with a lambda that swallows all DB::Exception, which marks the once flag as
satisfied even for unexpected failures; modify the lambda (or move the try/catch
outside the call_once) so you only suppress the known duplicate-registration
error: catch DB::Exception &e and check its identity (e.g. compare e.code() to
the duplicate-registration error code from ErrorCodes or check the exception
message text that indicates "already registered"), ignore only in that case and
rethrow for any other exception; reference ensureFunctionsRegistered,
registerFunctions, std::call_once, the once flag, and DB::Exception when making
this change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: a757c79f-4497-47c9-8eaa-8ae661954a82

📥 Commits

Reviewing files that changed from the base of the PR and between a0f963c and 8e48e45.

📒 Files selected for processing (33)
  • dbms/src/Debug/MockExecutor/JoinBinder.cpp
  • dbms/src/Debug/MockExecutor/JoinBinder.h
  • dbms/src/Flash/Coprocessor/DAGUtils.cpp
  • dbms/src/Flash/Coprocessor/JoinInterpreterHelper.cpp
  • dbms/src/Flash/Coprocessor/JoinInterpreterHelper.h
  • dbms/src/Flash/Coprocessor/tests/gtest_join_get_kind_and_build_index.cpp
  • dbms/src/Flash/Planner/Plans/PhysicalJoin.cpp
  • dbms/src/Flash/tests/gtest_join_executor.cpp
  • dbms/src/Flash/tests/gtest_spill_join.cpp
  • dbms/src/Functions/FunctionsJson.cpp
  • dbms/src/Functions/FunctionsJson.h
  • dbms/src/Functions/tests/gtest_json_object.cpp
  • dbms/src/Interpreters/CrossJoinProbeHelper.cpp
  • dbms/src/Interpreters/Join.cpp
  • dbms/src/Interpreters/Join.h
  • dbms/src/Interpreters/JoinHashMap.cpp
  • dbms/src/Interpreters/JoinHashMap.h
  • dbms/src/Interpreters/JoinPartition.cpp
  • dbms/src/Interpreters/JoinPartition.h
  • dbms/src/Interpreters/JoinUtils.cpp
  • dbms/src/Interpreters/JoinUtils.h
  • dbms/src/Interpreters/NullAwareSemiJoinHelper.h
  • dbms/src/Interpreters/ProbeProcessInfo.cpp
  • dbms/src/Interpreters/ProbeProcessInfo.h
  • dbms/src/Interpreters/tests/gtest_join_null_eq.cpp
  • dbms/src/TestUtils/ColumnsToTiPBExpr.h
  • dbms/src/TestUtils/mockExecutor.cpp
  • dbms/src/TestUtils/mockExecutor.h
  • dbms/src/TestUtils/tests/gtest_mock_executors.cpp
  • dbms/src/TiDB/Decode/JsonBinary.cpp
  • dbms/src/TiDB/Decode/JsonBinary.h
  • docs/note/nulleq_join.md
  • tests/fullstack-test/expr/json_object.test
✅ Files skipped from review due to trivial changes (5)
  • dbms/src/Functions/FunctionsJson.cpp
  • dbms/src/TestUtils/ColumnsToTiPBExpr.h
  • tests/fullstack-test/expr/json_object.test
  • dbms/src/Interpreters/ProbeProcessInfo.h
  • dbms/src/Functions/FunctionsJson.h
🚧 Files skipped from review as they are similar to previous changes (15)
  • dbms/src/Flash/Coprocessor/DAGUtils.cpp
  • dbms/src/TestUtils/mockExecutor.h
  • dbms/src/TiDB/Decode/JsonBinary.h
  • dbms/src/TestUtils/mockExecutor.cpp
  • dbms/src/Interpreters/JoinPartition.h
  • dbms/src/Debug/MockExecutor/JoinBinder.h
  • dbms/src/Flash/tests/gtest_spill_join.cpp
  • dbms/src/TiDB/Decode/JsonBinary.cpp
  • dbms/src/Interpreters/NullAwareSemiJoinHelper.h
  • dbms/src/Interpreters/JoinUtils.cpp
  • dbms/src/Interpreters/JoinUtils.h
  • dbms/src/Flash/Coprocessor/JoinInterpreterHelper.h
  • dbms/src/Flash/Coprocessor/JoinInterpreterHelper.cpp
  • dbms/src/Interpreters/Join.cpp
  • dbms/src/Interpreters/JoinPartition.cpp

Comment on lines +4264 to +4354
TEST_F(JoinExecutorTestRunner, FullOuterJoinWithOtherCondition)
try
{
using tipb::JoinType;

const std::vector<std::tuple<ColumnsWithTypeAndName, ColumnsWithTypeAndName, ColumnsWithTypeAndName, String>>
test_cases
= {{
{toNullableVec<Int32>("a", {1}), toNullableVec<Int32>("c", {10})},
{toNullableVec<Int32>("a", {1}), toNullableVec<Int32>("c", {5})},
{toNullableVec<Int32>("a", {1, {}}),
toNullableVec<Int32>("c", {10, {}}),
toNullableVec<Int32>("a", {{}, 1}),
toNullableVec<Int32>("c", {{}, 5})},
"key matched but other condition failed for all rows",
},
{
{toNullableVec<Int32>("a", {2}), toNullableVec<Int32>("c", {20})},
{toNullableVec<Int32>("a", {1}), toNullableVec<Int32>("c", {5})},
{toNullableVec<Int32>("a", {2, {}}),
toNullableVec<Int32>("c", {20, {}}),
toNullableVec<Int32>("a", {{}, 1}),
toNullableVec<Int32>("c", {{}, 5})},
"key not matched",
},
{
{toNullableVec<Int32>("a", {{}}), toNullableVec<Int32>("c", {10})},
{toNullableVec<Int32>("a", {1}), toNullableVec<Int32>("c", {5})},
{toNullableVec<Int32>("a", {{}, {}}),
toNullableVec<Int32>("c", {10, {}}),
toNullableVec<Int32>("a", {{}, 1}),
toNullableVec<Int32>("c", {{}, 5})},
"probe-side null key should still be emitted as unmatched row",
},
{
{toNullableVec<Int32>("a", {1}), toNullableVec<Int32>("c", {3})},
{toNullableVec<Int32>("a", {1, 1}), toNullableVec<Int32>("c", {2, 4})},
{toNullableVec<Int32>("a", {1, {}}),
toNullableVec<Int32>("c", {3, {}}),
toNullableVec<Int32>("a", {1, 1}),
toNullableVec<Int32>("c", {4, 2})},
"only rows that pass other condition should be marked used",
}};

for (const auto & [left, right, res, test_case_name] : test_cases)
{
SCOPED_TRACE(test_case_name);
context.addMockTable(
"full_outer_other_condition",
"t",
{{"a", TiDB::TP::TypeLong}, {"c", TiDB::TP::TypeLong}},
left);
context.addMockTable(
"full_outer_other_condition",
"s",
{{"a", TiDB::TP::TypeLong}, {"c", TiDB::TP::TypeLong}},
right);

auto request = context.scan("full_outer_other_condition", "t")
.join(
context.scan("full_outer_other_condition", "s"),
JoinType::TypeFullOuterJoin,
{col("a")},
{},
{},
{lt(col("t.c"), col("s.c"))},
{},
0,
false,
1)
.build(context);
executeAndAssertColumnsEqual(request, res);

auto request_column_prune = context.scan("full_outer_other_condition", "t")
.join(
context.scan("full_outer_other_condition", "s"),
JoinType::TypeFullOuterJoin,
{col("a")},
{},
{},
{lt(col("t.c"), col("s.c"))},
{},
0,
false,
1)
.aggregation({Count(lit(static_cast<UInt64>(1)))}, {})
.build(context);
ASSERT_COLUMNS_EQ_UR(genScalarCountResults(res), executeStreams(request_column_prune, 2));
}
}
CATCH
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Run these full-outer executor cases under both execution modes.

These additions bypass the file’s existing WRAP_FOR_JOIN_TEST_BEGIN/END helper, so they only validate one engine. Since this PR is changing join execution semantics, that leaves the other engine untested for exactly the scenarios this suite is meant to lock down. Based on learnings: Wrap executor tests in WRAP_FOR_TEST_BEGIN and WRAP_FOR_TEST_END macros to ensure test coverage for both DataStream and Pipeline execution engines.

Also applies to: 4356-4410, 4412-4466

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dbms/src/Flash/tests/gtest_join_executor.cpp` around lines 4264 - 4354, The
new FullOuterJoinWithOtherCondition test cases (TEST_F
JoinExecutorTestRunner::FullOuterJoinWithOtherCondition and the adjacent cases
at lines ~4356-4410 and ~4412-4466) were added without the file's
WRAP_FOR_TEST_BEGIN/WRAP_FOR_TEST_END test harness, so they only run one
execution engine; wrap each test block (the TEST_F body that builds requests,
calls executeAndAssertColumnsEqual and ASSERT_COLUMNS_EQ_UR) with the
WRAP_FOR_TEST_BEGIN and WRAP_FOR_TEST_END macros (the same way existing join
tests do) so the cases run under both DataStream and Pipeline engines,
preserving the existing test structure around
context.scan(...).join(...).build(context), executeAndAssertColumnsEqual, and
the aggregation/ASSERT_COLUMNS_EQ_UR assertions.

// See the License for the specific language governing permissions and
// limitations under the License.

#include <Columns/ColumnNullable.h>
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Remove or fix unresolved header include to avoid build break

<Columns/ColumnNullable.h> is reported as not found by clang, and this file does not directly use symbols from it. This can fail compilation in CI; drop the include (or replace with the correct header path if genuinely required).

Proposed fix
-#include <Columns/ColumnNullable.h>
 `#include` <TestUtils/FunctionTestUtils.h>
 `#include` <TestUtils/TiFlashTestBasic.h>
 `#include` <TiDB/Schema/TiDBTypes.h>
 `#include` <gtest/gtest.h>
🧰 Tools
🪛 Clang (14.0.6)

[error] 15-15: 'Columns/ColumnNullable.h' file not found

(clang-diagnostic-error)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dbms/src/Functions/tests/gtest_json_object.cpp` at line 15, The include
<Columns/ColumnNullable.h> in dbms/src/Functions/tests/gtest_json_object.cpp is
unresolved and not used; remove that include line (or if a symbol from
ColumnNullable is actually needed, replace it with the correct header path that
defines that symbol). Update gtest_json_object.cpp by deleting the `#include`
<Columns/ColumnNullable.h> (or swap to the correct header) so the compilation
error from the missing header is resolved.

// See the License for the specific language governing permissions and
// limitations under the License.

#include <Flash/Coprocessor/collectOutputFieldTypes.h>
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Fix the unresolved collectOutputFieldTypes include.

Clang is already reporting Flash/Coprocessor/collectOutputFieldTypes.h as missing, so this test file will not compile until the declaration is exposed through an existing public header or the include path is corrected.

🧰 Tools
🪛 Clang (14.0.6)

[error] 15-15: 'Flash/Coprocessor/collectOutputFieldTypes.h' file not found

(clang-diagnostic-error)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dbms/src/TestUtils/tests/gtest_mock_executors.cpp` at line 15, The test
includes a missing header for collectOutputFieldTypes causing compilation
failure; update gtest_mock_executors.cpp to include the correct public header
that declares collectOutputFieldTypes (or adjust the include path) so the symbol
is resolved — locate usages of collectOutputFieldTypes in this test and replace
or add the proper include (the header that actually exposes the
collectOutputFieldTypes declaration) rather than the non-existent
Flash/Coprocessor/collectOutputFieldTypes.h.

Comment on lines +344 to +372
TEST_F(MockDAGRequestTest, SemiJoinColumnPruneKeepsJoinOutputSchema)
try
{
const std::vector<std::pair<tipb::JoinType, size_t>> test_cases = {
{tipb::JoinType::TypeSemiJoin, 3},
{tipb::JoinType::TypeAntiSemiJoin, 3},
{tipb::JoinType::TypeLeftOuterSemiJoin, 4},
{tipb::JoinType::TypeAntiLeftOuterSemiJoin, 4},
};

for (const auto & [join_type, expected_size] : test_cases)
{
auto request = context.scan("test_db", "l_table")
.join(context.scan("test_db", "r_table"), join_type, {col("join_c")})
.build(context);

auto output_field_types = collectOutputFieldTypes(*request);
ASSERT_EQ(output_field_types.size(), expected_size) << fmt::underlying(join_type);
ASSERT_EQ(output_field_types[0].tp(), TiDB::TypeLong);
ASSERT_EQ(output_field_types[1].tp(), TiDB::TypeString);
ASSERT_EQ(output_field_types[2].tp(), TiDB::TypeString);

if (expected_size == 4)
{
ASSERT_EQ(output_field_types[3].tp(), TiDB::TypeTiny);
ASSERT_EQ(output_field_types[3].flag() & TiDB::ColumnFlagNotNull, 0);
}
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

This test never exercises column pruning.

collectOutputFieldTypes(*request) is being called on a bare join, so it only checks the default join output schema. To cover the regression implied by the test name, add a parent that actually prunes join outputs and then assert the join executor’s field types after that prune step.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dbms/src/TestUtils/tests/gtest_mock_executors.cpp` around lines 344 - 372,
The test currently calls collectOutputFieldTypes(*request) on the raw join built
by context.scan(...).join(...).build(context), so it never exercises column
pruning; modify the test to wrap the join request in a parent projection (or
equivalent pruning operator) that selects a subset of output columns (e.g., only
the join outputs you want to keep) before calling build(context), then call
collectOutputFieldTypes on that pruned request to assert the join executor’s
field types post-prune (keep using MockDAGRequestTest, collectOutputFieldTypes,
context.scan, join, and build to locate the code and reuse expected_size
assertions).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant