support nulleq in tiflash by windtalker · Pull Request #10726 · pingcap/tiflash

windtalker · 2026-02-14T05:22:53Z

What problem does this PR solve?

Issue Number: close #xxx

Problem Summary:

What is changed and how it works?

Check List

Tests

Unit test
Integration test
Manual test (add detailed scripts or steps below)
No code

Side effects

Performance regression: Consumes more CPU
Performance regression: Consumes more Memory
Breaking backward compatibility

Documentation

Release note

None

Summary by CodeRabbit

New Features
- Added a null-aware equality operator covering more types (including vector float32) with proper string collation forwarding.
Behavior
- Query/filter parsing now recognizes null-safe equality and emits is-null predicates when comparing to NULL.
Tests
- Added unit tests validating mapping, runtime null propagation, collation behavior, and filter-parser null-safe cases.

coderabbitai · 2026-02-14T05:23:12Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Adds a TiDB null-safe equality function ("tidbNullEQ"), maps multiple NullEQ scalar signatures (including VectorFloat32) to that function, updates DeltaMerge filter-parser to treat NullEQ with NULL literal as is-null, and adds unit tests in Coprocessor, Functions, and DeltaMerge.

Cohort / File(s)	Summary
Functions / Implementation & Tests `dbms/src/Functions/FunctionsComparison.cpp`, `dbms/src/Functions/tests/gtest_tidb_null_eq.cpp`	Adds `FunctionTiDBNullEQ` (`tidbNullEQ`) with custom NULL semantics and collator forwarding; registers it and adds unit tests for non-null, nullable, all-null, const-null, and collator cases.
Filter Parser / DeltaMerge tests `dbms/src/Storages/DeltaMerge/FilterParser/FilterParser.cpp`, `dbms/src/Storages/DeltaMerge/tests/gtest_dm_filter_parser_nulleq.cpp`	Maps NullEQ* signatures (including `NullEQVectorFloat32`) to `RSFilterType::Equal`; `parseTiCompareExpr` now emits `isnull` when the compared literal is NULL; adds parser tests for null-safe equality scenarios.
Coprocessor DAG utils & tests `dbms/src/Flash/Coprocessor/DAGUtils.cpp`, `dbms/src/Flash/Coprocessor/tests/gtest_tidb_null_eq_func.cpp`	Updates `scalar_func_map` to map NullEQ variants to `"tidbNullEQ"` (adds `NullEQVectorFloat32`); adds a unit test asserting `getFunctionName(expr)` returns `"tidbNullEQ"` for those signatures.

Sequence Diagram(s)

mermaid
sequenceDiagram
participant Client as Client/Planner
participant DAG as Coprocessor / DAGUtils
participant Parser as FilterParser
participant Func as tidbNullEQ
participant Storage as DeltaMerge / Storage
Client->>DAG: construct tipb::Expr (NullEQ*)
DAG->>Parser: forward mapped function name ("tidbNullEQ")
Parser->>Storage: produce predicate (equal() or isnull() if literal NULL)
Client->>Func: execute tidbNullEQ on Block columns
Func->>Func: unwrap nullable/const, delegate to equals, post-process NULL semantics
Func-->>Client: return UInt8 result per row

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

"I nibbled on bytes beneath the moon's soft glow,
tidbNullEQ hops where nulls and values go.
A test, a map, a parser's tiny cheer,
equal when matched — or isnull when clear.
— Rabbit 🐇"

🚥 Pre-merge checks | ✅ 1 | ❌ 3

❌ Failed checks (3 warnings)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Merge Conflict Detection	⚠️ Warning	❌ Merge conflicts detected (11 files): ⚔️ `dbms/src/Flash/Coprocessor/DAGExpressionAnalyzer.cpp` (content) ⚔️ `dbms/src/Flash/Coprocessor/DAGStorageInterpreter.cpp` (content) ⚔️ `dbms/src/Flash/Coprocessor/DAGUtils.cpp` (content) ⚔️ `dbms/src/Flash/Coprocessor/GenSchemaAndColumn.cpp` (content) ⚔️ `dbms/src/Flash/Coprocessor/RemoteRequest.cpp` (content) ⚔️ `dbms/src/Flash/Planner/Plans/PhysicalTableScan.cpp` (content) ⚔️ `dbms/src/Functions/FunctionsComparison.cpp` (content) ⚔️ `dbms/src/Storages/DeltaMerge/Filter/PushDownFilter.cpp` (content) ⚔️ `dbms/src/Storages/DeltaMerge/Filter/RSOperator.cpp` (content) ⚔️ `dbms/src/Storages/DeltaMerge/FilterParser/FilterParser.cpp` (content) ⚔️ `dbms/src/Storages/KVStore/Types.h` (content) These conflicts must be resolved before merging into `feature/release-8.5-materialized-view`.	Resolve conflicts locally and push changes to this branch.
Description check	⚠️ Warning	The pull request description is incomplete. Critical sections lack substantive content: 'Problem Summary' is empty, the 'What is changed and how it works?' section contains only an empty commit-message block, and no implementation details are provided.	Provide a detailed problem summary explaining the nulleq feature being added, include a non-empty commit message or description of changes, and update the checklist with actually selected items (currently only 'Unit test' is checked but appears to be a template artifact).

✅ Passed checks (1 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'support nulleq in tiflash' directly and clearly summarizes the main objective of the PR, which is to add support for null-safe equality operations across multiple components in TiFlash.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Copilot

Pull request overview

This PR adds support for TiDB’s null-safe equality (<=>) in TiFlash by introducing a dedicated tidbNullEQ function, wiring TiDB DAG signatures to it, and extending DM rough-set filter parsing to treat <=> NULL as isnull(col).

Changes:

Add tidbNullEQ comparison function with correct NULL semantics and collation forwarding.
Map tipb::ScalarFuncSig::NullEQ* to tidbNullEQ in DAG function name resolution and to Equal in DM rough-set filter parsing (with <=> NULL lowered to isnull).
Add unit tests covering function behavior, DAG mapping, and DM filter parsing cases.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
dbms/src/Storages/DeltaMerge/tests/gtest_dm_filter_parser_nulleq.cpp	Adds DM filter parser unit tests for `<=>` lowering to `equal` / `isnull`.
dbms/src/Storages/DeltaMerge/FilterParser/FilterParser.cpp	Adds `<=> NULL` handling in rough-set compare parsing; maps `NullEQ*` sigs to `Equal`.
dbms/src/Functions/tests/gtest_tidb_null_eq.cpp	Adds unit tests validating `tidbNullEQ` semantics (nullable/const/null-only/collation).
dbms/src/Functions/FunctionsComparison.cpp	Implements and registers new `tidbNullEQ` function.
dbms/src/Flash/Coprocessor/tests/gtest_tidb_null_eq_func.cpp	Adds unit tests ensuring DAGUtils maps `NullEQ*` sigs to `tidbNullEQ`.
dbms/src/Flash/Coprocessor/DAGUtils.cpp	Wires `NullEQ*` signatures to function name `tidbNullEQ`.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-02-14T13:38:32Z

dbms/src/Storages/DeltaMerge/FilterParser/FilterParser.cpp

+        if ((expr.sig() == tipb::ScalarFuncSig::NullEQInt || expr.sig() == tipb::ScalarFuncSig::NullEQReal
+             || expr.sig() == tipb::ScalarFuncSig::NullEQString || expr.sig() == tipb::ScalarFuncSig::NullEQDecimal
+             || expr.sig() == tipb::ScalarFuncSig::NullEQTime || expr.sig() == tipb::ScalarFuncSig::NullEQDuration
+             || expr.sig() == tipb::ScalarFuncSig::NullEQVectorFloat32)


parseTiCompareExpr special-cases <=> NULL to isnull(col) for several NullEQ* signatures, but NullEQJson is mapped to RSFilterType::Equal below and is not included in this signature list. If a NullEQJson comparison against NULL ever reaches this code path, it would incorrectly produce equal(col, NULL) instead of isnull(col). Consider including tipb::ScalarFuncSig::NullEQJson in this check (or refactor to a shared helper for all NullEQ* sigs).

Suggested change

|| expr.sig() == tipb::ScalarFuncSig::NullEQVectorFloat32)

|| expr.sig() == tipb::ScalarFuncSig::NullEQVectorFloat32

|| expr.sig() == tipb::ScalarFuncSig::NullEQJson)

Copilot · 2026-02-14T13:38:32Z

dbms/src/Functions/FunctionsComparison.cpp

+        /// - both NULL => 1
+        /// - one NULL  => 0
+        /// - no NULL   => equals result
+        auto eq_mutable = (*std::move(eq_col)).mutate();


auto eq_mutable = (*std::move(eq_col)).mutate(); uses std::move on eq_col even though it isn't being moved-from (no move-construction happens). This is a bit misleading and can be simplified to avoid confusion about eq_col's lifetime/ownership.

Suggested change

auto eq_mutable = (*std::move(eq_col)).mutate();

auto eq_mutable = eq_col->mutate();

Actually, the mutate() can only be called on rvalue

Signed-off-by: xufei <xufeixw@mail.ustc.edu.cn>

windtalker · 2026-03-23T11:50:46Z

@pantheon-bot review

pantheon-ai · 2026-03-23T11:50:54Z

Review failed due to infrastructure/execution failure after retries. Please re-trigger review.

_{ℹ️ Learn more details on Pantheon AI.}

windtalker · 2026-03-23T13:41:05Z

@pantheon-bot review

pantheon-ai · 2026-03-23T13:41:12Z

Review failed due to infrastructure/execution failure after retries. Please re-trigger review.

_{ℹ️ Learn more details on Pantheon AI.}

xzhangxian1008

Other LGTM

xzhangxian1008 · 2026-03-24T00:35:26Z

dbms/src/Functions/FunctionsComparison.cpp

+
+        /// Use equals to validate that the input types are comparable.
+        /// Always return non-nullable UInt8 because `NULL <=> x` is always true/false (not NULL).
+        FunctionEquals().getReturnTypeImpl({removeNullable(arguments[0]), removeNullable(arguments[1])});


Maybe we can directly return at L65 as FunctionEquals().getReturnTypeImpl always returns DataTypeUInt8.

It also does some input types validation in FunctionEquals().getReturnTypeImpl(), so I think it's better to keep it.

Mini256 · 2026-03-24T03:14:36Z

@pantheon-bot review this pull request again

pantheon-ai · 2026-03-24T03:14:40Z

Hi @Mini256,

You're not in the allow list for this repository. Only users in the allow list can trigger Pantheon here. Please contact the repo maintainer if you need access.

Mini256 · 2026-03-24T03:15:23Z

@pantheon-bot review this pull request again

pantheon-ai · 2026-03-24T03:15:31Z

Review Complete

Findings: 1 issues
Posted: 1
Duplicates/Skipped: 0

_{ℹ️ Learn more details on Pantheon AI.}

gengliqi

LGTM

dbms/src/Functions/tests/gtest_tidb_null_eq.cpp

dbms/src/Flash/Coprocessor/tests/gtest_tidb_null_eq_func.cpp

ti-chi-bot · 2026-03-24T03:31:06Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gengliqi, xzhangxian1008

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [gengliqi]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ti-chi-bot · 2026-03-24T03:31:08Z

[LGTM Timeline notifier]

Timeline:

2026-03-24 01:18:49.791546334 +0000 UTC m=+231125.827616594: ☑️ agreed by xzhangxian1008.
2026-03-24 03:31:07.513691165 +0000 UTC m=+239063.549761425: ☑️ agreed by gengliqi.

pantheon-ai · 2026-03-24T05:08:03Z

dbms/src/Storages/DeltaMerge/FilterParser/FilterParser.cpp

-    //{tipb::ScalarFuncSig::NullEQTime, "cast"},
-    //{tipb::ScalarFuncSig::NullEQDuration, "cast"},
-    //{tipb::ScalarFuncSig::NullEQJson, "cast"},
+    {tipb::ScalarFuncSig::NullEQInt, FilterParser::RSFilterType::Equal},


[P1] DeltaMerge rough-set filter incorrectly prunes packs for NOT (col <=> non-null_literal)

Why: <=> is NULL-safe (never returns NULL), so for nullable col and non-NULL literal k, NOT (col <=> k) must match col IS NULL rows. However, the rough-set path lowers col <=> k to the same RS operator as col = k, which can produce AllNull for packs containing both {k} and NULL. Then NOT flips that to NoneNull, causing the pack to be skipped entirely and silently dropping matching NULL rows under default rough-set filtering.

Evidence:

Null-safe equality mapped to Equal at FilterParser.cpp:592, lowered to createEqual(...) for non-NULL literals (line 214)

Equal can return All when min==max==k (RoughCheck.h:42), and null-mark forces AllNull via addNullIfHasNull (MinMaxIndex.cpp:402, 603)

NOT applied as !RSResult (Not.h:31), and ! flips All -> None while preserving has_null (RSResult.cpp:26, RSResult.h:69), so AllNull -> NoneNull

Packs with None/NoneNull are skipped (RSResult.h:63), DMFileReader advances past them (DMFileReader.cpp:115)

Reachable in default config: dt_enable_rough_set_filter defaults to true (Settings.h:176)

@pantheon-ai Can you construct a unit test to reproduce this "incorrectly prunes packs" problem?

@JaySon-Huang I've created a unit test that reproduces this issue. The test is located at:

dbms/src/Storages/DeltaMerge/tests/gtest_dm_filter_parser_nulleq_not.cpp

Test Name: DMFilterParserTest.RoughSet_NotNullEQ_PrunesPacksWithLiteralAndNull_P1BugRepro

What it tests:

Creates a nullable column pack with values {1, NULL}

Applies filter NOT (a <=> 1)

Expects rough-set result Some (read the pack, apply precise filter)

Current buggy behavior: returns NoneNull (skips pack, dropping matching NULL rows)

Key assertion (intentionally fails on current code):

EXPECT_EQ(results[0], RSResult::Some);

To run:

cd tiflash git submodule update --init --recursive CC=clang CXX=clang++ cmake --preset dev cmake --build cmake-build-debug --target gtests_dm ./cmake-build-debug/dbms/src/Storages/DeltaMerge/tests/gtests_dm \ --gtest_filter=DMFilterParserTest.RoughSet_NotNullEQ_PrunesPacksWithLiteralAndNull_P1BugRepro

The test demonstrates that NOT (col <=> non-null_literal) incorrectly prunes packs containing both the literal value AND NULL values, causing wrong query results.

_{ℹ️ Learn more details on Pantheon AI.}

submit a new pr to fix this #10769

Thanks @windtalker! I see the fix PR #10769 has been submitted.

The unit test I created (gtest_dm_filter_parser_nulleq_not.cpp) can be used to verify the fix works correctly. Once the fix is merged, the test assertion EXPECT_EQ(results[0], RSResult::Some) should pass instead of failing.

Let me know if you'd like me to review the fix PR to ensure it addresses all the edge cases identified in the original issue.

_{ℹ️ Learn more details on Pantheon AI.}

DeltaMerge: fix rough-set semantics for NullEQ - add a dedicated `NullEqual` rough-set operator instead of lowering to `Equal` - teach minmax index to evaluate nullable `NullEQ` with null-safe semantics - keep the old minmax-index compatibility path conservative when it cannot distinguish pure-null from mixed packs - add regression coverage for nullable packs and old minmax-index compatibility - update the new NullEQ test files introduced in #10726 to use copyright year 2026

ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Feb 14, 2026

ti-chi-bot bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Feb 14, 2026

windtalker changed the title ~~[DNM] support nulleq in tiflash~~ support nulleq in tiflash Feb 14, 2026

windtalker requested a review from Copilot February 14, 2026 13:34

Copilot started reviewing on behalf of windtalker February 14, 2026 13:34 View session

Copilot AI reviewed Feb 14, 2026

View reviewed changes

windtalker force-pushed the add_nulleq_support branch from a393697 to 80b0894 Compare March 15, 2026 02:35

windtalker added 9 commits March 23, 2026 19:18

support nulleq in tiflash

6a3e95b

Signed-off-by: xufei <xufeixw@mail.ustc.edu.cn>

refine

89ddff5

Signed-off-by: xufei <xufeixw@mail.ustc.edu.cn>

save work

219e0c6

Signed-off-by: xufei <xufeixw@mail.ustc.edu.cn>

save work

288d67e

Signed-off-by: xufei <xufeixw@mail.ustc.edu.cn>

save work

00b60b5

Signed-off-by: xufei <xufeixw@mail.ustc.edu.cn>

Functions: normalize tidbNullEQ output

285492d

save work

3be6797

Signed-off-by: xufei <xufeixw@mail.ustc.edu.cn>

[Cross:tests] Cover more tidbNullEQ sigs

e325c0e

fix format

af5acc1

Signed-off-by: xufei <xufeixw@mail.ustc.edu.cn>

windtalker force-pushed the add_nulleq_support branch from 80b0894 to af5acc1 Compare March 23, 2026 11:19

fix: handle nulleq json null filter

0276b7e

xzhangxian1008 approved these changes Mar 24, 2026

View reviewed changes

ti-chi-bot bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Mar 24, 2026

gengliqi approved these changes Mar 24, 2026

View reviewed changes

dbms/src/Functions/tests/gtest_tidb_null_eq.cpp Show resolved Hide resolved

dbms/src/Flash/Coprocessor/tests/gtest_tidb_null_eq_func.cpp Show resolved Hide resolved

ti-chi-bot bot added lgtm approved and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Mar 24, 2026

ti-chi-bot bot merged commit 7d306f9 into pingcap:feature/release-8.5-materialized-view Mar 24, 2026
5 checks passed

pantheon-ai bot reviewed Mar 24, 2026

View reviewed changes

windtalker mentioned this pull request Mar 24, 2026

DeltaMerge: fix rough-set semantics for NullEQ #10769

Merged

12 tasks

This was referenced Apr 1, 2026

[DNM] support fullouter join and nulleq join #10783

Open

Support null-eq join #10788

Merged

	\|\| expr.sig() == tipb::ScalarFuncSig::NullEQVectorFloat32)
	\|\| expr.sig() == tipb::ScalarFuncSig::NullEQVectorFloat32
	\|\| expr.sig() == tipb::ScalarFuncSig::NullEQJson)

	auto eq_mutable = (*std::move(eq_col)).mutate();
	auto eq_mutable = eq_col->mutate();

Conversation

windtalker commented Feb 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What problem does this PR solve?

What is changed and how it works?

Check List

Release note

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Feb 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Sequence Diagram(s)

Estimated code review effort

Poem

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Feb 14, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 14, 2026

Choose a reason for hiding this comment

Uh oh!

windtalker Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

windtalker commented Mar 23, 2026

Uh oh!

pantheon-ai bot commented Mar 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

windtalker commented Mar 23, 2026

Uh oh!

pantheon-ai bot commented Mar 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

xzhangxian1008 left a comment

Choose a reason for hiding this comment

Uh oh!

xzhangxian1008 Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

windtalker Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

Mini256 commented Mar 24, 2026

Uh oh!

pantheon-ai bot commented Mar 24, 2026

Uh oh!

Mini256 commented Mar 24, 2026

Uh oh!

pantheon-ai bot commented Mar 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gengliqi left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

ti-chi-bot bot commented Mar 24, 2026

Uh oh!

ti-chi-bot bot commented Mar 24, 2026

[LGTM Timeline notifier]

Uh oh!

Uh oh!

pantheon-ai bot Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

JaySon-Huang Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

pantheon-ai bot Mar 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

windtalker commented Feb 14, 2026 •

edited

Loading

coderabbitai bot commented Feb 14, 2026 •

edited

Loading

pantheon-ai bot commented Mar 23, 2026 •

edited

Loading

pantheon-ai bot commented Mar 23, 2026 •

edited

Loading

pantheon-ai bot commented Mar 24, 2026 •

edited

Loading

pantheon-ai bot Mar 24, 2026 •

edited

Loading

pantheon-ai bot Mar 24, 2026 •

edited

Loading