[spark][bugfix] Use pk index for comparator align with SortMergeReader #2987

Merged: luoyuxia merged 1 commit into apache:main from Yohahaha:fix-pk-read-project on Apr 7, 2026

Conversation

@Yohahaha Yohahaha commented Apr 2, 2026

Purpose

Linked issue: close #2986

Brief change log

Tests

Spark Read: primary key table with random project

API and Format

Documentation

@Yohahaha Yohahaha force-pushed the fix-pk-read-project branch from db7b8a6 to c3483f5 Compare April 7, 2026 02:29
Yohahaha commented Apr 7, 2026

@YannByron @wuchong please help review this PR, thank you!

val pkRowType = new RowType(pkFields)
val keyEncoder =
encode.KeyEncoder.ofPrimaryKeyEncoder(
rowType,

Contributor

I think what is needed here is the row type of input data, not just the row type of primary key.

Contributor Author

Actually, only the PK columns are needed; the UT already covers this case.

private val sortedLogRecords = logRecords.sortWith {
case (record1, record2) =>
val keyComparison = comparator.compare(record1.getRow, record2.getRow)
val keyComparison = comparator.compare(

Contributor

Can you explain why this change is needed?

Contributor Author

SortMergeReader compares rows by primary key, but here the comparison uses the full row, so the UT failed when the PK column was not at the beginning.
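
To illustrate the mismatch described in this reply, here is a minimal, self-contained sketch. It is not Fluss code: `Row`, `fullRowOrdering`, `pkOrdering`, and the sample `(value, id)` schema are hypothetical stand-ins, and rows are simplified to vectors of integers. It only shows why a full-row comparison and a comparison restricted to the PK indexes can produce different orders once the PK column is not the first field.

```scala
// Toy illustration only; none of these names are Fluss APIs.
object PkComparatorSketch {
  // Assumption: a row is modelled as a plain vector of Int fields.
  type Row = Vector[Int]

  // Compares every field from left to right (full-row comparison).
  val fullRowOrdering: Ordering[Row] = new Ordering[Row] {
    def compare(a: Row, b: Row): Int =
      a.zip(b).iterator
        .map { case (x, y) => Integer.compare(x, y) }
        .find(_ != 0)
        .getOrElse(0)
  }

  // Compares only the fields at the given primary key indexes, in that order.
  def pkOrdering(pkIndexes: Seq[Int]): Ordering[Row] = new Ordering[Row] {
    def compare(a: Row, b: Row): Int =
      pkIndexes.iterator
        .map(i => Integer.compare(a(i), b(i)))
        .find(_ != 0)
        .getOrElse(0)
  }

  // Hypothetical schema (value, id) where the primary key is column 1, not column 0.
  val rows: Seq[Row] = Seq(Vector(10, 3), Vector(30, 1), Vector(20, 2))

  val byFullRow: Seq[Row] = rows.sorted(fullRowOrdering)
  val byPk: Seq[Row] = rows.sorted(pkOrdering(Seq(1)))
}
```

In this sketch the two orderings disagree on every position, so any merge step that assumes PK order while receiving input sorted by full row would see keys out of order. When the PK happens to be the leading column, both orderings coincide, which is consistent with the failure only appearing for a PK that is not at the beginning.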

@YannByron

+1.

@luoyuxia luoyuxia left a comment

+1

@luoyuxia luoyuxia merged commit dbd7f5c into apache:main Apr 7, 2026
6 checks passed
@Yohahaha Yohahaha deleted the fix-pk-read-project branch April 7, 2026 10:19

Development

Successfully merging this pull request may close these issues.

[spark][bug] Batch read PK table failed when use random column as primary key

3 participants