[CELEBORN-1671] CelebornShuffleReader will try replica if create client failed#2854

Closed
FMX wants to merge 7 commits into apache:main from FMX:b1671

@FMX FMX commented Oct 28, 2024

What changes were proposed in this pull request?

  1. Bypass the exception thrown when client creation fails in CelebornShuffleReader for Spark 3, instead of failing the read immediately.
  2. When reading locations, the client then tries the failed location's replica.
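
For readers skimming the description, the fallback in the two points above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `Location`, `Client`, `ReplicaFallback`, and `createClientWithFallback` are hypothetical stand-ins, and the real reader works with Celeborn's own location and client classes.

```scala
import scala.util.{Success, Try}

// Hypothetical stand-ins for illustration only; NOT Celeborn's actual classes.
final case class Location(hostAndFetchPort: String, peer: Option[Location] = None)
final case class Client(hostPort: String)

object ReplicaFallback {

  /** Try the given location first; if creating a client throws, try its replica.
    * Returns the location that worked together with its client, or None if
    * every candidate failed.
    */
  def createClientWithFallback(
      location: Location,
      createClient: String => Client): Option[(Location, Client)] = {
    // Candidates in order: the location itself, then its replica if one exists.
    val candidates = location +: location.peer.toList
    // The iterator is lazy, so the replica is only contacted when the first attempt fails.
    candidates.iterator
      .map(loc => loc -> Try(createClient(loc.hostAndFetchPort)))
      .collectFirst { case (loc, Success(client)) => (loc, client) }
  }
}
```

Returning `None` when both attempts fail leaves the decision to the caller, which can then raise a fetch failure with the shuffle key and partition id in the message.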

Why are the changes needed?

Allow clients to retry a location through its replica when client creation fails.

Does this PR introduce any user-facing change?

NO.

How was this patch tested?

Pass GA.

@FMX FMX changed the title [CELEBORN-1671] CelebornShuffleReader will retry replica if create client failed [CELEBORN-1671] CelebornShuffleReader will try replica if create client failed Oct 28, 2024
s"Failed to create client for $shuffleKey-$partitionId from host: ${location.hostAndFetchPort}")
}
}
val (_, locArr, pbOpenStreamListBuilder) = workerRequestMap.get(hostPort)

nit: move this code into the try block?

@RexXiong RexXiong left a comment

LGTM, thanks! Merge to main (v0.6.0) and branch-0.5 (v0.5.3)

@RexXiong RexXiong closed this in 7dcd259 Nov 6, 2024
RexXiong pushed a commit that referenced this pull request Nov 6, 2024
…nt failed

1. To bypass exceptions when creating clients failed in CelebornShuffleReader in spark 3.
2. Client will try the location's replicas in reading locations.

Allow clients to retry locations when creating clients failed.

NO.

Pass GA.

Closes #2854 from FMX/b1671.

Authored-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
(cherry picked from commit 7dcd259)
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>