
Improve performance of elementwise ByteViewArray concatenation #10161

Open: pepijnve wants to merge 1 commit into apache:main from pepijnve:concat_view

@pepijnve (Contributor)

Which issue does this PR close?

None

Rationale for this change

While profiling a DataFusion query, string concatenation, in particular of two StringView arrays, proved to be a hotspot.
This PR proposes a revised version of concat_elements_string_view_array that eliminates some of the overhead that comes from using a fairly generic implementation strategy.
Benchmarks show an improvement of 20-40%.

What changes are included in this PR?

  • Replace the StringViewBuilder-based concatenation implementation with one that writes the array's buffers (views, data, and nulls) directly
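For readers unfamiliar with the view layout, here is a minimal, std-only sketch of what "writing the buffers directly" can look like. It is not the PR's code and does not use the arrow crate. It assumes the Arrow view layout: each value is a 16-byte view whose low 4 bytes hold the length; values of up to 12 bytes are stored inline in the view, and longer values store a 4-byte prefix, a buffer index, and an offset into a data buffer. The functions `concat_elements_views`, `concat_view`, and `view_to_bytes` are hypothetical, the sketch uses a single output data buffer, and it assumes a null in either input yields a null output.

```rust
/// Values up to this many bytes are stored inline in the 16-byte view.
const MAX_INLINE_LEN: usize = 12;
/// This sketch writes every non-inline value into one data buffer (index 0).
const BUFFER_INDEX: u32 = 0;

/// Hypothetical helper: build the view for `left ++ right`, writing the
/// concatenated bytes straight into the view (inline) or into `data`.
fn concat_view(left: &[u8], right: &[u8], data: &mut Vec<u8>) -> u128 {
    let len = left.len() + right.len();
    let len32 = u32::try_from(len).expect("value too long for a view");
    if len <= MAX_INLINE_LEN {
        // Inline: [length: 4 bytes][data: up to 12 bytes, zero padded]
        let mut raw = [0u8; 16];
        raw[..4].copy_from_slice(&len32.to_le_bytes());
        raw[4..4 + left.len()].copy_from_slice(left);
        raw[4 + left.len()..4 + len].copy_from_slice(right);
        u128::from_le_bytes(raw)
    } else {
        // Out of line: append both halves, then record prefix/buffer/offset.
        let start = data.len();
        let offset = u32::try_from(start).expect("data buffer exceeds u32 offsets");
        data.extend_from_slice(left);
        data.extend_from_slice(right);
        let prefix = u32::from_le_bytes(data[start..start + 4].try_into().unwrap());
        (len32 as u128)
            | ((prefix as u128) << 32)
            | ((BUFFER_INDEX as u128) << 64)
            | ((offset as u128) << 96)
    }
}

/// Hypothetical elementwise concatenation returning (views, data, validity).
/// Assumption: a null on either side produces a null output.
fn concat_elements_views(
    left: &[Option<&str>],
    right: &[Option<&str>],
) -> (Vec<u128>, Vec<u8>, Vec<bool>) {
    assert_eq!(left.len(), right.len(), "arrays must have equal length");
    // Upper bound for the data buffer; inline results never touch it.
    let capacity: usize = left
        .iter()
        .zip(right)
        .map(|(l, r)| match (l, r) {
            (Some(l), Some(r)) => l.len() + r.len(),
            _ => 0,
        })
        .sum();
    let mut views = Vec::with_capacity(left.len());
    let mut data = Vec::with_capacity(capacity);
    let mut validity = Vec::with_capacity(left.len());
    for (l, r) in left.iter().zip(right) {
        match (l, r) {
            (Some(l), Some(r)) => {
                views.push(concat_view(l.as_bytes(), r.as_bytes(), &mut data));
                validity.push(true);
            }
            _ => {
                views.push(0); // null slots get an empty view
                validity.push(false);
            }
        }
    }
    (views, data, validity)
}

/// Hypothetical decoder, used only to check the encoded views.
fn view_to_bytes(view: u128, data: &[u8]) -> Vec<u8> {
    let len = (view as u32) as usize;
    if len <= MAX_INLINE_LEN {
        view.to_le_bytes()[4..4 + len].to_vec()
    } else {
        let offset = (view >> 96) as u32 as usize;
        data[offset..offset + len].to_vec()
    }
}
```

With inputs `[Some("foo"), Some("hello "), None]` and `[Some("bar"), Some("world, arrow"), Some("x")]`, the first result (`"foobar"`) fits inline, the second (`"hello world, arrow"`, 18 bytes) is appended to the data buffer, and the third is null. Compared with pushing each row through a builder, this writes each concatenated value once into its final location; whether that is the main source of the reported speedup is not established by this sketch.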

Are these changes tested?

  • Covered by existing tests, plus some additional test cases to ensure the newly added code is covered

Are there any user-facing changes?

No

@github-actions (bot) added the arrow label (Changes to the arrow crate) on Jun 20, 2026
@pepijnve (Contributor, Author)

I'll fix the MSRV issue as soon as I can.

@pepijnve (Contributor, Author)

@neilconway, I thought you might find this one interesting as well. I'm thinking of making a similar PR for the concat functions in DataFusion. Their null-handling semantics differ from those of ||, but I think the same optimisations will apply there as well.
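To make the null-handling difference concrete, the per-row sketch below contrasts two policies. It assumes that || propagates nulls (a null on either side gives a null result) and that DataFusion's concat, like PostgreSQL's, treats null arguments as empty strings; both behaviours are stated here as assumptions, and the function names are hypothetical.

```rust
/// Null-propagating concatenation (assumed || semantics):
/// any null input yields a null output.
fn concat_pair_propagate(left: Option<&str>, right: Option<&str>) -> Option<String> {
    match (left, right) {
        (Some(l), Some(r)) => Some(format!("{l}{r}")),
        _ => None,
    }
}

/// Null-skipping concatenation (assumed concat semantics):
/// null inputs contribute nothing and the output is never null.
fn concat_pair_skip_nulls(left: Option<&str>, right: Option<&str>) -> String {
    let l = left.unwrap_or("");
    let r = right.unwrap_or("");
    let mut out = String::with_capacity(l.len() + r.len());
    out.push_str(l);
    out.push_str(r);
    out
}
```

Under these assumptions only the validity logic changes; the step of writing each concatenated value directly into the view or data buffer stays the same, which is consistent with the expectation that the same optimisations carry over.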

@pepijnve force-pushed the concat_view branch 4 times, most recently from e718dee to 8b2a515 on June 20, 2026 15:09