@hamilton-earthscope
Contributor

Rationale for this change

This PR implements the same allocation improvements as #557 but for the Take list kernel.

What changes are included in this PR?

  • Pre-allocate the buffer upfront using a better estimate of the required size, which eliminates repeated reallocations.
  • When the estimate falls short, grow the buffer exponentially so that the total number of reallocations is O(log n).
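
The two points above can be illustrated with a minimal sketch. This is not the kernel code from this PR: `growBuffer`, `estimateChildLen`, and `takeLists` are hypothetical names, and the estimate used here (number of indices times the source's average list length) is only an assumption about what a "better estimate" might look like.

```go
package main

import "fmt"

// growBuffer is a hypothetical stand-in for the output child buffer of a
// list Take. It counts how often the backing storage is reallocated.
type growBuffer struct {
	data     []int32
	reallocs int
}

// newGrowBuffer pre-allocates capacity from an upfront estimate.
func newGrowBuffer(estimate int) *growBuffer {
	return &growBuffer{data: make([]int32, 0, estimate)}
}

// reserve makes room for n more elements. When it has to grow, it at least
// doubles the capacity, so repeated growth needs O(log n) reallocations.
func (b *growBuffer) reserve(n int) {
	need := len(b.data) + n
	if need <= cap(b.data) {
		return
	}
	newCap := cap(b.data) * 2
	if newCap < need {
		newCap = need
	}
	grown := make([]int32, len(b.data), newCap)
	copy(grown, b.data)
	b.data = grown
	b.reallocs++
}

// appendList copies one selected list into the buffer.
func (b *growBuffer) appendList(values []int32) {
	b.reserve(len(values))
	b.data = append(b.data, values...)
}

// estimateChildLen guesses the output size as the number of indices times
// the (rounded-up) average list length of the source. Assumed heuristic.
func estimateChildLen(numIndices, srcLists, srcChildLen int) int {
	if srcLists == 0 {
		return 0
	}
	avg := (srcChildLen + srcLists - 1) / srcLists
	return numIndices * avg
}

// takeLists gathers lists[idx] for every idx into one flat buffer.
func takeLists(lists [][]int32, indices []int, estimate int) *growBuffer {
	buf := newGrowBuffer(estimate)
	for _, idx := range indices {
		buf.appendList(lists[idx])
	}
	return buf
}

func main() {
	lists := make([][]int32, 1000)
	total := 0
	for i := range lists {
		lists[i] = []int32{1, 2, 3, 4}
		total += len(lists[i])
	}
	indices := make([]int, len(lists))
	for i := range indices {
		indices[i] = len(lists) - 1 - i
	}

	unsized := takeLists(lists, indices, 0)
	sized := takeLists(lists, indices, estimateChildLen(len(indices), len(lists), total))
	fmt.Println("reallocations without estimate:", unsized.reallocs)
	fmt.Println("reallocations with estimate:", sized.reallocs)
}
```

With a good estimate the buffer is allocated once; when the estimate is too small, doubling keeps the number of reallocations logarithmic in the final size instead of proportional to the number of rows.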

Are these changes tested?

New benchmarks were added for the Take list kernel.
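
For readers unfamiliar with how figures such as `rows/sec`, `B/op`, and `allocs/op` end up in Go benchmark output, here is a minimal sketch. It is not one of the benchmarks added in this PR: `gatherRows` and `benchmarkGather` are hypothetical, and the sketch only assumes the standard `testing` package (`b.ReportAllocs`, `b.ReportMetric`, `b.Elapsed`, which requires Go 1.20 or later).

```go
package main

import (
	"fmt"
	"testing"
)

// gatherRows is a hypothetical stand-in for the kernel under test: it
// selects src[i] for every index i.
func gatherRows(src []int64, indices []int) []int64 {
	out := make([]int64, 0, len(indices))
	for _, i := range indices {
		out = append(out, src[i])
	}
	return out
}

// benchmarkGather reports allocation statistics plus a custom rows/sec metric.
func benchmarkGather(b *testing.B) {
	const rows = 1000
	src := make([]int64, rows)
	indices := make([]int, rows)
	for i := range indices {
		src[i] = int64(i)
		indices[i] = rows - 1 - i
	}

	b.ReportAllocs() // adds B/op and allocs/op to the result
	b.ResetTimer()   // exclude the setup above from the measurement
	for i := 0; i < b.N; i++ {
		if out := gatherRows(src, indices); len(out) != rows {
			b.Fatalf("got %d rows, want %d", len(out), rows)
		}
	}
	// Custom metric: rows processed per second of measured time.
	b.ReportMetric(float64(rows)*float64(b.N)/b.Elapsed().Seconds(), "rows/sec")
}

func main() {
	// testing.Benchmark runs a benchmark function outside of `go test`.
	res := testing.Benchmark(benchmarkGather)
	fmt.Println(res.String(), res.MemString())
}
```

In a real package this function would be named `BenchmarkXxx`, live in a `_test.go` file, and be run with `go test -bench`.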

Are there any user-facing changes?

No

Performance Comparison

on main

goos: darwin
goarch: arm64
pkg: github.com/apache/arrow-go/v18/arrow/compute
cpu: Apple M3 Max
BenchmarkTakeList/SmallBatch_ShortLists-16         	    2320	     555889 ns/op	   1798923 rows/sec	     3355390 B/op	     396 allocs/op
BenchmarkTakeList/MediumBatch_ShortLists-16        	      31	   37600434 ns/op	    265955 rows/sec	   324238694 B/op	    3214 allocs/op
BenchmarkTakeList/LargeBatch_ShortLists-16         	       2	  653682896 ns/op	     76490 rows/sec	  7877124660 B/op	   15751 allocs/op
BenchmarkTakeList/XLargeBatch_ShortLists-16        	       1	 2019873959 ns/op	     49508 rows/sec	 31380868976 B/op	   31489 allocs/op
BenchmarkTakeList/SmallBatch_MediumLists-16        	     334	    3435205 ns/op	    291104 rows/sec	    43059334 B/op	    1086 allocs/op
BenchmarkTakeList/MediumBatch_MediumLists-16       	       5	  230308533 ns/op	     43420 rows/sec	  4041679715 B/op	   10098 allocs/op
BenchmarkTakeList/LargeBatch_MediumLists-16        	       1	 4600489959 ns/op	     10868 rows/sec	100213267960 B/op	   50328 allocs/op
BenchmarkTakeList/XLargeBatch_MediumLists-16       	       1	16462343792 ns/op	      6074 rows/sec	400427784144 B/op	  101032 allocs/op
BenchmarkTakeList/LargeBatch_ShortLists_Large-16   	       1	 1682293042 ns/op	     29721 rows/sec	 31379855376 B/op	   31422 allocs/op
BenchmarkTakeList/XLargeBatch_MediumLists_Large-16 	       1	33800353917 ns/op	      2959 rows/sec	800433389776 B/op	  102480 allocs/op
BenchmarkTakeListPartitionPattern-16               	       1	 3003105000 ns/op	     16649 rows/sec	 69581474288 B/op	   46479 allocs/op
PASS
ok  	github.com/apache/arrow-go/v18/arrow/compute	71.015s

on this branch

goos: darwin
goarch: arm64
pkg: github.com/apache/arrow-go/v18/arrow/compute
cpu: Apple M3 Max
BenchmarkTakeList/SmallBatch_ShortLists-16         	   25522	     46138 ns/op	  21674062 rows/sec	   50668 B/op	      83 allocs/op
BenchmarkTakeList/MediumBatch_ShortLists-16        	    3792	    316046 ns/op	  31641022 rows/sec	  457558 B/op	      83 allocs/op
BenchmarkTakeList/LargeBatch_ShortLists-16         	     804	   1521240 ns/op	  32867968 rows/sec	 2232578 B/op	      84 allocs/op
BenchmarkTakeList/XLargeBatch_ShortLists-16        	     416	   2832247 ns/op	  35307705 rows/sec	 4435314 B/op	      84 allocs/op
BenchmarkTakeList/SmallBatch_MediumLists-16        	    9444	    125321 ns/op	   7979591 rows/sec	  173603 B/op	      83 allocs/op
BenchmarkTakeList/MediumBatch_MediumLists-16       	    1176	    999217 ns/op	  10007857 rows/sec	 1653926 B/op	      83 allocs/op
BenchmarkTakeList/LargeBatch_MediumLists-16        	     232	   4913249 ns/op	  10176590 rows/sec	 8229752 B/op	      85 allocs/op
BenchmarkTakeList/XLargeBatch_MediumLists-16       	     128	   9309120 ns/op	  10742230 rows/sec	16428821 B/op	      85 allocs/op
BenchmarkTakeList/LargeBatch_ShortLists_Large-16   	     739	   1560044 ns/op	  32050453 rows/sec	 3428837 B/op	      84 allocs/op
BenchmarkTakeList/XLargeBatch_MediumLists_Large-16 	     122	   9712600 ns/op	  10295969 rows/sec	24834215 B/op	      85 allocs/op
BenchmarkTakeListPartitionPattern-16               	      96	  11756706 ns/op	   4252901 rows/sec	18421151 B/op	      98 allocs/op
PASS
ok  	github.com/apache/arrow-go/v18/arrow/compute	17.869s

@hamilton-earthscope
Contributor Author

Improvement in Iceberg writes with list columns:


ListPrimitive Schema (list<string>)

| Records | Throughput Before | Throughput After | Improvement |
| --- | --- | --- | --- |
| 100K | 366K rec/sec | 451K rec/sec | +23% |
| 500K | 475K rec/sec | 1.09M rec/sec | +130% |
| 2.5M | 224K rec/sec | 1.52M rec/sec | +577% |

Memory Usage (2.5M records):

  • Before: 342 GB allocated
  • After: 14 GB allocated
  • Reduction: -96%

ListStruct Schema (list<struct<...>>)

| Records | Throughput Before | Throughput After | Improvement |
| --- | --- | --- | --- |
| 100K | 247K rec/sec | 359K rec/sec | +45% |
| 500K | 195K rec/sec | 594K rec/sec | +205% |
| 2.5M | 74K rec/sec | 729K rec/sec | +883% |

Memory Usage (2.5M records):

  • Before: 1,100 GB allocated
  • After: 19.7 GB allocated
  • Reduction: -98%

Member

@zeroshade zeroshade left a comment

Looks great to me!! Thanks much for this! Will merge assuming no issues in CI

@zeroshade zeroshade merged commit caa859d into apache:main Nov 14, 2025
16 checks passed
zeroshade pushed a commit that referenced this pull request Dec 1, 2025
### Rationale for this change

Arrow Go lacks a Take kernel for Map types, which means that `iceberg-go`
cannot write to partitioned Iceberg tables containing columns with Map
types.

### What changes are included in this PR?

- Adds a new Take kernel for Map types
  - the same allocation behavior from #557 and #573 is used

### Are these changes tested?

Yes.


### Are there any user-facing changes?

* Can now use Take on Arrow schemas with Map columns.
* Can now write to partitioned Iceberg tables using `arrow-go`
@hamilton-earthscope hamilton-earthscope deleted the perf-take-list branch December 5, 2025 04:23