Core: Coalesce consecutive position deletes into range inserts (Iceberg V2)#16052

Draft
Baunsgaard wants to merge 1 commit into apache:main from Baunsgaard:coalesce-deletes-range-consumer

Conversation

Contributor

Baunsgaard commented Apr 19, 2026

Add PositionDeleteRangeConsumer, a stateless utility that coalesces runs of consecutive positions into a single PositionDeleteIndex.delete(start, end) call instead of one delete(pos) call per position.
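The coalescing step can be sketched as a single pass over the sorted stream. This is a minimal Java sketch, assuming closed intervals; `RangeSink` is a hypothetical stand-in for `PositionDeleteIndex.delete(start, end)`:

```java
import java.util.ArrayList;
import java.util.List;

public class CoalesceSketch {
  // Hypothetical callback standing in for PositionDeleteIndex.delete(start, end).
  interface RangeSink {
    void accept(long start, long end);
  }

  // Walks a sorted position stream and emits each maximal run [start, end]
  // as one range instead of one call per position.
  static void coalesce(long[] positions, RangeSink sink) {
    if (positions.length == 0) {
      return;
    }
    long start = positions[0];
    long prev = positions[0];
    for (int i = 1; i < positions.length; i++) {
      long pos = positions[i];
      if (pos != prev + 1) { // run boundary: flush [start, prev]
        sink.accept(start, prev);
        start = pos;
      }
      prev = pos;
    }
    sink.accept(start, prev); // flush the final run
  }

  public static void main(String[] args) {
    List<long[]> runs = new ArrayList<>();
    coalesce(new long[] {1, 2, 3, 7, 8, 10}, (s, e) -> runs.add(new long[] {s, e}));
    for (long[] r : runs) {
      System.out.println(r[0] + ".." + r[1]); // prints 1..3, 7..8, 10..10
    }
  }
}
```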

This primarily benefits Iceberg V2 tables; V3 deletion vectors are loaded as a serialised RoaringBitmap and bypass this path.

To avoid overhead on inputs with no runs, the consumer sniffs the first SNIFF_SIZE = 256 positions and counts run boundaries (adjacent pairs where pos[i] - pos[i-1] != 1). If the boundary fraction exceeds BOUNDARY_THRESHOLD_PERCENT = 30, the remaining stream uses a per-position loop equivalent to the original behavior.
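The sniff-and-dispatch heuristic can be sketched as follows. The constant names mirror the ones quoted above; the exact comparison (strict inequality, denominator of sample - 1) is an assumption:

```java
public class SniffSketch {
  static final int SNIFF_SIZE = 256;
  static final int BOUNDARY_THRESHOLD_PERCENT = 30;

  // Returns true if the per-position fallback should be used: counts run
  // boundaries in a 256-position prefix and compares the boundary fraction
  // against the 30% threshold.
  static boolean shouldFallBack(long[] positions) {
    int sample = Math.min(SNIFF_SIZE, positions.length);
    int boundaries = 0;
    for (int i = 1; i < sample; i++) {
      if (positions[i] - positions[i - 1] != 1) {
        boundaries++;
      }
    }
    // boundaries / (sample - 1) > 30%, rearranged to avoid division
    return sample > 1 && boundaries * 100 > (sample - 1) * BOUNDARY_THRESHOLD_PERCENT;
  }

  public static void main(String[] args) {
    long[] contiguous = new long[256];
    long[] alternating = new long[256];
    for (int i = 0; i < 256; i++) {
      contiguous[i] = i;       // no boundaries: coalesce path
      alternating[i] = 2L * i; // every pair is a boundary: per-position path
    }
    System.out.println(shouldFallBack(contiguous));  // → false
    System.out.println(shouldFallBack(alternating)); // → true
  }
}
```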

| Scenario | Baseline (ms) | Consumer (ms) | Speedup |
| --- | --- | --- | --- |
| FULL (contiguous) | 75.83 | 12.15 | 6.24x |
| MEDIUM (~64 + gap) | 77.44 | 14.88 | 5.20x |
| SHORT (~4 + gap) | 77.04 | 54.45 | 1.41x |
| SPARSE_95 (95%) | 76.72 | 25.91 | 2.96x |
| SPARSE_50 (50%) | 77.75 | 78.93 | 0.98x |
| SPARSE_5 (5%) | 84.31 | 86.13 | 0.98x |
| NONE (step=2) | 77.29 | 78.55 | 0.98x |

Each iteration inserts 5M positions into a fresh BitmapPositionDeleteIndex. Four back-to-back passes of (30 warmup + 51 measured iterations) run in the same JVM; each reported number is the per-pass minimum, averaged across passes 2-4.
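The measurement protocol can be sketched generically. Everything here is illustrative, not the actual harness: `minOfRunMs` and the placeholder workload are hypothetical, standing in for the bitmap-insert benchmark:

```java
public class BenchSketch {
  // One pass: warmup iterations, then measured iterations keeping the minimum.
  static double minOfRunMs(Runnable work, int warmup, int measured) {
    for (int i = 0; i < warmup; i++) {
      work.run();
    }
    long best = Long.MAX_VALUE;
    for (int i = 0; i < measured; i++) {
      long t0 = System.nanoTime();
      work.run();
      best = Math.min(best, System.nanoTime() - t0);
    }
    return best / 1e6;
  }

  public static void main(String[] args) {
    // Trivial placeholder workload; the real benchmark builds a fresh
    // BitmapPositionDeleteIndex and inserts 5M positions per iteration.
    Runnable work = () -> {
      long sum = 0;
      for (int i = 0; i < 1_000; i++) {
        sum += i;
      }
      if (sum < 0) throw new IllegalStateException();
    };
    double sumMins = 0;
    for (int pass = 1; pass <= 4; pass++) {
      double min = minOfRunMs(work, 30, 51);
      if (pass >= 2) {
        sumMins += min; // pass 1 is discarded as JIT warmup
      }
    }
    System.out.printf("avg min over passes 2-4: %.4f ms%n", sumMins / 3);
  }
}
```

Taking the per-pass minimum and discarding the first pass filters out JIT compilation and GC noise, which is why the table reports stable sub-1.0x ratios rather than jitter.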

github-actions bot added the core label Apr 19, 2026
Baunsgaard force-pushed the coalesce-deletes-range-consumer branch from 06105f9 to 8130781 on April 19, 2026 18:08
Baunsgaard marked this pull request as draft April 21, 2026 08:02
Baunsgaard force-pushed the coalesce-deletes-range-consumer branch 3 times, most recently from fa10273 to 24545db on April 22, 2026 15:18
Add PositionDeleteRangeConsumer that coalesces runs of consecutive
positions into a single delete(start, end) call, and use it from
Deletes.toPositionIndex() so sorted position delete files are inserted
into the bitmap as ranges instead of one position at a time.
Baunsgaard force-pushed the coalesce-deletes-range-consumer branch from 24545db to 8b75033 on April 23, 2026 23:06
Baunsgaard marked this pull request as ready for review April 23, 2026 23:12
Baunsgaard added a commit to Baunsgaard/iceberg-cpp that referenced this pull request May 6, 2026
Add ForEachPositionDelete, the C++ equivalent of Java's
PositionDeleteRangeConsumer. When DeleteLoader buffers matching
positions for a data file, it now hands them to ForEachPositionDelete
instead of calling PositionDeleteIndex::Delete(pos) per position.

The implementation sniffs the first 1024 positions to estimate boundary
density (fraction of adjacent pairs where pos[i] != pos[i-1] + 1) and
dispatches to one of two optimized paths:

  * Coalesce: walk the input emitting closed-interval runs to
    PositionDeleteIndex::Delete(start, end), letting CRoaring's
    addRange collapse contiguous runs.
  * Bulk addMany: group by high-32-bit key and flush each group via a
    private RoaringPositionBitmap::AddManyForKey hook over CRoaring's
    bulk addMany. Used when boundary density exceeds 10%.

Below 64 positions the sniff is skipped entirely; coalescing wins for
small inputs because the bulk path's fixed overhead (thread-local
buffer, per-key grouping) exceeds the work to be done.

Rework DeleteLoader::LoadPositionDelete to read Arrow batches via
nanoarrow ArrowArrayView directly instead of the row-oriented
ArrowArrayStructLike wrapper. The loader reuses the ArrowArrayView
across batches and exposes the int64 pos column as a raw buffer. When
the delete file's referenced_data_file matches the target data file
(Iceberg V2 writer hint, the common case), positions are passed to
ForEachPositionDelete as a zero-copy span directly from the Arrow
buffer. Otherwise a per-batch staging vector is used with
ArrowArrayViewGetStringUnsafe + memcmp to filter by path, still much
faster than the Scalar variant dispatch.

Locally measured results (GCC 14.3 Release, 5M positions microbench,
500K positions end-to-end):

  * ForEachPositionDelete microbench: 2.2x-10.6x over the per-position
    baseline across contiguous/run/sparse/alternating scenarios; no
    regressions on tiny inputs.
  * End-to-end LoadPositionDeletes (Parquet decode + Arrow iteration +
    bitmap inserts): 2.1x-2.5x across all scenarios.

Equivalent of the Java change in apache/iceberg#16052.
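The "group by high-32-bit key" bulk path above can be illustrated in Java with plain collections standing in for CRoaring's addMany hook. `groupByHighKey` is a hypothetical name; a real bulk path would feed each per-key low-bits array to the bitmap in one call:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HighKeyGrouping {
  // Splits 64-bit positions into (high 32 bits -> array of low 32 bits),
  // the shape a per-key bulk-add hook like CRoaring's addMany consumes.
  static Map<Integer, int[]> groupByHighKey(long[] positions) {
    // First pass: count positions per high-32-bit key to size the arrays.
    Map<Integer, Integer> counts = new LinkedHashMap<>();
    for (long pos : positions) {
      counts.merge((int) (pos >>> 32), 1, Integer::sum);
    }
    Map<Integer, int[]> groups = new LinkedHashMap<>();
    Map<Integer, Integer> fill = new LinkedHashMap<>();
    for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
      groups.put(e.getKey(), new int[e.getValue()]);
      fill.put(e.getKey(), 0);
    }
    // Second pass: scatter the low 32 bits of each position into its group.
    for (long pos : positions) {
      int key = (int) (pos >>> 32);
      int idx = fill.get(key);
      groups.get(key)[idx] = (int) pos;
      fill.put(key, idx + 1);
    }
    return groups;
  }

  public static void main(String[] args) {
    long[] positions = {5L, 6L, (1L << 32) + 3L, (1L << 32) + 9L};
    Map<Integer, int[]> groups = groupByHighKey(positions);
    System.out.println(groups.size()); // → 2 (keys 0 and 1)
  }
}
```

Grouping this way means the bitmap's per-key container is looked up once per group rather than once per position, which is where the bulk path's win on high-boundary-density input comes from.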
Baunsgaard added a commit to Baunsgaard/iceberg-cpp that referenced this pull request May 7, 2026
Baunsgaard added a commit to Baunsgaard/iceberg-cpp that referenced this pull request May 8, 2026
Add ForEachPositionDelete (the C++ equivalent of Java's
PositionDeleteRangeConsumer) and route DeleteLoader through it instead
of calling PositionDeleteIndex::Delete(pos) per position. The function
sniffs a 1024-position prefix to estimate boundary density and
dispatches to one of two paths:

  * Coalesce: emit closed-interval runs to
    PositionDeleteIndex::Delete(start, end), letting CRoaring's
    addRange collapse contiguous runs.
  * Bulk addMany: group by high-32-bit key and flush each group via a
    private RoaringPositionBitmap::AddManyForKey hook over CRoaring's
    addMany. Used when boundary density exceeds 10%. Inputs below 64
    positions skip the sniff and use the coalesce path directly.

Also rework DeleteLoader::LoadPositionDelete to read Arrow batches via
nanoarrow's ArrowArrayView directly instead of the row-oriented
ArrowArrayStructLike wrapper. The view is reused across batches and
exposes the int64 pos column as a raw buffer. When the delete file's
referenced_data_file matches the target (Iceberg V2 writer hint, the
common case), positions are passed to ForEachPositionDelete as a
zero-copy span; otherwise a per-batch staging vector filters by path
using nanoarrow's unsafe direct-buffer accessors.

Local microbenchmarks (GCC 14.3 Release): 2.2x-10.6x speedup for
ForEachPositionDelete vs the per-position baseline, and 2.1x-2.5x for
the end-to-end loader (Parquet decode + Arrow iteration + bitmap
inserts).

Equivalent of apache/iceberg#16052.
Baunsgaard marked this pull request as draft May 8, 2026 12:07
Baunsgaard added a commit to Baunsgaard/iceberg-cpp that referenced this pull request May 8, 2026
Add ForEachPositionDelete (the C++ equivalent of Java's
PositionDeleteRangeConsumer) and route DeleteLoader through it,
replacing the per-position PositionDeleteIndex::Delete(pos) call. The
function sniffs a 1024-position prefix and dispatches to either run
coalescing (CRoaring addRange) or bulk addMany grouped by
high-32-bit key.

Also rework DeleteLoader::LoadPositionDelete to read Arrow batches via
nanoarrow's ArrowArrayView directly. When the delete file's
referenced_data_file matches the target (V2 writer hint), positions
are passed as a zero-copy span; otherwise a per-batch staging vector
filters by path.

Local microbenchmarks: 2.2x-10.6x for ForEachPositionDelete and
2.1x-2.5x end-to-end through LoadPositionDeletes. Equivalent of
apache/iceberg#16052.
Baunsgaard added a commit to Baunsgaard/iceberg-cpp that referenced this pull request May 11, 2026
Baunsgaard added a commit to Baunsgaard/iceberg-cpp that referenced this pull request May 11, 2026