[Parquet][C++] PageIndex is useless with current API #45284

@mpoeter

Describe the enhancement requested

The ParquetFileReader provides a PageIndexReader, through which we can eventually get to a ColumnIndex and an OffsetIndex. So far so good. Those indexes provide page-level information, but in virtually all APIs the concept of pages is completely abstracted away. For higher-level APIs that makes sense, but even at the level of the PageReader we can only read all pages serially, one after the other. The only way I found to skip pages is the PageReader's data page filter, but that operates only on the page's own metadata and does not use the index. I did not find a way to load a specific page (e.g., via its index or file offset). Given that, I don't see how one can make use of the PageIndex with the current API. Did I miss anything?
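
To make the intended use case concrete, here is a minimal sketch of the page selection one would want to do with the index. It deliberately does not use any Arrow types: `PageStatsSketch`, `PageLocationSketch`, `ByteRange` and `PagesToRead` are hypothetical stand-ins for the per-page min/max statistics of a ColumnIndex and the per-page locations of an OffsetIndex, assuming an int64 column and an inclusive range predicate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one page's entry in a ColumnIndex (int64 column).
struct PageStatsSketch {
  bool null_page;     // true if the page contains only nulls
  int64_t min_value;  // smallest value in the page
  int64_t max_value;  // largest value in the page
};

// Hypothetical stand-in for one page's entry in an OffsetIndex.
struct PageLocationSketch {
  int64_t offset;                // file offset of the page
  int32_t compressed_page_size;  // size of the page on disk, in bytes
  int64_t first_row_index;       // first row of the page within the row group
};

// A byte range in the file that would have to be read.
struct ByteRange {
  int64_t offset;
  int64_t length;
};

// Returns the byte ranges of all pages whose [min, max] range overlaps the
// inclusive predicate range [lo, hi]. Pages that cannot match are skipped
// without being read.
std::vector<ByteRange> PagesToRead(const std::vector<PageStatsSketch>& stats,
                                   const std::vector<PageLocationSketch>& locations,
                                   int64_t lo, int64_t hi) {
  // Both indexes are expected to have exactly one entry per page.
  assert(stats.size() == locations.size());
  std::vector<ByteRange> ranges;
  for (std::size_t i = 0; i < stats.size(); ++i) {
    const PageStatsSketch& s = stats[i];
    if (s.null_page) continue;                            // nothing to match
    if (s.max_value < lo || s.min_value > hi) continue;   // no overlap
    ranges.push_back({locations[i].offset, locations[i].compressed_page_size});
  }
  return ranges;
}
```

The selection itself is easy to compute from the two indexes. The missing piece described above is an API that accepts the resulting offsets (or page indices) and loads only those pages.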

Component(s)

C++, Parquet
