GeoPackage reader does not support _metadata hidden column #2651

@jiayuasu

Summary

The GeoPackage V2 DataSource reader (GeoPackageTable) does not implement SupportsMetadataColumns, so queries like SELECT _metadata.file_name FROM geopackage... fail.

This is the same issue as SEDONA-729 (which tracks the shapefile reader), but for the GeoPackage reader.

Expected behavior

val df = spark.read.format("geopackage").load("/path/to/data.gpkg")
df.select("_metadata.file_path", "_metadata.file_name", "_metadata.file_size").show()

The above query should return file-level metadata for each row.

Current behavior

The _metadata column is not available because GeoPackageTable extends FileTable but does not implement SupportsMetadataColumns.

Fix

GeoPackageTable should implement SupportsMetadataColumns and expose a _metadata column with the standard struct fields (file_path, file_name, file_size, file_block_start, file_block_length, file_modification_time), similar to the fix being applied for shapefiles in SEDONA-729.
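A minimal sketch of what the table-side change could look like, assuming Spark's `SupportsMetadataColumns` and `MetadataColumn` connector interfaces. The names `GeoPackageMetadata` and `FileMetadataColumns` are hypothetical, and the field types and nullability are assumptions chosen to mirror Spark's built-in file sources, not the actual patch:

```scala
import org.apache.spark.sql.connector.catalog.{MetadataColumn, SupportsMetadataColumns}
import org.apache.spark.sql.types._

// Hypothetical helper object; the name is not taken from the Sedona code base.
object GeoPackageMetadata {
  // Struct fields named in this issue. Types and nullability are assumptions.
  val fileMetadataStruct: StructType = StructType(Seq(
    StructField("file_path", StringType, nullable = false),
    StructField("file_name", StringType, nullable = false),
    StructField("file_size", LongType, nullable = false),
    StructField("file_block_start", LongType, nullable = false),
    StructField("file_block_length", LongType, nullable = false),
    StructField("file_modification_time", TimestampType, nullable = false)))

  // The single hidden column that the table would advertise to Spark.
  val column: MetadataColumn = new MetadataColumn {
    override def name(): String = "_metadata"
    override def dataType(): DataType = fileMetadataStruct
    override def isNullable(): Boolean = false
  }
}

// Hypothetical mix-in: a table class such as GeoPackageTable could add
// `with FileMetadataColumns` to its existing `extends FileTable(...)` clause.
trait FileMetadataColumns extends SupportsMetadataColumns {
  override def metadataColumns(): Array[MetadataColumn] =
    Array(GeoPackageMetadata.column)
}
```

Keeping the column definition in one place would let the table, the scan, and the reader agree on the same struct layout.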

The corresponding scan builder, scan, and partition reader factory will also need to be updated to propagate and populate the metadata schema.
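As a rough illustration of the "populate" step, the partition reader could append one struct value per row. This is only a sketch: `FileInfo` and `MetadataRows` are hypothetical names, the six-field layout (file_path, file_name, file_size, file_block_start, file_block_length, file_modification_time) follows the list in this issue, and how the real reader obtains these values from its partitioned file is not shown:

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.JoinedRow
import org.apache.spark.unsafe.types.UTF8String

// Hypothetical holder for the per-file facts a reader would already know.
final case class FileInfo(
    path: String,
    size: Long,
    blockStart: Long,
    blockLength: Long,
    modificationTimeMs: Long)

object MetadataRows {
  // Builds the value of the _metadata struct in Catalyst's internal format.
  def metadataStruct(f: FileInfo): InternalRow = {
    val name = f.path.substring(f.path.lastIndexOf('/') + 1)
    InternalRow(
      UTF8String.fromString(f.path),
      UTF8String.fromString(name),
      f.size,
      f.blockStart,
      f.blockLength,
      f.modificationTimeMs * 1000L) // TimestampType is stored as microseconds
  }

  // Appends the struct as one extra trailing column after the data columns.
  def withMetadata(dataRow: InternalRow, f: FileInfo): InternalRow =
    new JoinedRow(dataRow, InternalRow(metadataStruct(f)))
}
```

A real implementation would presumably build the struct once per file rather than once per row, and would include only the metadata fields that the query actually requests.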

Notes

  • The OSM PBF reader is not affected because it uses the V1 DataSource API (FileFormat), which gets _metadata support automatically from Spark.
