Description
Summary
The GeoPackage V2 DataSource reader (GeoPackageTable) does not implement SupportsMetadataColumns, so queries like SELECT _metadata.file_name FROM geopackage... fail.
This is the same issue as SEDONA-729 (which tracks the shapefile reader), but for the GeoPackage reader.
Expected behavior
val df = spark.read.format("geopackage").load("/path/to/data.gpkg")
df.select("_metadata.file_path", "_metadata.file_name", "_metadata.file_size").show()

The above query should return file-level metadata for each row.
Current behavior
The _metadata column is not available because GeoPackageTable extends FileTable but does not implement SupportsMetadataColumns.
Fix
GeoPackageTable should implement SupportsMetadataColumns and expose a _metadata column with the standard struct fields (file_path, file_name, file_size, file_block_start, file_block_length, file_modification_time), similar to the fix being applied for shapefiles in SEDONA-729.
The corresponding scan builder, scan, and partition reader factory will also need to be updated to propagate and populate the metadata schema.
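
As a rough illustration of the table-side change only, the sketch below declares a `_metadata` struct column through Spark's `SupportsMetadataColumns` and `MetadataColumn` interfaces. The trait name `FileMetadataColumns` and the object `FileMetadataSchema` are hypothetical and not taken from Sedona, and the field types are assumed to mirror what Spark exposes for V1 file sources. It does not cover the scan builder, scan, or partition reader factory, which would still need to populate the values.

```scala
import org.apache.spark.sql.connector.catalog.{MetadataColumn, SupportsMetadataColumns}
import org.apache.spark.sql.types._

// Hypothetical holder for the struct type of the _metadata column.
// Field names come from the issue text; the types are an assumption
// based on the _metadata struct Spark provides for V1 file sources.
object FileMetadataSchema {
  val struct: StructType = StructType(Seq(
    StructField("file_path", StringType, nullable = false),
    StructField("file_name", StringType, nullable = false),
    StructField("file_size", LongType, nullable = false),
    StructField("file_block_start", LongType, nullable = false),
    StructField("file_block_length", LongType, nullable = false),
    StructField("file_modification_time", TimestampType, nullable = false)))
}

// Hypothetical mix-in; a table such as GeoPackageTable could extend it
// alongside FileTable to advertise the hidden column to the analyzer.
trait FileMetadataColumns extends SupportsMetadataColumns {
  override def metadataColumns(): Array[MetadataColumn] = Array(
    new MetadataColumn {
      override def name(): String = "_metadata"
      override def dataType(): DataType = FileMetadataSchema.struct
      override def isNullable(): Boolean = false
    })
}
```

Declaring the column only makes `_metadata` resolvable during analysis. The reader side still has to append the per-file values (path, size, block offsets, modification time) to each row it emits.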
Notes
- The OSM PBF reader is not affected because it uses the V1 DataSource API (FileFormat), which gets _metadata support automatically from Spark.