failed to map column projection- incompatible data types list field element vs item #31

@AlJohri

I have a table that reads correctly using Spark + Delta Lake libraries, but I'm having trouble reading it via pv.

Do you know which downstream dependency could be causing this error?

Error: ArrowError(ExternalError(Execution("Failed to map column projection for field mycolumn. Incompatible data types List(Field { name: "element", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }) and List(Field { name: "item", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None })")))

I checked the schema from the delta transaction log and didn't see a hardcoded item or element:

❯ aws s3 cp s3://mybucket/year=2022/month=6/day=9/myprefix/_delta_log/00000000000000000000.json - | head -n 3 | tail -n 1 | jq '.metaData.schemaString | fromjson | .fields[] | select(.name == "mycolumn")'
{
  "name": "mycolumn",
  "type": {
    "type": "array",
    "elementType": "string",
    "containsNull": true
  },
  "nullable": true,
  "metadata": {}
}
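
For anyone without jq handy, the same lookup can be sketched in Python. This assumes the commit file has been downloaded locally, that each line is one JSON action, and that the schema is stored as a JSON-encoded string under metaData.schemaString (which is what the jq filter above relies on). The function name find_column and the sample line are made up for illustration.

```python
import json


def find_column(commit_lines, column_name):
    """Return the schema entry for column_name from a Delta commit file.

    commit_lines: iterable of strings, one JSON action per line.
    Returns None if no metaData action defines the column.
    """
    for line in commit_lines:
        line = line.strip()
        if not line:
            continue
        action = json.loads(line)
        meta = action.get("metaData")
        if meta is None:
            continue  # skip actions that carry no table metadata
        # schemaString is itself JSON-encoded, hence the second parse
        schema = json.loads(meta["schemaString"])
        for field in schema["fields"]:
            if field["name"] == column_name:
                return field
    return None


# Hypothetical, trimmed-down metaData line shaped like the output above.
sample_schema = {
    "type": "struct",
    "fields": [
        {
            "name": "mycolumn",
            "type": {"type": "array", "elementType": "string", "containsNull": True},
            "nullable": True,
            "metadata": {},
        }
    ],
}
sample_line = json.dumps({"metaData": {"schemaString": json.dumps(sample_schema)}})

column = find_column([sample_line], "mycolumn")
print(json.dumps(column, indent=2))
```

The array type here only carries elementType and containsNull, so the name of the list's child field has to be chosen by whatever converts this schema to Arrow.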

When I look at the schema of a sample parquet file on s3, I do indeed see that the item in the list is called element:

pqrs schema =(s5cmd cat s3://mybucket/year=2022/month=6/day=9/myprefix/_partition=00001/part-00037-cb2e71c3-4f26-4de0-9e9a-18298489ccdc.c000.snappy.parquet)

...
message spark_schema {
  ...
  OPTIONAL group mycolumn (LIST) {
    REPEATED group list {
      OPTIONAL BYTE_ARRAY element (UTF8);
    }
  }
  ...
}

I see this exact error is from here: https://github.com/apache/arrow-datafusion/blob/aad82fbb32dc1bb4d03e8b36297f8c9a3148df89/datafusion/core/src/physical_plan/file_format/mod.rs#L253

And I also see that element is hardcoded in delta-rs here:

https://github.com/delta-io/delta-rs/blob/83b8296fa5d55ebe050b022ed583dc57152221fe/rust/src/delta_arrow.rs#L38-L48 (pr: delta-io/delta-rs#228)

But I can't seem to find where the schema mismatch is coming from.
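
To spell out what the comparison seems to be tripping on: the two types in the error differ only in the name of the list's child field. Below is a toy model in plain Python, with dataclasses standing in for Arrow's Field and List. It is not the Arrow or DataFusion API, and same_ignoring_child_name is a hypothetical helper, not something I know either library to provide. It shows a strict equality check failing on the name alone while a check that ignores the child name passes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Toy stand-in for an Arrow field (illustration only)."""
    name: str
    data_type: object
    nullable: bool = True


@dataclass(frozen=True)
class ListType:
    """Toy stand-in for an Arrow list type wrapping one child field."""
    child: Field


def same_ignoring_child_name(a, b):
    """Compare two types, ignoring the names of list child fields."""
    if isinstance(a, ListType) and isinstance(b, ListType):
        return (
            a.child.nullable == b.child.nullable
            and same_ignoring_child_name(a.child.data_type, b.child.data_type)
        )
    return a == b


with_element = ListType(Field("element", "Utf8"))  # child name seen in the parquet file
with_item = ListType(Field("item", "Utf8"))  # the other child name in the error message

strict_match = with_element == with_item
relaxed_match = same_ignoring_child_name(with_element, with_item)
print(strict_match, relaxed_match)  # prints: False True
```

So if one side of the projection builds its list type with "item" and the other with "element", a strict type comparison would reject it even though the data layout is the same.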
