Support for historical data validation causes error in CSV files without headers #1018

The recent changes introduced in #1006 use `INTERSECT` to get the names of the columns that exist in both the data contract and the data files (Parquet or CSV). However, when a CSV file has no header and the [`names`](https://duckdb.org/docs/stable/data/csv/tips#provide-names-if-the-file-does-not-contain-a-header) option is not used, DuckDB assigns default column names. Except in very rare cases, the intersection of those default names with the data contract names is empty, which causes the `INSERT` in the next step to throw an error (because the generated `SELECT` statement has no columns).
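To illustrate the failure mode, here is a minimal sketch in plain Python. It is not the project's actual code: `shared_columns`, `build_select`, and the `source_data` relation name are hypothetical, and the default names `column0`, `column1` are an assumption about how DuckDB names headerless CSV columns. It shows one possible guard, which fails early with a readable message instead of emitting an empty `SELECT`.

```python
def shared_columns(contract_columns, file_columns):
    """Return the contract columns that also exist in the file, in contract order."""
    present = set(file_columns)
    return [name for name in contract_columns if name in present]


def build_select(contract_columns, file_columns, source="source_data"):
    """Build the SELECT that would feed the INSERT (hypothetical helper).

    Raises ValueError when no column is shared, rather than producing
    a SELECT with an empty column list.
    """
    columns = shared_columns(contract_columns, file_columns)
    if not columns:
        raise ValueError(
            "No column in the data file matches the data contract; "
            "the file may lack a header row."
        )
    quoted = ", ".join(f'"{name}"' for name in columns)
    return f"SELECT {quoted} FROM {source}"


# Headerless CSV: assumed DuckDB default names share nothing with the contract.
contract = ["id", "name"]
headerless = ["column0", "column1"]
print(shared_columns(contract, headerless))
```

An alternative to raising would be to fall back to the previous behavior (no intersection) when the shared set is empty; which is preferable depends on how strictly headerless files should be handled.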

Apart from this error, these changes also introduced an inconsistency: the SQL object created by the `create_view_with_schema_union` function will be either a table (`if converted_types`) or a view (fallback). I suspect that using a table instead of a view may have further implications for performance, as the data will automatically be loaded into memory.
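The table-versus-view difference can be sketched with Python's built-in `sqlite3` module. This uses SQLite as a stand-in, not DuckDB, and the names `src`, `as_table`, and `as_view` are made up for the example. It only demonstrates the general semantics: `CREATE TABLE ... AS` copies the rows at creation time, while a view is evaluated on each query.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE src (id INTEGER)")
con.execute("INSERT INTO src VALUES (1)")

# A table materializes a copy of the data when it is created.
con.execute("CREATE TABLE as_table AS SELECT * FROM src")
# A view stores only the query and reads from src on demand.
con.execute("CREATE VIEW as_view AS SELECT * FROM src")

# Add a row to the source after both objects exist.
con.execute("INSERT INTO src VALUES (2)")

table_rows = con.execute("SELECT count(*) FROM as_table").fetchone()[0]
view_rows = con.execute("SELECT count(*) FROM as_view").fetchone()[0]
print(table_rows, view_rows)
```

The table keeps its snapshot while the view sees the new row, so code that sometimes gets one and sometimes the other may behave differently in both cost and results.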

Finally, I would raise the question of whether it makes sense to treat non-`required` fields as optional rather than _nullable_. According to ODCS, the `required` key "indicates if the element may contain Null values", not whether it can be entirely absent from the data. Perhaps it would be useful to have historical data support as an option, but I am not sure that it makes sense to have it as the default behavior.
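As a rough sketch of what an opt-in behavior could look like, the check below separates "column is absent" from "column may contain nulls". The function `missing_columns`, the flag `allow_absent_optional`, and the field dictionaries are all hypothetical and do not reflect the project's API. By default every contract field must exist as a column; only when the flag is set are non-`required` fields allowed to be missing.

```python
def missing_columns(fields, file_columns, allow_absent_optional=False):
    """Return names of contract fields that are missing from the file.

    fields: list of dicts with "name" and an optional boolean "required".
    allow_absent_optional: hypothetical opt-in for historical data, where
    non-required fields may be absent instead of merely nullable.
    """
    present = set(file_columns)
    missing = []
    for field in fields:
        if field["name"] in present:
            continue
        if allow_absent_optional and not field.get("required", False):
            continue  # tolerated only in the opt-in mode
        missing.append(field["name"])
    return missing


fields = [
    {"name": "id", "required": True},
    {"name": "nickname", "required": False},
]
file_columns = ["id"]

print(missing_columns(fields, file_columns))
print(missing_columns(fields, file_columns, allow_absent_optional=True))
```

With the default, `nickname` is reported as missing even though it is nullable; with the flag enabled, the same file passes.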


