Add minimal implementation of ingesting Parquet and CSV files #327
|
Thanks @voonhous I have two comments
I think it would make more sense if you extended the existing At the What do you think? |
|
Also, would you mind updating the title of the PR to something more descriptive? We use the PRs as items in our change logs. |
|
/retest |
|
/test test-core-and-ingestion |
|
I have created this code snippet which creates a testing dataframe. It has all the types besides lists of boolean. I think we should ensure that we can ingest this as both a parquet and a pandas dataframe. https://gist.github.com/woop/d074ded542bc2b6ec5a0b5a96c72e9ab |
|
/retest |
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: voonhous, woop |
This PR adds a minimal implementation for ingesting a Parquet file using PyArrow.
The Parquet file is read into a PyArrow Table, the Table is split into RecordBatches, and each batch is converted to a dataframe that the existing ingestion code can handle.
my_file.parquet → PyArrow Table → RecordBatches → Dataframe → FeatureRows (existing code) → Stream
Other modifications: