Add support for file paths for providing entity rows during batch retrieval #365
voonhous wants to merge 10 commits into
Switched order of branching in export_source_to_staging_location.
    entity_rows["datetime"] = pd.DatetimeIndex(
        entity_rows["datetime"]
    ).tz_localize(None)
elif isinstance(entity_rows, str):
Should there be an else statement here?
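One way to read the question above is that an unsupported input type should fail loudly rather than fall through silently. The following is a minimal sketch of that idea, not the code in this PR: `describe_entity_rows` is a hypothetical helper, and the DataFrame check is duck-typed (an assumption made only so the sketch runs without pandas) where the real code uses `isinstance` against `pd.DataFrame`.

```python
def describe_entity_rows(entity_rows):
    """Hypothetical helper: report how the entity rows were supplied."""
    # Duck-typed stand-in for isinstance(entity_rows, pd.DataFrame),
    # used only so this sketch has no pandas dependency.
    if hasattr(entity_rows, "columns") and hasattr(entity_rows, "to_dict"):
        return "dataframe"
    elif isinstance(entity_rows, str):
        return "file_path"
    else:
        # Reject anything else explicitly instead of skipping it silently.
        raise TypeError(
            "entity_rows must be a pandas DataFrame or a str file path, "
            f"got {type(entity_rows).__name__}"
        )
```

With an explicit `else`, a caller who passes, say, a list of dicts gets an immediate `TypeError` naming the offending type.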
Users should be able to provide large amounts of entity rows when retrieving batch features, but currently they are blocked by memory limits of pandas DataFrames.
Right now, batch retrieval already supports Avro files as the format for sending entity rows, but this is only available through the Feast Serving API. The Python SDK hides this detail by performing the following conversion:
Entity_rows Pandas DF → .avro (local) → .avro (gcs) → BQ
This pull request adds the ability for users to provide entity rows as file paths, in addition to Pandas DataFrames.

Examples:
- [Pandas Dataframe]
- subfolder/entities.avro
- /data/subfolder/entities.avro
- gs://food-recsys/folder/customer_entity_rows.avro
- gs://food-recsys/folder/customer_entity_rows_*.avro

While `datetime` and `event_timestamp` are used interchangeably, there needs to be standardization within the SDK on which to use. As of now:
- `datetime` is enforced in Pandas DataFrames.
- `event_timestamp` is enforced in local Avro files.
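The string inputs in the examples above fall into a few shapes: relative or absolute local paths, `gs://` paths, and `gs://` paths with a wildcard. As a rough illustration of how such strings could be told apart, here is a minimal sketch using only the standard library. `classify_entity_source` is a hypothetical name, not a function from the Feast SDK, and rejecting other URL schemes is an assumption of this sketch.

```python
from urllib.parse import urlparse


def classify_entity_source(path):
    """Hypothetical helper: classify a string entity-row source.

    Returns a (kind, has_wildcard) tuple, where kind is "local" or "gcs".
    """
    parsed = urlparse(path)
    has_wildcard = "*" in path
    if parsed.scheme == "gs":
        kind = "gcs"
    elif parsed.scheme in ("", "file"):
        # Plain relative or absolute paths have no scheme.
        kind = "local"
    else:
        # Assumption: any other scheme is treated as unsupported.
        raise ValueError(f"Unsupported scheme: {parsed.scheme!r}")
    return kind, has_wildcard
```

A caller could use the result to decide where an input enters the staging flow: a local file still needs uploading, while a `gs://` path is presumably already staged.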