{
    "name": random.choice(["fred", "wilma", "barney", "betty"]),
    "number": random.randint(0, 100),
    "timestamp": datetime.now(timezone.utc) - timedelta(days=i % 2),
Before adding timezone.utc I was getting this assertion error:
E AssertionError: Attributes of DataFrame.iloc[:, 2] (column name="timestamp") are different
E
E Attribute "dtype" are different
E [left]: datetime64[ns, UTC]
E [right]: datetime64[ns]
It seems that when reading back from BigQuery, the column is automatically converted to UTC unless otherwise specified, which causes the error.
@tswast can you confirm this is the case? Any comments?
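For reference, the dtype mismatch in the assertion above can be reproduced with pandas alone, without touching BigQuery. This is a minimal sketch, not the test from this PR: the sample value and the `frames_match` helper are made up for illustration.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# A timezone-naive column and the same instant as a UTC-aware column.
naive = pd.DataFrame({"timestamp": pd.to_datetime(["2021-09-01 12:00:00"])})
aware = pd.DataFrame(
    {"timestamp": pd.to_datetime(["2021-09-01 12:00:00"], utc=True)}
)


def frames_match(left, right):
    """Hypothetical helper: True if assert_frame_equal passes, else False."""
    try:
        assert_frame_equal(left, right)
    except AssertionError:
        return False
    return True


# The dtypes differ (tz-aware vs. naive), so the comparison fails.
mismatch = frames_match(aware, naive)

# Localizing the naive column to UTC makes the two frames comparable.
localized = naive.assign(timestamp=naive["timestamp"].dt.tz_localize("UTC"))
match_after_localize = frames_match(aware, localized)
```

Making the expected dataframe timezone-aware up front, as the diff does with `timezone.utc`, avoids the mismatch without any conversion in the assertion.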
TIMESTAMP columns are intended to come back as datetime64[ns, UTC], yes.
DATETIME should come back as datetime64[ns].
See my answer here on the difference between the two: https://stackoverflow.com/a/47724366/101923
Also note: both will come back as object dtype if there's a date outside of the pandas representable range, e.g. 0001-01-01 or 9999-12-31.
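To illustrate the representable-range caveat, here is a rough sketch of checking whether a value fits in nanosecond-resolution pandas timestamps. The helper name `fits_datetime64_ns` is hypothetical; the check only compares against `pd.Timestamp.min` and `pd.Timestamp.max` and does not query BigQuery or reproduce its conversion logic.

```python
from datetime import datetime

import pandas as pd

# Bounds of nanosecond-resolution timestamps, truncated to microseconds so
# they can be compared with plain datetime objects. warn=False silences the
# warning about the discarded nanoseconds.
LOWER = pd.Timestamp.min.to_pydatetime(warn=False)
UPPER = pd.Timestamp.max.to_pydatetime(warn=False)


def fits_datetime64_ns(value: datetime) -> bool:
    """Hypothetical helper: True if a naive datetime fits in datetime64[ns]."""
    # The lower bound is strict because truncation moved it slightly below
    # the true minimum.
    return LOWER < value <= UPPER


in_range = fits_datetime64_ns(datetime(2021, 9, 1))
too_early = fits_datetime64_ns(datetime(1, 1, 1))
too_late = fits_datetime64_ns(datetime(9999, 12, 31))
```

Values such as 0001-01-01 and 9999-12-31 fall outside these bounds, which is the case where the column is said to come back as object dtype.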
I'm actually working on making the pandas-gbq dtypes consistent with google-cloud-bigquery as we speak in googleapis/python-bigquery-pandas#444
It seems that if I don't provide a schema, BigQuery infers that the dataframe column named "timestamp" is a TIMESTAMP column, so it comes back as datetime64[ns, UTC]. That being said, to keep the test simple, I think we can make the local dataframe timezone-aware and test that it comes back as it should.
cc: @jrbourbeau Does this convince you? If so this PR is ready for review.
It looks like when on macOS it can't find
jrbourbeau left a comment
Thanks for fixing @ncclementi and reviewing @tswast! This is in
Forgot to mention the macOS CI failure. This looks like a totally unrelated packaging issue (we're seeing similar things over in Dask's CI). I'm not currently able to reproduce locally -- let me rerun CI to see if the issue has already been resolved.
Hmm, unfortunately the macOS environment issue is still around. I'm highly confident this is unrelated to the changes in this PR (see similar things being reported in
Replace use of pandas-gbq for pure Bigquery