Thomas Li created ARROW-13471:
---------------------------------

Summary: [Python][Parquet]Pandas datetime columns not correctly roundtripping with fastparquet(0.7.0) and pyarrow
Key: ARROW-13471
URL: https://issues.apache.org/jira/browse/ARROW-13471
Project: Apache Arrow
Issue Type: Bug
Components: Parquet, Python
Affects Versions: 4.0.1
Environment: pandas: 1.4.0.dev0+253.gedd5af779a.dirty
    pyarrow: 4.0.1
    fastparquet: 0.7.0
Reporter: Thomas Li

When trying to roundtrip data through DataFrame.to_parquet and pandas.read_parquet, datetime64[ns] columns are not round-tripped correctly if the data is written with fastparquet and read back with pyarrow. The values appear to be read correctly, but the dtype is wrong: the column comes back as datetime64[ns, UTC] instead of datetime64[ns].

Note: This works correctly if fastparquet is the engine used for both reading and writing.

I asked about this on the fastparquet bug tracker and they said that it was a pyarrow bug. xref [Broken compat between fastparquet(0.7.0) and pyarrow · Issue #650 · dask/fastparquet (github.com)|https://github.com/dask/fastparquet/issues/650]

{code:python}
import pandas as pd

s = pd.DataFrame({"a": pd.date_range("20130101", periods=3)})
s.dtypes  # datetime64[ns]
s.to_parquet("test.parquet", engine="fastparquet")
pd.read_parquet("test.parquet", engine="pyarrow").dtypes  # datetime64[ns, UTC]
{code}
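
Until the mismatch is resolved, one possible stopgap on the reading side is to strip the timezone after loading. The sketch below is only an illustration and not a fix for the underlying bug: the helper name drop_utc is hypothetical, the frame read_back is built by hand as a stand-in for what pyarrow returned in the report, and the approach assumes the data was timezone-naive when written (it would be wrong for columns that were genuinely tz-aware).

```python
import pandas as pd


def drop_utc(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy in which tz-aware datetime columns are made tz-naive.

    Hypothetical helper: assumes the columns were tz-naive before writing.
    """
    out = df.copy()
    for name in out.columns:
        col = out[name]
        if isinstance(col.dtype, pd.DatetimeTZDtype):
            # Normalize to UTC, then drop the timezone information.
            out[name] = col.dt.tz_convert("UTC").dt.tz_localize(None)
    return out


# Stand-in for the frame returned by the pyarrow engine (tz-aware, UTC).
read_back = pd.DataFrame({"a": pd.date_range("20130101", periods=3, tz="UTC")})
fixed = drop_utc(read_back)

# The frame as it looked before being written (tz-naive).
original = pd.DataFrame({"a": pd.date_range("20130101", periods=3)})
```

In a real session, read_back would instead be the result of pd.read_parquet("test.parquet", engine="pyarrow").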