Joris Van den Bossche created ARROW-18124:
---------------------------------------------
             Summary: [Python] Support converting to non-nano datetime64 for pandas >= 2.0
                 Key: ARROW-18124
                 URL: https://issues.apache.org/jira/browse/ARROW-18124
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Python
            Reporter: Joris Van den Bossche
             Fix For: 11.0.0


Pandas is adding capabilities to store non-nanosecond datetime64 data. At the moment, however, we always convert to nanosecond resolution, regardless of the timestamp resolution of the arrow table (and regardless of the pandas metadata). Using the development version of pandas:

{code}
In [1]: df = pd.DataFrame({"col": np.arange("2012-01-01", 10, dtype="datetime64[s]")})

In [2]: df.dtypes
Out[2]: 
col    datetime64[s]
dtype: object

In [3]: table = pa.table(df)

In [4]: table.schema
Out[4]: 
col: timestamp[s]
-- schema metadata --
pandas: '{"index_columns": [{"kind": "range", "name": null, "start": 0, "' + 423

In [6]: table.to_pandas().dtypes
Out[6]: 
col    datetime64[ns]
dtype: object
{code}

This is because we have a {{coerce_temporal_nanoseconds}} conversion option, which we hardcode to True for top-level columns (and to False for nested data).

When users have pandas >= 2, we should support converting while preserving the resolution. We should certainly do so if the pandas metadata indicates which resolution was originally used (to ensure a correct roundtrip). We _could_ (and at some point also _should_) do that by default when there is no pandas metadata, but maybe only later, depending on how stable this new feature is in pandas, as it is potentially a breaking change for our users (e.g. if you use pyarrow to read a parquet file).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)