bo zhao created SPARK-39821:
-------------------------------

             Summary: DatetimeIndex error during pyspark session
                 Key: SPARK-39821
                 URL: https://issues.apache.org/jira/browse/SPARK-39821
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 3.2.2
         Environment: OS: ubuntu
Python version: 3.8.13
            Reporter: bo zhao


{code:java}
Using Python version 3.8.13 (default, Jun 29 2022 11:50:19)
Spark context Web UI available at http://172.25.179.45:4042
Spark context available as 'sc' (master = local[*], app id = local-1658283215853).
SparkSession available as 'spark'.
>>> from pyspark import pandas as ps
WARNING:root:'PYARROW_IGNORE_TIMEZONE' environment variable was not set. It is required to set this environment variable to '1' in both driver and executor sides if you use pyarrow>=2.0.0. pandas-on-Spark will set it for you but it does not work if there is a Spark context already launched.
>>> ps.DatetimeIndex(['1970-01-01', '1970-01-01', '1970-01-01'])
/home/spark/spark/python/pyspark/pandas/internal.py:1573: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
  fields = [
/home/spark/spark/python/pyspark/sql/pandas/conversion.py:486: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
  for column, series in pdf.iteritems():
/home/spark/.pyenv/versions/3.8.13/lib/python3.8/site-packages/_pydevd_bundle/pydevd_utils.py:601: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
  for item in s.iteritems():
/home/spark/.pyenv/versions/3.8.13/lib/python3.8/site-packages/_pydevd_bundle/pydevd_utils.py:601: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
  for item in s.iteritems():
/home/spark/.pyenv/versions/3.8.13/lib/python3.8/site-packages/_pydevd_bundle/pydevd_utils.py:601: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
  for item in s.iteritems():
/home/spark/.pyenv/versions/3.8.13/lib/python3.8/site-packages/_pydevd_bundle/pydevd_utils.py:601: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
  for item in s.iteritems():
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/spark/spark/python/pyspark/pandas/indexes/base.py", line 2770, in __repr__
    pindex = self._psdf._get_or_create_repr_pandas_cache(max_display_count).index
  File "/home/spark/spark/python/pyspark/pandas/frame.py", line 12780, in _get_or_create_repr_pandas_cache
    self, "_repr_pandas_cache", {n: self.head(n + 1)._to_internal_pandas()}
  File "/home/spark/spark/python/pyspark/pandas/frame.py", line 12775, in _to_internal_pandas
    return self._internal.to_pandas_frame
  File "/home/spark/spark/python/pyspark/pandas/utils.py", line 589, in wrapped_lazy_property
    setattr(self, attr_name, fn(self))
  File "/home/spark/spark/python/pyspark/pandas/internal.py", line 1056, in to_pandas_frame
    pdf = sdf.toPandas()
  File "/home/spark/spark/python/pyspark/sql/pandas/conversion.py", line 248, in toPandas
    series = series.astype(t, copy=False)
  File "/home/spark/upstream/pandas/pandas/core/generic.py", line 6095, in astype
    new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
  File "/home/spark/upstream/pandas/pandas/core/internals/managers.py", line 386, in astype
    return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
  File "/home/spark/upstream/pandas/pandas/core/internals/managers.py", line 308, in apply
    applied = getattr(b, f)(**kwargs)
  File "/home/spark/upstream/pandas/pandas/core/internals/blocks.py", line 526, in astype
    new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
  File "/home/spark/upstream/pandas/pandas/core/dtypes/astype.py", line 299, in astype_array_safe
    new_values = astype_array(values, dtype, copy=copy)
  File "/home/spark/upstream/pandas/pandas/core/dtypes/astype.py", line 227, in astype_array
    values = values.astype(dtype, copy=copy)
  File "/home/spark/upstream/pandas/pandas/core/arrays/datetimes.py", line 631, in astype
    return dtl.DatetimeLikeArrayMixin.astype(self, dtype, copy)
  File "/home/spark/upstream/pandas/pandas/core/arrays/datetimelike.py", line 504, in astype
    raise TypeError(msg)
TypeError: Cannot cast DatetimeArray to dtype datetime64
{code}

I ran pyspark and entered ps.DatetimeIndex(['1970-01-01', '1970-01-01', '1970-01-01']) in the session. Note that assigning the index to a variable does not raise an error:

{code:java}
a = ps.DatetimeIndex(['1970-01-01', '1970-01-01', '1970-01-01'])
{code}

The error is only raised when I then evaluate a in the session:

{code:java}
>>> a
{code}

So the problem appears to be in the __repr__ function, which is the first point where the index is actually converted to pandas.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
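A minimal pandas-only sketch of what seems to be going on at the bottom of the traceback (this is an assumption from the error message, not a confirmed diagnosis: toPandas() appears to pass a unit-less "datetime64" dtype to Series.astype, which newer/development pandas rejects, while the unit-qualified "datetime64[ns]" form is still accepted):

```python
import pandas as pd

# A datetime Series like the one toPandas() produces before the cast.
s = pd.Series(pd.to_datetime(["1970-01-01", "1970-01-01", "1970-01-01"]))

# The unit-less alias "datetime64" is what the traceback shows failing with
# "Cannot cast DatetimeArray to dtype datetime64" on the reporter's pandas
# build; the unit-qualified dtype below is the form that keeps working.
out = s.astype("datetime64[ns]")
print(out.dtype)  # datetime64[ns]
```

Whether s.astype("datetime64") itself raises depends on the installed pandas version, which would explain why this only shows up against a development build of pandas.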