bo zhao created SPARK-39821:
-------------------------------

             Summary: DatetimeIndex error during pyspark session
                 Key: SPARK-39821
                 URL: https://issues.apache.org/jira/browse/SPARK-39821
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 3.2.2
         Environment: OS: ubuntu
Python version: 3.8.13
            Reporter: bo zhao


{code:java}
Using Python version 3.8.13 (default, Jun 29 2022 11:50:19)
Spark context Web UI available at http://172.25.179.45:4042
Spark context available as 'sc' (master = local[*], app id = local-1658283215853).
SparkSession available as 'spark'.
>>> from pyspark import pandas as ps
WARNING:root:'PYARROW_IGNORE_TIMEZONE' environment variable was not set. It is required to set this environment variable to '1' in both driver and executor sides if you use pyarrow>=2.0.0. pandas-on-Spark will set it for you but it does not work if there is a Spark context already launched.
>>> ps.DatetimeIndex(['1970-01-01', '1970-01-01', '1970-01-01'])
/home/spark/spark/python/pyspark/pandas/internal.py:1573: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
  fields = [
/home/spark/spark/python/pyspark/sql/pandas/conversion.py:486: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
  for column, series in pdf.iteritems():
/home/spark/.pyenv/versions/3.8.13/lib/python3.8/site-packages/_pydevd_bundle/pydevd_utils.py:601: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
  for item in s.iteritems():
/home/spark/.pyenv/versions/3.8.13/lib/python3.8/site-packages/_pydevd_bundle/pydevd_utils.py:601: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
  for item in s.iteritems():
/home/spark/.pyenv/versions/3.8.13/lib/python3.8/site-packages/_pydevd_bundle/pydevd_utils.py:601: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
  for item in s.iteritems():
/home/spark/.pyenv/versions/3.8.13/lib/python3.8/site-packages/_pydevd_bundle/pydevd_utils.py:601: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
  for item in s.iteritems():
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/spark/spark/python/pyspark/pandas/indexes/base.py", line 2770, in __repr__
    pindex = self._psdf._get_or_create_repr_pandas_cache(max_display_count).index
  File "/home/spark/spark/python/pyspark/pandas/frame.py", line 12780, in _get_or_create_repr_pandas_cache
    self, "_repr_pandas_cache", {n: self.head(n + 1)._to_internal_pandas()}
  File "/home/spark/spark/python/pyspark/pandas/frame.py", line 12775, in _to_internal_pandas
    return self._internal.to_pandas_frame
  File "/home/spark/spark/python/pyspark/pandas/utils.py", line 589, in wrapped_lazy_property
    setattr(self, attr_name, fn(self))
  File "/home/spark/spark/python/pyspark/pandas/internal.py", line 1056, in to_pandas_frame
    pdf = sdf.toPandas()
  File "/home/spark/spark/python/pyspark/sql/pandas/conversion.py", line 248, in toPandas
    series = series.astype(t, copy=False)
  File "/home/spark/upstream/pandas/pandas/core/generic.py", line 6095, in astype
    new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
  File "/home/spark/upstream/pandas/pandas/core/internals/managers.py", line 386, in astype
    return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
  File "/home/spark/upstream/pandas/pandas/core/internals/managers.py", line 308, in apply
    applied = getattr(b, f)(**kwargs)
  File "/home/spark/upstream/pandas/pandas/core/internals/blocks.py", line 526, in astype
    new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
  File "/home/spark/upstream/pandas/pandas/core/dtypes/astype.py", line 299, in astype_array_safe
    new_values = astype_array(values, dtype, copy=copy)
  File "/home/spark/upstream/pandas/pandas/core/dtypes/astype.py", line 227, in astype_array
    values = values.astype(dtype, copy=copy)
  File "/home/spark/upstream/pandas/pandas/core/arrays/datetimes.py", line 631, in astype
    return dtl.DatetimeLikeArrayMixin.astype(self, dtype, copy)
  File "/home/spark/upstream/pandas/pandas/core/arrays/datetimelike.py", line 504, in astype
    raise TypeError(msg)
TypeError: Cannot cast DatetimeArray to dtype datetime64
{code}

I ran pyspark and entered ps.DatetimeIndex(['1970-01-01', '1970-01-01', '1970-01-01']) in the session. Note that assigning the index to a variable does not raise an error:

{code:java}
a = ps.DatetimeIndex(['1970-01-01', '1970-01-01', '1970-01-01'])
{code}

The error is only raised when I then evaluate a in the session:

{code:java}
>>> a
{code}

So the problem appears to be in the __repr__ function, which is the first point where the index is actually converted to pandas.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
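A minimal pandas-only sketch of what seems to be going on at the bottom of the traceback (this is an assumption from the error message, not a confirmed diagnosis: toPandas() appears to pass a unit-less "datetime64" dtype to Series.astype, which newer/development pandas rejects, while the unit-qualified "datetime64[ns]" form is still accepted):

```python
import pandas as pd

# A datetime Series like the one toPandas() produces before the cast.
s = pd.Series(pd.to_datetime(["1970-01-01", "1970-01-01", "1970-01-01"]))

# The unit-less alias "datetime64" is what the traceback shows failing with
# "Cannot cast DatetimeArray to dtype datetime64" on the reporter's pandas
# build; the unit-qualified dtype below is the form that keeps working.
out = s.astype("datetime64[ns]")
print(out.dtype)  # datetime64[ns]
```

Whether s.astype("datetime64") itself raises depends on the installed pandas version, which would explain why this only shows up against a development build of pandas.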