[ https://issues.apache.org/jira/browse/SPARK-23290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-23290.
----------------------------------
       Resolution: Fixed
    Fix Version/s: 2.3.0

Issue resolved by pull request 20515
[https://github.com/apache/spark/pull/20515]

> inadvertent change in handling of DateType when converting to pandas dataframe
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-23290
>                 URL: https://issues.apache.org/jira/browse/SPARK-23290
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.3.0
>            Reporter: Andre Menck
>            Assignee: Takuya Ueshin
>            Priority: Blocker
>             Fix For: 2.3.0
>
>
> In [this PR|https://github.com/apache/spark/pull/18664/files#diff-6fc344560230bf0ef711bb9b5573f1faR1968]
> there was a change in how `DateType` is returned to users (line 1968
> in dataframe.py). This can cause client code to fail, as in the following
> example from a Python terminal:
> {code:python}
> >>> import datetime as dt
> >>> import pandas as pd
> >>> pdf = pd.DataFrame([['2015-01-01',1]], columns=['date', 'num'])
> >>> pdf.dtypes
> date    object
> num      int64
> dtype: object
> >>> pdf['date'].apply(lambda d: dt.datetime.strptime(d, '%Y-%m-%d').date() )
> 0    2015-01-01
> Name: date, dtype: object
> >>> pdf = pd.DataFrame([['2015-01-01',1]], columns=['date', 'num'])
> >>> pdf.dtypes
> date    object
> num      int64
> dtype: object
> >>> pdf['date'] = pd.to_datetime(pdf['date'])
> >>> pdf.dtypes
> date    datetime64[ns]
> num              int64
> dtype: object
> >>> pdf['date'].apply(lambda d: dt.datetime.strptime(d, '%Y-%m-%d').date() )
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/Users/amenck/anaconda2/lib/python2.7/site-packages/pandas/core/series.py", line 2355, in apply
>     mapped = lib.map_infer(values, f, convert=convert_dtype)
>   File "pandas/_libs/src/inference.pyx", line 1574, in pandas._libs.lib.map_infer
>   File "<stdin>", line 1, in <lambda>
> TypeError: strptime() argument 1 must be string, not Timestamp
> >>>
> {code}
> Above we show both the old behavior (returning an "object" column) and the new
> behavior (returning a datetime column). Since there may be user code relying
> on the old behavior, I'd suggest reverting this specific part of the change.
> Also note that the NOTE in the docstring for "_to_corrected_pandas_type"
> seems to be off: it refers to the old behavior, not the current one.
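
For client code that may see either representation of the date column, the failure quoted above can be sidestepped by normalizing the column before use. The following is only a minimal sketch under that assumption: `to_python_dates` is a hypothetical helper written for illustration, not part of Spark or pandas, and it assumes string values use the '%Y-%m-%d' format from the example.

```python
import datetime as dt

import pandas as pd


def to_python_dates(col):
    """Return a Series of datetime.date objects.

    Hypothetical helper for illustration only; not part of Spark or pandas.
    """
    if pd.api.types.is_datetime64_any_dtype(col):
        # "New" behavior from the report: the column is datetime64[ns].
        # The .dt.date accessor yields plain datetime.date objects.
        return col.dt.date

    def convert(value):
        # "Old" behavior from the report: an object column whose elements
        # may be strings, datetime.date, or datetime.datetime values.
        if isinstance(value, dt.datetime):
            return value.date()
        if isinstance(value, dt.date):
            return value
        # Assumption: strings follow the '%Y-%m-%d' format used in the report.
        return dt.datetime.strptime(value, '%Y-%m-%d').date()

    return col.apply(convert)


# Same frame as in the report, once as strings and once as datetime64[ns].
pdf_old = pd.DataFrame([['2015-01-01', 1]], columns=['date', 'num'])
pdf_new = pdf_old.copy()
pdf_new['date'] = pd.to_datetime(pdf_new['date'])

dates_old = to_python_dates(pdf_old['date'])
dates_new = to_python_dates(pdf_new['date'])
```

Both calls yield `datetime.date` values, so downstream code no longer depends on which dtype the column arrived with.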