[ https://issues.apache.org/jira/browse/SPARK-23360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Cutler updated SPARK-23360:
---------------------------------
    Summary: SparkSession.createDataFrame timestamps can be incorrect with 
non-Arrow codepath  (was: SparkSession.createDataFrame results in correct 
results with non-Arrow codepath)

> SparkSession.createDataFrame timestamps can be incorrect with non-Arrow 
> codepath
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-23360
>                 URL: https://issues.apache.org/jira/browse/SPARK-23360
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.3.0
>            Reporter: Li Jin
>            Priority: Major
>
> {code:python}
> import datetime
> import os
> import pandas as pd
> dt = [datetime.datetime(2015, 10, 31, 22, 30)]
> pdf = pd.DataFrame({'time': dt})
> os.environ['TZ'] = 'America/New_York'
> # assumed: `spark` is the SparkSession provided by the PySpark shell
> df1 = spark.createDataFrame(pdf)
> df1.show()
> # +-------------------+
> # |               time|
> # +-------------------+
> # |2015-10-31 21:30:00|   <- expected 2015-10-31 22:30:00
> # +-------------------+
> {code}
> This seems to be related to this line:
> [https://github.com/apache/spark/blob/master/python/pyspark/sql/types.py#L1776]
> It appears to be an issue with "tzlocal()" as the target timezone:
> Wrong:
> {code:python}
> # assumed: s is the 'time' column of pdf (the output below is named 'time')
> s = pdf['time']
> from_tz = "America/New_York"
> to_tz = "tzlocal()"
> s.apply(
>     lambda ts: ts.tz_localize(from_tz, ambiguous=False).tz_convert(to_tz).tz_localize(None)
>     if ts is not pd.NaT else pd.NaT)
> # 0   2015-10-31 21:30:00
> # Name: time, dtype: datetime64[ns]
> {code}
> Correct:
> {code:python}
> from_tz = "America/New_York"
> to_tz = "America/New_York"
> s.apply(
>     lambda ts: ts.tz_localize(from_tz, ambiguous=False).tz_convert(to_tz).tz_localize(None)
>     if ts is not pd.NaT else pd.NaT)
> # 0   2015-10-31 22:30:00
> # Name: time, dtype: datetime64[ns]
> {code}
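
Both snippets run the same localize, convert, strip chain and differ only in the target zone. The arithmetic can be checked with the standard library alone. This is an illustrative sketch under stated assumptions, not PySpark code: it uses fixed UTC offsets (EDT = UTC-4, which applied in New York on 2015-10-31, and EST = UTC-5) in place of real zone rules, and `shift_wall_clock` and `resolve_local_timezone` are hypothetical helpers. It shows that converting to the same zone is a no-op, and that the observed 21:30 is exactly what results if the target zone is taken as UTC-5; that is one hypothesis consistent with the output, not a confirmed diagnosis.

{code:python}
import os
from datetime import datetime, timedelta, timezone

# Fixed offsets stand in for real zone rules (assumption: DST was still in
# effect in New York on 2015-10-31, so the correct offset is UTC-4).
EDT = timezone(timedelta(hours=-4), "EDT")
EST = timezone(timedelta(hours=-5), "EST")


def shift_wall_clock(naive, from_tz, to_tz):
    """Hypothetical helper mirroring tz_localize -> tz_convert -> tz_localize(None).

    Interprets `naive` as wall-clock time in `from_tz`, converts it to `to_tz`,
    then drops the tzinfo again.
    """
    return naive.replace(tzinfo=from_tz).astimezone(to_tz).replace(tzinfo=None)


def resolve_local_timezone(environ=None, default="tzlocal()"):
    """Hypothetical helper: prefer an explicit zone name from TZ over the alias.

    Returns e.g. "America/New_York" when TZ is set and non-empty, else `default`.
    """
    env = os.environ if environ is None else environ
    return env.get("TZ") or default


ts = datetime(2015, 10, 31, 22, 30)
same_zone = shift_wall_clock(ts, EDT, EDT)     # 2015-10-31 22:30:00
one_hour_off = shift_wall_clock(ts, EDT, EST)  # 2015-10-31 21:30:00

# With TZ set as in the report, both ends resolve to the same named zone,
# matching the from_tz/to_tz pair of the "Correct" snippet.
tz_name = resolve_local_timezone({"TZ": "America/New_York"})
{code}

Resolving the local zone to an explicit name keeps pandas on named zone rules instead of the "tzlocal()" alias; whether that is the right change for types.py is an assumption, not something the report states.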



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
