[ https://issues.apache.org/jira/browse/SPARK-12683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gerhard Fiedler updated SPARK-12683:
------------------------------------
    Attachment: spark_bug_date.py

The code from the description is attached as spark_bug_date.py.

> SQL timestamp is wrong when accessed as Python datetime
> -------------------------------------------------------
>
>                 Key: SPARK-12683
>                 URL: https://issues.apache.org/jira/browse/SPARK-12683
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.5.1, 1.5.2, 1.6.0
>         Environment: Windows 7 Pro x64
>                      Python 3.4.3
>                      py4j 0.9
>            Reporter: Gerhard Fiedler
>         Attachments: spark_bug_date.py
>
>
> When accessing SQL timestamp data through {{.show()}}, it looks correct, but
> when accessing it (as a Python {{datetime}}) through {{.collect()}}, it is
> wrong.
> {code}
> from datetime import datetime
> from pyspark import SparkContext
> from pyspark.sql import SQLContext
>
> if __name__ == "__main__":
>     spark_context = SparkContext(appName='SparkBugTimestampHour')
>     sql_context = SQLContext(spark_context)
>
>     sql_text = """select cast('2100-09-09 12:11:10.09' as timestamp) as ts"""
>     data_frame = sql_context.sql(sql_text)
>
>     data_frame.show(truncate=False)
>     # Result from .show() (as expected, looks correct):
>     # +----------------------+
>     # |ts                    |
>     # +----------------------+
>     # |2100-09-09 12:11:10.09|
>     # +----------------------+
>
>     rows = data_frame.collect()
>     row = rows[0]
>     ts = row[0]
>     print('ts={ts}'.format(ts=ts))
>     # Expected result from this print statement:
>     # ts=2100-09-09 12:11:10.090000
>     #
>     # Actual, wrong result (note the hours being 18 instead of 12):
>     # ts=2100-09-09 18:11:10.090000
>     #
>     # This error seems to be dependent on some characteristic of the system.
>     # We couldn't reproduce this on all of our systems, but it is not clear
>     # what the differences are. One difference is the processor: it failed
>     # on Intel Xeon E5-2687W v2.
>
>     assert isinstance(ts, datetime)
>     assert ts.year == 2100 and ts.month == 9 and ts.day == 9
>     assert ts.minute == 11 and ts.second == 10 and ts.microsecond == 90000
>     if ts.hour != 12:
>         print('hour is not correct; should be 12, is actually '
>               '{hour}'.format(hour=ts.hour))
>
>     spark_context.stop()
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
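A note on the reproduction above: the wrong hour is consistent with the epoch value being pushed through a local-time-zone conversion somewhere on the Python side. That is an assumption on my part, not something the report confirms, but it can be illustrated without Spark at all, using only the standard library:

```python
import calendar
from datetime import datetime

# Epoch seconds for 2100-09-09 12:11:10 UTC, built without any Spark involvement.
# calendar.timegm() interprets the tuple as UTC, so this is an exact inverse
# of datetime.utcfromtimestamp().
epoch_s = calendar.timegm((2100, 9, 9, 12, 11, 10, 0, 0, 0))

# Interpreting the epoch value in UTC recovers the original wall-clock time.
utc_dt = datetime.utcfromtimestamp(epoch_s)
print('UTC:  ', utc_dt)  # 2100-09-09 12:11:10

# datetime.fromtimestamp(), by contrast, applies the machine's local time zone
# (including DST rules extrapolated out to the year 2100), so the hour can come
# back shifted relative to 12 depending on the system's time-zone configuration.
local_dt = datetime.fromtimestamp(epoch_s)
print('local:', local_dt)
```

If PySpark (or py4j) takes the {{fromtimestamp()}}-style path on some systems and a UTC-preserving path on others, that would explain why the failure reproduces only on certain machines; the processor model is more likely incidental than causal.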