[ https://issues.apache.org/jira/browse/SPARK-34292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-34292.
----------------------------------
    Resolution: Duplicate

> NOW is interpreted as the NOW SQL function
> ------------------------------------------
>
>                 Key: SPARK-34292
>                 URL: https://issues.apache.org/jira/browse/SPARK-34292
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, Spark Core
>    Affects Versions: 3.0.0
>            Reporter: Gaelan Mines
>            Priority: Major
>
> I think we ran into a bug in the Spark framework. When reading back a
> data frame stored in Parquet format and partitioned by a column, a partition
> value of "NOW" is interpreted as the SQL NOW function, so the column comes
> back holding the current timestamp instead of the string "NOW".
>
> Steps to reproduce:
> from pyspark.sql.session import SparkSession
> spark = SparkSession.builder.getOrCreate()
> df = spark.createDataFrame([['NOW', 1], ['THEN', 2]], schema=['Col1', 'Col2'])
> df.write.parquet('/tmp/my_partitioned_data', mode='overwrite',
>                  partitionBy=['Col1'])
> df_read_back = spark.read.parquet('/tmp/my_partitioned_data')
> """
> In [1]: df.show()
> +----+----+
> |Col1|Col2|
> +----+----+
> | NOW|   1|
> |THEN|   2|
> +----+----+
> In [2]: df_read_back.show()
> +----+--------------------+
> |Col2|                Col1|
> +----+--------------------+
> |   1|2021-01-22 10:46:...|
> |   2|                THEN|
> +----+--------------------+
> """
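
A possible workaround for the reproduction above, offered as a sketch rather than a confirmed fix: it assumes the timestamp comes from Spark inferring the type of the partition column from the directory name, which this thread does not confirm. Under that assumption, switching off the documented setting `spark.sql.sources.partitionColumnTypeInference.enabled` before the read should keep partition values as strings. The helper name `read_with_string_partitions` is hypothetical and is not part of Spark.

```python
# Documented Spark SQL setting that controls partition column type inference.
PARTITION_INFERENCE_KEY = "spark.sql.sources.partitionColumnTypeInference.enabled"


def read_with_string_partitions(spark, path):
    """Read a partitioned Parquet dataset with partition columns kept as strings.

    `spark` is an existing SparkSession. This helper is a hypothetical
    illustration; it assumes the misread "NOW" value is caused by partition
    type inference. Note that the setting stays disabled for the rest of
    the session.
    """
    # With inference disabled, Spark treats every partition column as a string.
    spark.conf.set(PARTITION_INFERENCE_KEY, "false")
    return spark.read.parquet(path)
```

With the session from the steps to reproduce, this would be called as `df_read_back = read_with_string_partitions(spark, '/tmp/my_partitioned_data')`. Passing an explicit schema through `spark.read.schema(...)` is another option that avoids changing a session-wide setting.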