Shankar Koirala created SPARK-32147:
---------------------------------------

             Summary: Spark: partitionBy changes the column's values
                 Key: SPARK-32147
                 URL: https://issues.apache.org/jira/browse/SPARK-32147
             Project: Spark
          Issue Type: Bug
          Components: Spark Core, Spark Shell
    Affects Versions: 3.0.0
            Reporter: Shankar Koirala


When a DataFrame is written as Parquet or CSV with {{partitionBy}} on a string column whose values are digits followed by 'f' or 'd' (for example "6f" or "7d"), those values come back changed when the data is read again. Example:
{code:scala}
scala> val df = Seq(
 | ("9q", 1),
 | ("3k", 2),
 | ("6f", 3),
 | ("7f", 4),
 | ("7d", 5)
 | ).toDF("value", "id")
df: org.apache.spark.sql.DataFrame = [value: string, id: int]
scala> df.show(false)
+-----+---+
|value|id |
+-----+---+
|  9q | 1 |
|  3k | 2 |
|  6f | 3 |
|  7f | 4 |
|  7d | 5 |
+-----+---+

scala> import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.SaveMode

scala> df.write.partitionBy("value").mode(SaveMode.Overwrite).parquet("tmp_parquet")
scala> spark.read.parquet("tmp_parquet").show(false)
+---+-----+
|id |value|
+---+-----+
|5  | 7.0 |
|3  | 6.0 |
|2  | 3k  |
|4  | 7.0 |
|1  | 9q  |
+---+-----+

{code}
The same happens with other formats such as CSV. Is this a bug or expected behavior?
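A likely explanation, offered as a hedged sketch rather than a confirmed diagnosis: on read, Spark's partition discovery infers each partition column's type from the directory names (e.g. {{value=7f}}). The observed 6.0/7.0 values are consistent with a fallback to {{java.lang.Double.parseDouble}}, which accepts an optional f/F/d/D type suffix, as Java floating-point literals do. The snippet below does not use Spark. {{SuffixParseDemo}} and {{inferValue}} are hypothetical names, and {{inferValue}} only models a simplified "Int, then Double, else String" order, not Spark's actual inference code.

{code:scala}
import scala.util.Try

object SuffixParseDemo {
  // Partition directory values from the example above (value=9q, value=6f, ...).
  val rawValues: Seq[String] = Seq("9q", "3k", "6f", "7f", "7d")

  // Hypothetical, simplified helper (NOT Spark's API): try Int, then Double,
  // and keep the raw String only if both parses fail.
  def inferValue(raw: String): Any =
    Try(Integer.parseInt(raw): Any)
      .orElse(Try(java.lang.Double.parseDouble(raw): Any))
      .getOrElse(raw)

  def main(args: Array[String]): Unit = {
    // Double.parseDouble accepts a trailing f/F/d/D suffix, so "6f" becomes 6.0
    // and both "7f" and "7d" become 7.0, while "9q" and "3k" stay strings.
    rawValues.foreach(raw => println(s"$raw -> ${inferValue(raw)}"))
  }
}
{code}

Running it prints 9q -> 9q, 3k -> 3k, 6f -> 6.0, 7f -> 7.0 and 7d -> 7.0, which matches the read-back above: ids 4 and 5 both end up as 7.0. If this is the cause, setting {{spark.sql.sources.partitionColumnTypeInference.enabled}} to false before reading should keep the partition column as a string (not verified here).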

Taken from [SO|https://stackoverflow.com/questions/62671684/spark-incorrectly-intepret-partition-name-ending-with-d-or-f-as-number-when]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
