[ https://issues.apache.org/jira/browse/SPARK-32147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149403#comment-17149403 ]
Lantao Jin commented on SPARK-32147:
------------------------------------

Setting spark.sql.sources.partitionColumnTypeInference.enabled to false prints the correct values.

> Spark: PartitionBy changing the columns value
> ----------------------------------------------
>
>                 Key: SPARK-32147
>                 URL: https://issues.apache.org/jira/browse/SPARK-32147
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, Spark Shell
>    Affects Versions: 3.0.0
>            Reporter: Shankar Koirala
>            Priority: Major
>              Labels: spark
>
> When a DataFrame is saved as Parquet or CSV with partitionBy, partition
> column values made of digits followed by 'f' or 'd' are changed.
> Below is an example:
> {code:java}
> scala> val df = Seq(
>      |   ("9q", 1),
>      |   ("3k", 2),
>      |   ("6f", 3),
>      |   ("7f", 4),
>      |   ("7d", 5)
>      | ).toDF("value", "id")
> df: org.apache.spark.sql.DataFrame = [value: string, id: int]
>
> scala> df.show(false)
> +-----+---+
> |value|id |
> +-----+---+
> |9q   |1  |
> |3k   |2  |
> |6f   |3  |
> |7f   |4  |
> |7d   |5  |
> +-----+---+
>
> scala> df.write.partitionBy("value").mode(SaveMode.Overwrite).parquet("tmp_parquet")
>
> scala> spark.read.parquet("tmp_parquet").show(false)
> +---+-----+
> |id |value|
> +---+-----+
> |5  |7.0  |
> |3  |6.0  |
> |2  |3k   |
> |4  |7.0  |
> |1  |9q   |
> +---+-----+
> {code}
> The same happens with other formats too. Is this a bug, or is it expected behavior?
> Taken from
> [SO|https://stackoverflow.com/questions/62671684/spark-incorrectly-intepret-partition-name-ending-with-d-or-f-as-number-when]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
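One plausible explanation for the 6.0 and 7.0 values, offered as an assumption rather than a reading of Spark's source: with partition type inference enabled, the raw directory value is tried against numeric parsers, and the JVM's `java.lang.Double.parseDouble` accepts an optional trailing type suffix (`f`, `F`, `d` or `D`). That would make "6f", "7f" and "7d" parse as doubles while "9q" and "3k" do not. The hypothetical `PartitionValueParsing.parsesAsDouble` helper below reproduces only that parsing step; it is a sketch, not Spark's inference code.

```scala
import scala.util.Try

// Hypothetical helper for illustration only: it reproduces the JVM's
// double-parsing quirk, not Spark's partition-inference logic.
object PartitionValueParsing {
  // Some(d) if java.lang.Double.parseDouble accepts the raw string, else None.
  // parseDouble allows an optional trailing f/F/d/D type suffix.
  def parsesAsDouble(raw: String): Option[Double] =
    Try(java.lang.Double.parseDouble(raw)).toOption

  def main(args: Array[String]): Unit = {
    // The partition directory values from the reproduction in the issue.
    val values = Seq("9q", "3k", "6f", "7f", "7d")
    values.foreach(v => println(s"$v -> ${parsesAsDouble(v)}"))
    // 9q -> None
    // 3k -> None
    // 6f -> Some(6.0)
    // 7f -> Some(7.0)
    // 7d -> Some(7.0)
  }
}
```

Disabling inference, as the comment suggests, avoids the numeric parse altogether: after `spark.conf.set("spark.sql.sources.partitionColumnTypeInference.enabled", "false")`, a subsequent `spark.read.parquet("tmp_parquet")` reads the partition column as strings.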