[jira] [Comment Edited] (SPARK-24438) Empty strings and null strings are written to the same partition

Marco Gaido (JIRA) Mon, 09 Jul 2018 01:38:34 -0700


    [ https://issues.apache.org/jira/browse/SPARK-24438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16536682#comment-16536682 ]


Marco Gaido edited comment on SPARK-24438 at 7/9/18 8:37 AM:
-------------------------------------------------------------

IIRC, Hive has a placeholder string (__HIVE_DEFAULT_PARTITION__) for null 
values in partitions.


was (Author: mgaido):
IIRC, Hive has a placeholder string (__HIVE_DEFAULT_PARTITION__) for null value 
in partitions.

> Empty strings and null strings are written to the same partition
> ----------------------------------------------------------------
>
>                 Key: SPARK-24438
>                 URL: https://issues.apache.org/jira/browse/SPARK-24438
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Mukul Murthy
>            Priority: Major
>
> When you partition on a string column that has empty strings and nulls, they 
> are both written to the same default partition. When you read the data back, 
> all those values get read back as null.
> {code:java}
> import org.apache.spark.sql.Row
> import org.apache.spark.sql.types._
> import org.apache.spark.sql.catalyst.encoders.RowEncoder
> val data = Seq(Row(1, ""), Row(2, ""), Row(3, ""), Row(4, "hello"), Row(5, null))
> val schema = new StructType().add("a", IntegerType).add("b", StringType)
> val df = spark.createDataFrame(spark.sparkContext.parallelize(data), schema)
> display(df) 
> => 
> a b
> 1 
> 2 
> 3 
> 4 hello
> 5 null
> df.write.mode("overwrite").partitionBy("b").save("/home/mukul/weird_test_data4")
> val df2 = spark.read.load("/home/mukul/weird_test_data4")
> display(df2)
> => 
> a b
> 4 hello
> 3 null
> 2 null
> 1 null
> 5 null
> {code}
> Seems to affect multiple types of tables.
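
The behavior in the report, together with the placeholder mentioned in the 
comment, can be modeled in a few lines of plain Scala. This is only a sketch 
of the observed symptom, not Spark's or Hive's actual code: the object name 
PartitionPathSketch and the helpers partitionDirName and readBack are 
hypothetical, and the assumption that both null and "" are rendered with the 
placeholder is taken from the description above.

```scala
// Simplified model of the reported symptom. Not Spark's implementation.
object PartitionPathSketch {
  // Placeholder string quoted in the comment on this issue.
  val DefaultPartitionName: String = "__HIVE_DEFAULT_PARTITION__"

  // Hypothetical helper: render one partition value as a directory name.
  // Assumption (from the report): null and "" both map to the placeholder.
  def partitionDirName(column: String, value: String): String = {
    val rendered =
      if (value == null || value.isEmpty) DefaultPartitionName else value
    column + "=" + rendered
  }

  // Hypothetical helper: recover the value from a directory name.
  // The placeholder can only be read back as "no value" (None), so the
  // distinction between null and "" is lost at this point.
  def readBack(dirName: String): Option[String] = {
    val raw = dirName.substring(dirName.indexOf('=') + 1)
    if (raw == DefaultPartitionName) None else Some(raw)
  }

  def main(args: Array[String]): Unit = {
    val values: Seq[String] = Seq("", "hello", null)
    values.foreach { v =>
      val dir = partitionDirName("b", v)
      println(dir + " reads back as " + readBack(dir).getOrElse("null"))
    }
  }
}
```

In this model the empty string and null land in the same directory, which 
matches rows 1, 2, 3 and 5 all coming back with b = null in the reproduction.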



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

