[ 
https://issues.apache.org/jira/browse/PARQUET-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Blue reassigned PARQUET-1510:
----------------------------------

    Assignee: Ryan Blue

> Dictionary filter skips null values when evaluating not-equals.
> ---------------------------------------------------------------
>
>                 Key: PARQUET-1510
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1510
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>    Affects Versions: 1.9.0, 1.10.0, 1.9.1
>            Reporter: Ryan Blue
>            Assignee: Ryan Blue
>            Priority: Blocker
>              Labels: correctness, pull-request-available
>             Fix For: 1.11.0, 1.10.1
>
>
> This was discovered in Spark (see SPARK-26677). The following reproduction is from the Spark PR:
> {code}
> // Repeat the values to get dictionary encoding.
> Seq(Some("A"), Some("A"), None).toDF.repartition(1).write.mode("overwrite").parquet("/tmp/foo")
> spark.read.parquet("/tmp/foo").where("NOT (value <=> 'A')").show()
> +-----+
> |value|
> +-----+
> +-----+
> {code}
> {code}
> // Use plain encoding.
> Seq(Some("A"), None).toDF.repartition(1).write.mode("overwrite").parquet("/tmp/bar")
> spark.read.parquet("/tmp/bar").where("NOT (value <=> 'A')").show()
> +-----+
> |value|
> +-----+
> | null|
> +-----+
> {code}
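> This is a correctness issue: both queries should return the null row, but the read of the dictionary-encoded file returns no rows.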
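
The difference between the two results can be illustrated with a minimal sketch of a dictionary-based row-group check. This is not the parquet-mr implementation: `ColumnChunk`, `canDropIgnoringNulls`, and `canDropNullAware` are hypothetical names, and using a null count to detect nulls is an assumption made only for illustration. The point it shows is that a dictionary never contains null, so a check that consults only the dictionary may conclude that no row can satisfy `NOT (value <=> 'A')` even though a null row does.

```scala
// Hypothetical model of one column chunk; not a parquet-mr class.
final case class ColumnChunk(dictionary: Set[String], nullCount: Long)

object NullSafeNotEqSketch {
  // Decides whether a row group can be skipped for NOT (col <=> value).
  // Looks only at the dictionary, which never holds null entries.
  def canDropIgnoringNulls(chunk: ColumnChunk, value: String): Boolean =
    chunk.dictionary.forall(_ == value)

  // Null-aware variant: for a non-null value, a null row satisfies
  // NOT (col <=> value), so the row group must be kept if any nulls exist.
  def canDropNullAware(chunk: ColumnChunk, value: String): Boolean =
    chunk.nullCount == 0 && chunk.dictionary.forall(_ == value)

  def main(args: Array[String]): Unit = {
    // Mirrors the first reproduction: values "A", "A", null.
    val chunk = ColumnChunk(Set("A"), nullCount = 1)
    println(canDropIgnoringNulls(chunk, "A")) // true: row group skipped, null row lost
    println(canDropNullAware(chunk, "A"))     // false: row group kept, null row returned
  }
}
```

In the plain-encoded case there is no dictionary to consult, so a check of this kind does not apply, which would be consistent with that query returning the null row.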



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
