[ https://issues.apache.org/jira/browse/PARQUET-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ryan Blue reassigned PARQUET-1510:
----------------------------------

    Assignee: Ryan Blue

> Dictionary filter skips null values when evaluating not-equals.
> ---------------------------------------------------------------
>
>                 Key: PARQUET-1510
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1510
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>    Affects Versions: 1.9.0, 1.10.0, 1.9.1
>            Reporter: Ryan Blue
>            Assignee: Ryan Blue
>            Priority: Blocker
>              Labels: correctness, pull-request-available
>             Fix For: 1.11.0, 1.10.1
>
>
> This was discovered in Spark, see SPARK-26677. From the Spark PR:
> {code}
> // Repeat the values to get dictionary encoding.
> Seq(Some("A"), Some("A"), None).toDF.repartition(1).write.mode("overwrite").parquet("/tmp/foo")
> spark.read.parquet("/tmp/foo").where("NOT (value <=> 'A')").show()
> +-----+
> |value|
> +-----+
> +-----+
> {code}
> {code}
> // Use plain encoding.
> Seq(Some("A"), None).toDF.repartition(1).write.mode("overwrite").parquet("/tmp/bar")
> spark.read.parquet("/tmp/bar").where("NOT (value <=> 'A')").show()
> +-----+
> |value|
> +-----+
> | null|
> +-----+
> {code}
> This is a correctness issue.
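
A simplified model may help show why the two reads in the report disagree. This is a sketch under stated assumptions, not parquet-mr's actual filter code: the class and method names (`NotEqDictionarySketch`, `canDropBuggy`, `canDropNullAware`) and the `nullCount` parameter are hypothetical. It relies on the fact that a Parquet dictionary holds only non-null values, so a row group containing "A", "A", null has the dictionary {"A"}. A not-equals check that consults only the dictionary then concludes that no row can match and drops the row group, null row included.

```java
import java.util.Collections;
import java.util.Set;

// Simplified model of a dictionary-based row-group filter for "column != literal".
// Hypothetical names throughout; this is not parquet-mr's implementation.
class NotEqDictionarySketch {

    // Dictionary-only rule: drop the row group when the only dictionary
    // entry equals the literal. Nulls are never in the dictionary, so
    // rows with null values are silently ignored by this check.
    static boolean canDropBuggy(Set<String> dictionary, String literal) {
        return dictionary.size() == 1 && dictionary.contains(literal);
    }

    // Null-aware rule: a null row satisfies NOT (value <=> literal),
    // so a row group that contains nulls must be kept.
    static boolean canDropNullAware(Set<String> dictionary, long nullCount, String literal) {
        return nullCount == 0 && canDropBuggy(dictionary, literal);
    }

    public static void main(String[] args) {
        // Models the /tmp/foo row group: values "A", "A", null.
        Set<String> dictionary = Collections.singleton("A");
        long nullCount = 1;

        System.out.println(canDropBuggy(dictionary, "A"));                // true
        System.out.println(canDropNullAware(dictionary, nullCount, "A")); // false
    }
}
```

Running `main` prints `true` and then `false`: the dictionary-only rule discards the row group, which matches the empty result for `/tmp/foo`, while the null-aware rule keeps it so the null row can be returned, as in the plain-encoded `/tmp/bar` case.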