[ https://issues.apache.org/jira/browse/SPARK-43413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan resolved SPARK-43413. --------------------------------- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41094 [https://github.com/apache/spark/pull/41094] > IN subquery ListQuery has wrong nullability > ------------------------------------------- > > Key: SPARK-43413 > URL: https://issues.apache.org/jira/browse/SPARK-43413 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 3.4.0 > Reporter: Jack Chen > Assignee: Jack Chen > Priority: Major > Fix For: 3.5.0 > > > IN subquery expressions are incorrectly marked as non-nullable, even when > they are actually nullable. They correctly check the nullability of the > left-hand-side, but the right-hand-side of a IN subquery, the ListQuery, is > currently defined with nullability = false always. This is incorrect and can > lead to incorrect query transformations. > Example: (non_nullable_col IN (select nullable_col)) <=> TRUE . Here the IN > expression returns NULL when the nullable_col is null, but our code marks it > as non-nullable, and therefore SimplifyBinaryComparison transforms away the > <=> TRUE, transforming the expression to non_nullable_col IN (select > nullable_col) , which is an incorrect transformation because NULL values of > nullable_col now cause the expression to yield NULL instead of FALSE. > This bug can potentially lead to wrong results, but in most cases this > doesn't directly cause wrong results end-to-end, because IN subqueries are > almost always transformed to semi/anti/existence joins in > RewritePredicateSubquery, and this rewrite can also incorrectly discard > NULLs, which is another bug. But we can observe it causing wrong behavior in > unit tests, and it could easily lead to incorrect query results if there are > changes to the surrounding context, so it should be fixed regardless. > This is a long-standing bug that has existed at least since 2016, as long as > the ListQuery class has existed. -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org