[ 
https://issues.apache.org/jira/browse/SPARK-43413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-43413.
---------------------------------
    Fix Version/s: 3.5.0
       Resolution: Fixed

Issue resolved by pull request 41094
[https://github.com/apache/spark/pull/41094]

> IN subquery ListQuery has wrong nullability
> -------------------------------------------
>
>                 Key: SPARK-43413
>                 URL: https://issues.apache.org/jira/browse/SPARK-43413
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Jack Chen
>            Assignee: Jack Chen
>            Priority: Major
>             Fix For: 3.5.0
>
>
> IN subquery expressions are incorrectly marked as non-nullable, even when 
> they are actually nullable. They correctly check the nullability of the 
> left-hand-side, but the right-hand-side of a IN subquery, the ListQuery, is 
> currently defined with nullability = false always. This is incorrect and can 
> lead to incorrect query transformations.
> Example: (non_nullable_col IN (select nullable_col)) <=> TRUE . Here the IN 
> expression returns NULL when the nullable_col is null, but our code marks it 
> as non-nullable, and therefore SimplifyBinaryComparison transforms away the 
> <=> TRUE, transforming the expression to non_nullable_col IN (select 
> nullable_col) , which is an incorrect transformation because NULL values of 
> nullable_col now cause the expression to yield NULL instead of FALSE.
> This bug can potentially lead to wrong results, but in most cases this 
> doesn't directly cause wrong results end-to-end, because IN subqueries are 
> almost always transformed to semi/anti/existence joins in 
> RewritePredicateSubquery, and this rewrite can also incorrectly discard 
> NULLs, which is another bug. But we can observe it causing wrong behavior in 
> unit tests, and it could easily lead to incorrect query results if there are 
> changes to the surrounding context, so it should be fixed regardless.
> This is a long-standing bug that has existed at least since 2016, as long as 
> the ListQuery class has existed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to