[ 
https://issues.apache.org/jira/browse/SPARK-48503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-48503.
---------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 46839
[https://github.com/apache/spark/pull/46839]

> Scalar subquery with group-by and non-equality predicate incorrectly allowed, 
> wrong results
> -------------------------------------------------------------------------------------------
>
>                 Key: SPARK-48503
>                 URL: https://issues.apache.org/jira/browse/SPARK-48503
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Jack Chen
>            Assignee: Jack Chen
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>
> This query is not legal and should give an error, but instead we incorrectly 
> allow it and it returns wrong results.
> {code:java}
> create table x(x1 int, x2 int);
> insert into x values (1, 1);
> create table y(y1 int, y2 int);
> insert into y values (2, 2), (3, 3);
> select *, (select count(*) from y where y1 > x1 group by y1) from x; {code}
> It returns two rows, even though there's only one row of x.
> The correct result is an error: more than one row returned by a subquery used 
> as an expression (as seen in postgres for example)
>  
> This is a longstanding bug. The bug is in CheckAnalysis in 
> {{{}checkAggregateInScalarSubquery{}}}. It allows grouping columns that are 
> present in correlation predicates, but doesn’t check whether those predicates 
> are equalities -  because when that code was written, non-equality 
> correlation wasn’t allowed. Therefore, it looks like this bug has existed 
> since non-equality correlation was added (~2 years ago).
>  
> Various other expressions that are not equi-joins between the inner and outer 
> fields hit this too, e.g. `where y1 + y2 = x1 group by y1`.
> Another bugged case is if the correlation condition is an equality but it's 
> under another operator like an OUTER JOIN or UNION.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to