Andrey Gubichev created SPARK-43778:
---------------------------------------

             Summary: RewriteCorrelatedScalarSubquery should handle duplicate attributes
                 Key: SPARK-43778
                 URL: https://issues.apache.org/jira/browse/SPARK-43778
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.4.0
            Reporter: Andrey Gubichev


This is a correctness bug: the decorrelation rule does not deduplicate join
attributes properly. As a result, the join condition becomes (c1 = c1), which is
simplified to true, so the join degenerates into a cross product.

Example query:

create view t(c1, c2) as values (0, 1), (0, 2), (1, 2);

select c1, c2, (select count(*) cnt from t t2 where t1.c1 = t2.c1 having cnt = 0) from t t1;
-- Correct answer: [(0, 1, null), (0, 2, null), (1, 2, null)]
-- Actual answer (every row is duplicated):
+---+---+------------------+
|c1 |c2 |scalarsubquery(c1)|
+---+---+------------------+
|0  |1  |null              |
|0  |1  |null              |
|0  |2  |null              |
|0  |2  |null              |
|1  |2  |null              |
|1  |2  |null              |
+---+---+------------------+
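To make the failure mode concrete, here is a toy Python model of the rewritten plan. It is not Spark's code: the outer rows of t are left-outer-joined to the subquery aggregated by c1, and the HAVING predicate is assumed to be evaluated after the join, once per joined row (an assumption of this sketch). The helper names `subquery_aggregate` and `rewritten_plan` are hypothetical. Passing a tautological join condition in place of `t1.c1 = t2.c1` reproduces the six-row output above.

```python
from collections import Counter

# The rows of view t(c1, c2) from the example query.
T = [(0, 1), (0, 2), (1, 2)]


def subquery_aggregate(rows):
    """Decorrelated subquery: count(*) grouped by the correlation column c1.

    Assumption of this sketch: the HAVING predicate is NOT applied here;
    it is evaluated after the join, once per joined row.
    """
    counts = Counter(c1 for c1, _ in rows)
    return sorted(counts.items())  # [(c1, cnt), ...]


def rewritten_plan(outer, groups, join_cond):
    """Left outer join outer rows with the aggregated groups, then evaluate
    HAVING cnt = 0: the scalar value is cnt if the predicate holds, else None."""
    result = []
    for c1, c2 in outer:
        matches = [cnt for key, cnt in groups if join_cond(c1, key)]
        # An unmatched outer row sees count(*) over an empty input, i.e. 0.
        for cnt in matches or [0]:
            result.append((c1, c2, cnt if cnt == 0 else None))
    return result


groups = subquery_aggregate(T)

# Intended rewrite: join on t1.c1 = t2.c1.
correct = rewritten_plan(T, groups, lambda outer_c1, inner_c1: outer_c1 == inner_c1)

# Buggy rewrite: both sides resolve to the same attribute, so the condition
# is c1 = c1, which folds to true and the join becomes a cross product.
buggy = rewritten_plan(T, groups, lambda outer_c1, inner_c1: True)

print(correct)  # [(0, 1, None), (0, 2, None), (1, 2, None)]
print(len(buggy))  # 6
```

With the tautological condition every outer row matches both groups (c1 = 0 and c1 = 1), which is consistent with the reported output showing each row exactly twice.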

--
This message was sent by Atlassian Jira (v8.20.10#820010)
