Allison Wang created SPARK-36656: ------------------------------------ Summary: CollapseProject should not collapse correlated scalar subqueries Key: SPARK-36656 URL: https://issues.apache.org/jira/browse/SPARK-36656 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.3.0 Reporter: Allison Wang
Currently, the optimizer rule `CollapseProject` inlines expressions generated from correlated scalar subqueries, which can create unnecessary left outer joins. {code:scala} // Before Project [c1, s, (s * 10)] +- Project [c1, scalar-subquery [c1] AS s] : +- Aggregate [c1], [first(c2), c1] : +- LocalRelation [c1, c2] +- LocalRelation [c1, c2] // After (scalar subqueries are inlined) Project [c1, scalar-subquery [c1], (scalar-subquery [c1] * 10)] : +- Aggregate [c1], [first(c2), c1] : +- LocalRelation [c1, c2] : +- Aggregate [c1], [first(c2), c1] : +- LocalRelation [c1, c2] +- LocalRelation [c1, c2] {code} Then this query will have two LeftOuter joins created. We should only collapse projects after correlated subqueries are rewritten as joins. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org