KaiXinXIaoLei created SPARK-23542:
-------------------------------------

             Summary: The `where exists' action in optimized logical plan should be optimized
                 Key: SPARK-23542
                 URL: https://issues.apache.org/jira/browse/SPARK-23542
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.3.0
            Reporter: KaiXinXIaoLei

The optimized logical plan of the query `select * from tt1 where exists (select * from tt2 where tt1.i = tt2.i);` is:

>== Optimized Logical Plan ==
>Join LeftSemi, (i#143 = i#145)
>:- MetastoreRelation default, tt1
>+- MetastoreRelation default, tt2

But the optimized logical plan of the equivalent query `select * from tt1 left semi join tt2 on tt2.i = tt1.i` is:

>== Optimized Logical Plan ==
>Join LeftSemi, (i#152 = i#150)
>:- Filter isnotnull(i#150)
>:  +- MetastoreRelation default, tt1
>+- Project [i#152]
>   +- MetastoreRelation default, tt2

So I think the optimized logical plan of `select * from tt1 where exists (select * from tt2 where tt1.i = tt2.i);` should be optimized further: unlike the explicit left semi join, it gets neither the `isnotnull` filter on tt1 nor the `Project` on tt2.
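
The `isnotnull` filter that appears in the left semi join plan is safe to apply to the `exists` form as well, because a row of tt1 whose `i` is NULL can never satisfy `tt1.i = tt2.i`. The sketch below illustrates only that semantic point. It uses Python's built-in sqlite3 module as a stand-in SQL engine, not Spark, and the sample rows are made up for illustration; it says nothing about how Spark's optimizer actually derives the filter.

```python
import sqlite3

# In-memory SQLite database standing in for the Spark tables tt1 and tt2.
# The sample rows are hypothetical and chosen to include NULL keys.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tt1 (i INTEGER);
    CREATE TABLE tt2 (i INTEGER);
    INSERT INTO tt1 VALUES (1), (2), (NULL), (4);
    INSERT INTO tt2 VALUES (2), (NULL), (4), (4);
""")

# The query from the report.
exists_query = """
    SELECT * FROM tt1
    WHERE EXISTS (SELECT * FROM tt2 WHERE tt1.i = tt2.i)
    ORDER BY i
"""

# The same query with an explicit not-null filter on the outer side,
# mirroring the Filter isnotnull(...) node in the left semi join plan.
filtered_query = """
    SELECT * FROM tt1
    WHERE tt1.i IS NOT NULL
      AND EXISTS (SELECT * FROM tt2 WHERE tt1.i = tt2.i)
    ORDER BY i
"""

exists_rows = conn.execute(exists_query).fetchall()
filtered_rows = conn.execute(filtered_query).fetchall()

print(exists_rows)    # [(2,), (4,)]
print(filtered_rows)  # [(2,), (4,)]
```

Both queries return the same rows, and the duplicate `4` in tt2 does not duplicate output rows, which matches semi-join semantics. In Spark itself, the two plans can be compared by calling `explain(true)` on each query.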