[ https://issues.apache.org/jira/browse/SPARK-12167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yadong Qi closed SPARK-12167.
-----------------------------
    Resolution: Duplicate

> Invoke the right sameResult function when plan is wrapped with SubQueries
> -------------------------------------------------------------------------
>
>                 Key: SPARK-12167
>                 URL: https://issues.apache.org/jira/browse/SPARK-12167
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.2
>            Reporter: Yadong Qi
>
> I found this bug when using `cache table`:
> ```
> spark-sql> create table src_p(key int, value int) stored as parquet;
> OK
> Time taken: 3.144 seconds
> spark-sql> cache table src_p;
> Time taken: 1.452 seconds
> spark-sql> explain extended select count(*) from src_p;
> ```
> I got the wrong physical plan, which scans the Parquet files instead of the cached table:
> ```
> == Physical Plan ==
> TungstenAggregate(key=[], functions=[(count(1),mode=Final,isDistinct=false)], output=[_c0#28L])
>  TungstenExchange SinglePartition
>   TungstenAggregate(key=[], functions=[(count(1),mode=Partial,isDistinct=false)], output=[currentCount#33L])
>    Scan ParquetRelation[hdfs://9.91.8.131:9000/user/hive/warehouse/src_p][]
> ```
> The correct physical plan reads from the in-memory relation:
> ```
> == Physical Plan ==
> TungstenAggregate(key=[], functions=[(count(1),mode=Final,isDistinct=false)], output=[_c0#47L])
>  TungstenExchange SinglePartition
>   TungstenAggregate(key=[], functions=[(count(1),mode=Partial,isDistinct=false)], output=[currentCount#62L])
>    InMemoryColumnarTableScan (InMemoryRelation [key#45,value#46], true, 10000, StorageLevel(true, true, false, true, 1), (Scan ParquetRelation[hdfs://9.91.8.131:9000/user/hive/warehouse/src_p][key#9,value#10]), Some(src_p))
> ```
> When implementation classes of `MultiInstanceRelation` (e.g.
> `LogicalRelation`, `LocalRelation`) are wrapped with SubQueries, the
> `sameResult` function in their own implementation is not invoked. So we
> need to eliminate SubQueries first and then invoke the `sameResult`
> function in their own implementation.
> For example, when the plan is
> `Subquery(LogicalRelation(relation:ParquetRelation[hdfs://9.91.8.131:9000/user/hive/warehouse/src_p], expectedOutputAttributes:Some(ArrayBuffer(key#0, value#1))))`,
> SubQueries are eliminated first, and then the `sameResult` function in
> `LogicalRelation` is invoked instead of the one in `LogicalPlan`.
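
The dispatch problem described in the issue can be illustrated with a toy model. This is only a sketch: `Plan`, `Relation`, `Subquery`, `stripSubqueries`, and `sameResultIgnoringSubqueries` are hypothetical stand-ins written for this example, not Spark's Catalyst classes, and the path is made up. It assumes the default comparison is plain structural equality and that a relation's own override ignores per-instance attribute ids.

```scala
// Toy model only: these classes mimic the shape of the plan nodes named in
// the issue but are NOT Spark's real classes.
sealed trait Plan {
  // Default comparison (stand-in for the LogicalPlan version):
  // plain structural equality, which also compares attribute ids.
  def sameResult(other: Plan): Boolean = this == other
}

// Stand-in for LogicalRelation: attribute ids differ per instance, so the
// override compares only the underlying relation path.
final case class Relation(path: String, attrIds: Seq[Int]) extends Plan {
  override def sameResult(other: Plan): Boolean = other match {
    case Relation(otherPath, _) => path == otherPath
    case _                      => false
  }
}

// Stand-in for Subquery: an alias wrapper that does not change the result.
final case class Subquery(alias: String, child: Plan) extends Plan

object SameResultSketch {
  // Remove every Subquery wrapper from the top of the plan.
  @annotation.tailrec
  def stripSubqueries(plan: Plan): Plan = plan match {
    case Subquery(_, child) => stripSubqueries(child)
    case other              => other
  }

  // Strip the wrappers first, so the relation's own override is the one
  // that gets dispatched to.
  def sameResultIgnoringSubqueries(a: Plan, b: Plan): Boolean =
    stripSubqueries(a).sameResult(stripSubqueries(b))

  def main(args: Array[String]): Unit = {
    // Same (hypothetical) table, different attribute ids per instance.
    val cached = Subquery("src_p", Relation("/hypothetical/warehouse/src_p", Seq(0, 1)))
    val query  = Subquery("src_p", Relation("/hypothetical/warehouse/src_p", Seq(9, 10)))

    println(cached.sameResult(query))                     // false: default comparison on the wrapper
    println(sameResultIgnoringSubqueries(cached, query))  // true: Relation's override runs
  }
}
```

In this model the wrapped plans fail to match because the comparison runs on the `Subquery` wrapper and never reaches the relation's override, which is the kind of mismatch that would leave a cached table unused.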