Hi,

Maybe you can open a JIRA and upload both plans, as Michael suggested. This is an interesting feature.

Thanks!
Xiao Li

2016-03-21 10:36 GMT-07:00 Michael Armbrust <mich...@databricks.com>:

> It's helpful if you can include the output of EXPLAIN EXTENDED or
> df.explain(true) whenever asking about query performance.
>
> On Mon, Mar 21, 2016 at 6:27 AM, gtinside <gtins...@gmail.com> wrote:
>
>> Hi,
>>
>> I am trying to execute a simple query with join on 3 tables. When I look
>> at the execution plan, it varies with position of table in the "from"
>> clause. Execution plan looks more optimized when the position of table
>> with predicates is specified before any other table.
>>
>> Original query:
>>
>> select distinct pge.portfolio_code
>> from table1 pge join table2 p
>> on p.perm_group = pge.anc_port_group
>> join table3 uge
>> on p.user_group=uge.anc_user_group
>> where uge.user_name = 'user' and p.perm_type = 'TEST'
>>
>> Optimized query (table with predicates is moved ahead):
>>
>> select distinct pge.portfolio_code
>> from table1 uge, table2 p, table3 pge
>> where uge.user_name = 'user' and p.perm_type = 'TEST'
>> and p.perm_group = pge.anc_port_group
>> and p.user_group=uge.anc_user_group
>>
>> Execution plan is more optimized for the optimized query and hence the
>> query executes faster. All the tables are being sourced from parquet files.
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-Optimization-tp26548.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
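
For reference, the plans Michael asks for could be collected with something like the sketch below. It is only an illustration, not anything posted in this thread: the helper names `explain_extended` and `collect_plans` are made up here, `spark` is assumed to be whatever object exposes `.sql()` in your setup (a SQLContext, HiveContext or SparkSession), and the three tables are assumed to be registered already under the names used in the queries.

```python
# Hypothetical helpers for gathering EXPLAIN EXTENDED output for both
# versions of the query, so the two plans can be attached to a JIRA.

ORIGINAL_QUERY = """
select distinct pge.portfolio_code
from table1 pge join table2 p
on p.perm_group = pge.anc_port_group
join table3 uge
on p.user_group=uge.anc_user_group
where uge.user_name = 'user' and p.perm_type = 'TEST'
"""

REORDERED_QUERY = """
select distinct pge.portfolio_code
from table1 uge, table2 p, table3 pge
where uge.user_name = 'user' and p.perm_type = 'TEST'
and p.perm_group = pge.anc_port_group
and p.user_group=uge.anc_user_group
"""


def explain_extended(query):
    """Return the query prefixed with EXPLAIN EXTENDED.

    Surrounding whitespace and a trailing semicolon are removed first.
    """
    return "EXPLAIN EXTENDED " + query.strip().rstrip(";").rstrip()


def collect_plans(spark, queries):
    """Run EXPLAIN EXTENDED for each named query and return {name: plan text}.

    Assumption: `spark` has a .sql(str) method returning a DataFrame whose
    rows carry the plan text in their first column. Rows are joined with
    newlines, so it does not matter whether the plan comes back as a single
    row or as one row per line.
    """
    plans = {}
    for name, query in queries.items():
        rows = spark.sql(explain_extended(query)).collect()
        plans[name] = "\n".join(row[0] for row in rows)
    return plans
```

Calling `collect_plans(spark, {"original": ORIGINAL_QUERY, "reordered": REORDERED_QUERY})` would give the two plan texts side by side. If you work with DataFrames instead of SQL strings, `df.explain(True)` prints the same extended plan to stdout.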