Re: Spark SQL Optimization

Xiao Li Mon, 21 Mar 2016 10:42:19 -0700

Hi, maybe you can open a JIRA and upload your plan, as Michael suggested.
This is an interesting feature. Thanks!


Xiao Li

2016-03-21 10:36 GMT-07:00 Michael Armbrust <mich...@databricks.com>:

> It's helpful if you can include the output of EXPLAIN EXTENDED or
> df.explain(true) whenever asking about query performance.
>
> On Mon, Mar 21, 2016 at 6:27 AM, gtinside <gtins...@gmail.com> wrote:
>
>> Hi,
>>
>> I am trying to execute a simple query with a join on 3 tables. When I look
>> at the execution plan, it varies with the position of the tables in the
>> "from" clause. The execution plan looks more optimized when the table with
>> predicates is specified before any other table.
>>
>>
>> Original query :
>>
>> select distinct pge.portfolio_code
>> from table1 pge join table2 p
>> on p.perm_group = pge.anc_port_group
>> join table3 uge
>> on p.user_group=uge.anc_user_group
>> where uge.user_name = 'user' and p.perm_type = 'TEST'
>>
>> Optimized query (table with predicates is moved ahead):
>>
>> select distinct pge.portfolio_code
>> from table1 uge, table2 p, table3 pge
>> where uge.user_name = 'user' and p.perm_type = 'TEST'
>> and p.perm_group = pge.anc_port_group
>> and p.user_group=uge.anc_user_group
>>
>>
>> The execution plan is more optimized for the optimized query, and hence the
>> query executes faster. All the tables are sourced from Parquet files.
>>
>>
>>
>>
>
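
A minimal sketch of how the plan output Michael asks for could be captured from PySpark. It assumes an existing SparkSession named `spark` (Spark 2.0 or later; on 1.6 the equivalent entry point would be `sqlContext.sql`) and that the three tables are already registered. The helper names `explain_extended_sql` and `print_plans` are hypothetical and not part of Spark.

```python
def explain_extended_sql(query: str) -> str:
    """Prefix a SQL query with EXPLAIN EXTENDED, dropping any trailing semicolon."""
    return "EXPLAIN EXTENDED " + query.strip().rstrip(";").rstrip()


def print_plans(spark, query: str) -> None:
    """Print the plans Spark produces for `query`.

    `spark` is assumed to be an existing pyspark.sql.SparkSession. It is passed
    in rather than created here, so this module imports without PySpark.
    """
    # DataFrame route: explain(True) prints the parsed, analyzed,
    # optimized and physical plans.
    spark.sql(query).explain(True)
    # SQL route: EXPLAIN EXTENDED returns the plan text as a query result.
    spark.sql(explain_extended_sql(query)).show(truncate=False)
```

Calling `print_plans(spark, q)` once for each of the two queries in the original post would give the two plans to attach to a JIRA.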

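The report rests on the two query forms being logically equivalent, so that only the join order differs. The sketch below checks this on a few made-up rows. It uses Python's built-in `sqlite3` as a stand-in engine, not Spark, so it says nothing about Spark's plans. The table names `port_groups`, `perms` and `user_groups` and all sample rows are hypothetical; only the column names and predicates come from the post.

```python
import sqlite3

# Join order as written in the original query (explicit JOIN ... ON).
ORIGINAL = """
select distinct pge.portfolio_code
from port_groups pge join perms p
  on p.perm_group = pge.anc_port_group
join user_groups uge
  on p.user_group = uge.anc_user_group
where uge.user_name = 'user' and p.perm_type = 'TEST'
"""

# Same predicates, with the filtered table listed first.
REORDERED = """
select distinct pge.portfolio_code
from user_groups uge, perms p, port_groups pge
where uge.user_name = 'user' and p.perm_type = 'TEST'
  and p.perm_group = pge.anc_port_group
  and p.user_group = uge.anc_user_group
"""


def run_both():
    """Run both query forms on hypothetical sample data and return both results."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table port_groups (portfolio_code text, anc_port_group text);
        create table perms (perm_group text, user_group text, perm_type text);
        create table user_groups (anc_user_group text, user_name text);
        insert into port_groups values ('PF1','G1'), ('PF2','G2'), ('PF3','G3');
        insert into perms values
            ('G1','U1','TEST'), ('G2','U1','PROD'), ('G3','U2','TEST');
        insert into user_groups values ('U1','user'), ('U2','other');
    """)
    original = sorted(conn.execute(ORIGINAL).fetchall())
    reordered = sorted(conn.execute(REORDERED).fetchall())
    conn.close()
    return original, reordered
```

Both forms return the same rows on this data, which is why a difference in the plans points at join ordering in the optimizer rather than at the queries themselves.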