Re: Is it possible to turn a SortMergeJoin into BroadcastHashJoin?

Takeshi Yamamuro Mon, 20 Jun 2016 04:16:53 -0700

Hi,

How about caching the result of `select * from a where a.c2 < 1000`, then
joining them?
You probably need to tune `spark.sql.autoBroadcastJoinThreshold` to enable
broadcast joins for the result table.


// maropu


On Mon, Jun 20, 2016 at 8:06 PM, 梅西0247 <zhen...@dtdream.com> wrote:

> Hi everyone,
>
> I ran a SQL join statement on Spark 1.6.1 like this:
> select * from table1 a join table2 b on a.c1 = b.c1 where a.c2 < 1000;
> and it took quite a long time because It is a SortMergeJoin and the two
> tables are big.
>
>
> In fact,  the size of filter result(select * from a where a.c2 < 1000) is
> very small, and I think a better solution is to use a BroadcastJoin with
> the filter result, but  I know  the physical plan is static and it won't be
> changed.
>
> So, can we make the physical plan more adaptive? (In this example, I mean
> using a  BroadcastHashJoin instead of SortMergeJoin automatically. )
>
>
>
>
>
>


-- 
---
Takeshi Yamamuro

Re: Is it possible to turn a SortMergeJoin into BroadcastHashJoin?

Reply via email to