Hi, How about caching the result of `select * from a where a.c2 < 1000`, then joining them? You probably need to tune `spark.sql.autoBroadcastJoinThreshold` to enable broadcast joins for the result table.
// maropu On Mon, Jun 20, 2016 at 8:06 PM, 梅西0247 <zhen...@dtdream.com> wrote: > Hi everyone, > > I ran a SQL join statement on Spark 1.6.1 like this: > select * from table1 a join table2 b on a.c1 = b.c1 where a.c2 < 1000; > and it took quite a long time because It is a SortMergeJoin and the two > tables are big. > > > In fact, the size of filter result(select * from a where a.c2 < 1000) is > very small, and I think a better solution is to use a BroadcastJoin with > the filter result, but I know the physical plan is static and it won't be > changed. > > So, can we make the physical plan more adaptive? (In this example, I mean > using a BroadcastHashJoin instead of SortMergeJoin automatically. ) > > > > > > -- --- Takeshi Yamamuro