ton Okolnychyi
Sent: Saturday, November 26, 2016 4:05 PM
To: Swapnil Shinde
Cc: Benyi Wang; user@spark.apache.org
Subject: Re: Dataframe broadcast join hint not working
Hi guys,
I also experienced a situation when Spark 1.6.2 ignored my hint to do a
broadcast join (i.e. broadcast(df)) with a small da
Hi guys,
I also experienced a situation when Spark 1.6.2 ignored my hint to do a
broadcast join (i.e. broadcast(df)) with a small dataset. However, this
happened only in 1 of 3 cases. Setting the
"spark.sql.autoBroadcastJoinThreshold" property did not have any impact as
well. All 3 cases work fine
I think your dataframes are converted from RDDs, Are those RDDs computed or
read from files directly? I guess it might affect how spark compute the
execution plan.
Try this: save your data frame which will be broadcasted to HDFS, and read
it back into a dataframe. Then do the join and check the ex
I am using Spark 1.6.3 and below is the real plan (a,b,c in above were just
for illustration purpose)
== Physical Plan ==
Project [ltt#3800 AS ltt#3814,CASE WHEN isnull(etv_demo_id#3813) THEN
mr_demo_id#3801 ELSE etv_demo_id#3813 AS etv_demo_id#3815]
+- SortMergeOuterJoin [mr_demoname#3802,mr_demo
Could you post the result of explain `c.explain`? If it is broadcast join,
you will see it in explain.
On Sat, Nov 26, 2016 at 10:51 AM, Swapnil Shinde
wrote:
> Hello
> I am trying a broadcast join on dataframes but it is still doing
> SortMergeJoin. I even try setting spark.sql.autoBroadcas
Hi,
Which version of spark you are using.
Less than 10Mb automatically converted as broadcast join in spark.
\Thanks,
selvam R
On Sat, Nov 26, 2016 at 6:51 PM, Swapnil Shinde
wrote:
> Hello
> I am trying a broadcast join on dataframes but it is still doing
> SortMergeJoin. I even try sett
Hello
I am trying a broadcast join on dataframes but it is still doing
SortMergeJoin. I even try setting spark.sql.autoBroadcastJoinThreshold
higher but still no luck.
Related piece of code-
val c = a.join(braodcast(b), "id")
On a side note, if I do SizeEstimator.estimate(b) and it