Then you need to tell us the Spark version and post the execution plan here, so we can help you better.
Yong

________________________________
From: Paley Louie <paley2...@gmail.com>
Sent: Sunday, July 2, 2017 12:36 AM
To: Yong Zhang
Cc: Bryan Jeffrey; d...@spark.org; user@spark.apache.org
Subject: Re: about broadcast join of base table in spark sql

Thank you for your reply. I have tried adding a broadcast hint to the base table, but it still is not broadcast.

On Jun 30, 2017, at 9:13 PM, Yong Zhang <java8...@hotmail.com> wrote:

Or, since you already use the DataFrame API instead of SQL, you can add the broadcast function to force it.

https://spark.apache.org/docs/1.6.2/api/java/org/apache/spark/sql/functions.html#broadcast(org.apache.spark.sql.DataFrame)

Yong

________________________________
From: Bryan Jeffrey <bryan.jeff...@gmail.com>
Sent: Friday, June 30, 2017 6:57 AM
To: d...@spark.org; user@spark.apache.org; paleyl
Subject: Re: about broadcast join of base table in spark sql

Hello. If you want to allow broadcast joins with larger broadcasts, you can set spark.sql.autoBroadcastJoinThreshold to a higher value. This lets the planner consider a broadcast join even though 'A' is larger than the default threshold.
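
For reference, the two suggestions above (raising spark.sql.autoBroadcastJoinThreshold and wrapping a DataFrame in the broadcast function) could look roughly like the sketch below. It is only an illustration under assumptions: it uses SparkSession, so it presumes Spark 2.x or later (the Spark version was never stated in this thread), the 100 MB threshold is an arbitrary value, and the table contents and the columns valueA and valueB are made up. Only key1, key2 and the left join come from the original question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object BroadcastHintSketch {
  // Arbitrary illustrative value (100 MB); not a recommendation from the thread.
  val thresholdBytes: Long = 100L * 1024 * 1024

  def main(args: Array[String]): Unit = {
    // Assumption: Spark 2.x+ running locally.
    val spark = SparkSession.builder()
      .appName("broadcast-hint-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-ins for tables A and B.
    val a = Seq((1, "a1"), (2, "a2")).toDF("key1", "valueA")
    val b = Seq((1, "b1"), (3, "b3")).toDF("key2", "valueB")

    // Suggestion 1: raise the size limit under which Spark may pick
    // a broadcast join on its own.
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", thresholdBytes)

    // Suggestion 2: mark a DataFrame with the broadcast function, which
    // asks the planner to broadcast that side where the join type allows it.
    val joined = broadcast(a).join(b, a("key1") === b("key2"), "left")

    // Print the physical plan; this is the execution plan worth posting
    // when asking for help.
    joined.explain()

    spark.stop()
  }
}
```

Note that the broadcast function is a hint, so the output of explain() is the only reliable way to check whether it was honored for a given join type.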
From: paleyl
Sent: Wednesday, June 28, 10:42 PM
Subject: about broadcast join of base table in spark sql
To: d...@spark.org, user@spark.apache.org

Hi All,

Recently I ran into a problem with broadcast join: I want to left join tables A and B, where A is the smaller one and the left table, so I wrote

A = A.join(B,A("key1") === B("key2"),"left")

but I found that A is not broadcast, as the shuffle size is still very large. I guess this is a designed mechanism in Spark, so could anyone please tell me why it is designed like this? I am just very curious.

Best,
Paley
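
One way to investigate the behavior described above is to compare the physical plans for the same left join with the hint on each side. As far as I understand Spark's planner (an assumption worth checking against your own version), a broadcast hash join for a left outer join can only broadcast the right side. Every row of the left table must appear in the result, matched or not, and an executor holding a full copy of A but only one partition of B cannot tell whether an unmatched A row is unmatched everywhere. So a hint on A would be ignored, while a hint on B can be honored. The sketch below assumes Spark 2.x or later; the data, the columns valueA and valueB, and the helper planOf are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.broadcast

object LeftJoinPlanSketch {
  // Hypothetical helper: render the physical plan as text.
  def planOf(df: DataFrame): String =
    df.queryExecution.executedPlan.toString

  def main(args: Array[String]): Unit = {
    // Assumption: Spark 2.x+ running locally.
    val spark = SparkSession.builder()
      .appName("left-join-plan-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Turn off size-based automatic broadcast so only the explicit
    // hints below influence the choice of join strategy.
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")

    // Hypothetical stand-ins for tables A and B.
    val a = Seq((1, "a1"), (2, "a2")).toDF("key1", "valueA")
    val b = Seq((1, "b1"), (3, "b3")).toDF("key2", "valueB")

    // Hint on the left (preserved) side of the left join.
    val hintLeft = broadcast(a).join(b, a("key1") === b("key2"), "left")
    // Hint on the right side of the same left join.
    val hintRight = a.join(broadcast(b), a("key1") === b("key2"), "left")

    println("Hint on A (left side):")
    println(planOf(hintLeft))
    println("Hint on B (right side):")
    println(planOf(hintRight))

    spark.stop()
  }
}
```

If the plans confirm this, the options are to broadcast B when it is small enough, or to swap the tables and use a right join so that A becomes the side eligible for broadcast.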