Does adaptive auto broadcast respect spark.sql.autoBroadcastJoinThreshold

2022-06-02 Thread Henry Quan
I noticed that my Spark application is broadcasting even though I set spark.sql.autoBroadcastJoinThreshold = -1. When I checked the query plan, the physical plan was an AdaptiveSparkPlan. Looking through the adaptive settings, I noticed that there is a separate setting…
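
The behavior described above matches how adaptive query execution handles broadcast thresholds: since Spark 3.2 there is a separate config, spark.sql.adaptive.autoBroadcastJoinThreshold, which defaults to the non-adaptive value but can override it if set elsewhere. A minimal sketch, assuming a Spark 3.2+ session named spark, of disabling auto-broadcast under both planners:

```scala
// Sketch: to fully disable automatic broadcast joins when AQE is on,
// both the static and the adaptive thresholds must be -1. The adaptive
// config (Spark 3.2+) falls back to the static one only when it is unset.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")
spark.conf.set("spark.sql.adaptive.autoBroadcastJoinThreshold", "-1")
```

If the adaptive threshold was left at its default, AQE may still convert a sort-merge join to a broadcast join at runtime based on actual shuffle statistics, which is likely what the poster observed.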

Re: spark.sql.autoBroadcastJoinThreshold not taking effect

2019-05-13 Thread Lantao Jin
Maybe you could try --conf spark.sql.statistics.fallBackToHdfs=true

On 2019/05/11 01:54:27, V0lleyBallJunki3 wrote:
> Hello,
> I have set spark.sql.autoBroadcastJoinThreshold=1GB and I am running the Spark job. However, my ap…
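
The suggestion above refers to a real config: when a table has no statistics in the metastore, spark.sql.statistics.fallBackToHdfs (default false) lets the planner estimate table size from the files on HDFS instead of assuming a very large default, so the size can actually be compared against the broadcast threshold. A sketch, assuming a session named spark:

```scala
// Sketch: fall back to HDFS file sizes when catalog statistics are
// missing, so the planner can decide whether a table fits under
// spark.sql.autoBroadcastJoinThreshold.
spark.conf.set("spark.sql.statistics.fallBackToHdfs", "true")
```

The same setting can be passed at submit time as --conf spark.sql.statistics.fallBackToHdfs=true, as in the reply.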

spark.sql.autoBroadcastJoinThreshold not taking effect

2019-05-10 Thread V0lleyBallJunki3
Hello, I have set spark.sql.autoBroadcastJoinThreshold=1GB and I am running the Spark job. However, my application is failing with:
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62…

Re: Spark not doing a broadcast join in spite of the table being well below spark.sql.autoBroadcastJoinThreshold

2019-05-10 Thread V0lleyBallJunki3
So what I discovered was that if I write the table being joined to disk and then read it again, Spark correctly broadcasts it. I think it is because when Spark estimates the size of the smaller table, it incorrectly estimates it to be much bigger than it is, and hence decides to do a…
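
The workaround described above can be sketched as follows; the DataFrame names and path are illustrative, not from the original thread. Persisting the small side gives Spark file-based size statistics instead of an inflated in-memory estimate:

```scala
// Sketch of the poster's workaround: write the small side out and read it
// back, so the planner sees an accurate file-based size estimate.
smallDf.write.mode("overwrite").parquet("/tmp/small_table")
val smallFromDisk = spark.read.parquet("/tmp/small_table")
val joined = largeDf.join(smallFromDisk, Seq("id"))

// For a catalog table, refreshing statistics explicitly serves the
// same purpose without the round trip to disk:
spark.sql("ANALYZE TABLE small_table COMPUTE STATISTICS")
```

ANALYZE TABLE populates the catalog statistics the planner consults when comparing the table against the broadcast threshold.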

Spark not doing a broadcast join in spite of the table being well below spark.sql.autoBroadcastJoinThreshold

2019-05-09 Thread V0lleyBallJunki3
I have a small table, well below 50 MB, that I want to broadcast join with a larger table. However, even if I set spark.sql.autoBroadcastJoinThreshold to 100 MB, Spark still decides to do a SortMergeJoin instead of a broadcast join. I have to set an explicit broadcast hint on the table for it to do…
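
The explicit hint mentioned above bypasses the planner's size estimate entirely. A minimal sketch, with illustrative DataFrame names:

```scala
import org.apache.spark.sql.functions.broadcast

// Sketch: force a broadcast join regardless of the planner's
// (possibly inflated) size estimate for the small side.
val joined = largeDf.join(broadcast(smallDf), Seq("id"))
```

The equivalent SQL hint is SELECT /*+ BROADCAST(s) */ ... with s being the alias of the small table.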

Re: Why spark.sql.autoBroadcastJoinThreshold not available

2017-05-15 Thread Jone Zhang
> …in the Spark 2.x. Can you try it on Spark 2.0?
>
> Yong

Re: Why spark.sql.autoBroadcastJoinThreshold not available

2017-05-15 Thread Yong Zhang
…broadcast join. This is fixed in the Spark 2.x. Can you try it on Spark 2.0?

Yong

Why spark.sql.autoBroadcastJoinThreshold not available

2017-05-10 Thread Jone Zhang
Now I use Spark 1.6.0 with Java. I wish the following SQL to be executed as a broadcast join: select * from sample join feature. These are my steps:
1. set spark.sql.autoBroadcastJoinThreshold=100M
2. HiveContext.sql("cache lazy table feature as select * from src where ..."), whose result size is…
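
A sketch of the steps above in the Spark 1.6-era API. Note that the threshold is specified in bytes; whether a suffix like 100M is accepted depends on the Spark version, so a plain byte count is the safer form. Table names mirror the post; the WHERE clause is left elided as in the original:

```scala
import org.apache.spark.sql.hive.HiveContext

// Sketch, Spark 1.6 API: raise the broadcast threshold (bytes), cache the
// small table lazily, then run the join.
val hiveContext = new HiveContext(sc)
hiveContext.sql("SET spark.sql.autoBroadcastJoinThreshold=104857600") // 100 MB
hiveContext.sql("CACHE LAZY TABLE feature AS SELECT * FROM src WHERE ...")
val result = hiveContext.sql("SELECT * FROM sample JOIN feature")
```

In 1.6 the planner relies on the cached relation's size statistics, so whether the broadcast actually happens still depends on the estimated size of feature coming in under the threshold.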

spark.sql.autoBroadcastJoinThreshold

2014-09-24 Thread sridhar1135
Does this work with spark-sql in 1.0.1 too? I tried it like this: sqlContext.sql("SET spark.sql.autoBroadcastJoinThreshold=1;") But it still seems to trigger ShuffleMapTask, and the amount of shuffle is the same with or without this parameter. Kindly request some help here. Thanks
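
Two things stand out in the attempt above, sketched here as an assumption-laden correction rather than a definitive fix: the threshold is a byte count, so a value of 1 effectively disables auto-broadcast rather than enabling it, and the SET statement should be passed as a plain SQL string:

```scala
// Sketch: the threshold is in bytes. Setting it to 1 disables broadcast
// for anything larger than 1 byte; to allow broadcasting tables up to
// ~10 MB, use the byte count instead.
sqlContext.sql("SET spark.sql.autoBroadcastJoinThreshold=10485760") // 10 MB
```

Whether this config was honored at all in Spark 1.0.1 is a separate question that the thread leaves open.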