I noticed that my Spark application is broadcasting even though I
set spark.sql.autoBroadcastJoinThreshold = -1. When I checked the query
plan, I noticed that the physical plan was an AdaptiveSparkPlan. When I
checked the adaptive settings, I noticed that there is a separate broadcast
threshold setting for adaptive execution.
Maybe you could try "--conf spark.sql.statistics.fallBackToHdfs=true".
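For reference, a minimal sketch of the relevant settings, assuming Spark 3.2+ with adaptive query execution enabled and a live SparkSession named `spark` (the session name is a placeholder):

```scala
// Sketch, assuming a live SparkSession `spark` on Spark 3.2+ with AQE enabled.
// The adaptive planner consults its own broadcast threshold, separate from the
// classic one, so fully disabling automatic broadcasts needs both set to -1:
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")
spark.conf.set("spark.sql.adaptive.autoBroadcastJoinThreshold", "-1")

// The suggestion above: estimate table sizes from the filesystem when catalog
// statistics are missing, which can correct wildly wrong size estimates.
spark.conf.set("spark.sql.statistics.fallBackToHdfs", "true")
```

This is a configuration sketch, not a verified fix for the exact symptom described; checking which of the two thresholds is actually set in the running session is the first thing to confirm.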
On 2019/05/11 01:54:27, V0lleyBallJunki3 wrote:
Hello,
I have set spark.sql.autoBroadcastJoinThreshold=1GB and I am running the
spark job. However, my application is failing with:
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
So what I discovered is that if I write the table being joined to disk
and then read it back, Spark correctly broadcasts it. I think this is because
when Spark estimates the size of the smaller table, it estimates it incorrectly
to be much bigger than it actually is, and hence decides to do a
SortMergeJoin instead.
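That symptom is consistent with how the planner decides: it compares its *estimated* size of the relation against the threshold, so an overestimate silently falls back to a shuffle join. A hypothetical sketch of that decision (illustrative names only, not Spark's internal API):

```scala
// Hypothetical sketch of the auto-broadcast decision; names are illustrative,
// not Spark's actual internals. The key point: the comparison uses the
// ESTIMATED size, so a bad estimate (e.g. missing table statistics) quietly
// disables the broadcast even for a genuinely tiny table.
object BroadcastDecision {
  // A non-positive threshold (e.g. -1) disables automatic broadcasting entirely.
  def shouldAutoBroadcast(estimatedSizeInBytes: Long, thresholdBytes: Long): Boolean =
    thresholdBytes > 0 && estimatedSizeInBytes <= thresholdBytes
}
```

Running ANALYZE TABLE t COMPUTE STATISTICS gives the planner accurate sizes, which often fixes exactly this symptom without the write-and-read-back workaround.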
I have a small table, well below 50 MB, that I want to broadcast join with a
larger table. However, even if I set spark.sql.autoBroadcastJoinThreshold to 100
MB, Spark still decides to do a SortMergeJoin instead of a broadcast join. I
have to set an explicit broadcast hint on the table for it to do a broadcast
join.
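For anyone else hitting this, the explicit hint looks like the following; a sketch assuming two existing DataFrames `small` and `large` with a common column `key` (all names are placeholders):

```scala
// Sketch, assuming DataFrames `small` and `large` already exist in the session.
import org.apache.spark.sql.functions.broadcast

// Force a broadcast of `small` regardless of Spark's size estimate:
val joined = large.join(broadcast(small), "key")

// The equivalent SQL hint:
// SELECT /*+ BROADCAST(small) */ * FROM large JOIN small ON large.key = small.key
```

The hint overrides the size estimate entirely, so it also bypasses the misestimation problem described elsewhere in this thread.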
broadcast join. This is fixed in Spark 2.x. Can you try it on Spark 2.0?
Yong
From: Jone Zhang <joyoungzh...@gmail.com>
Sent: Wednesday, May 10, 2017 7:10 AM
To: user@spark.apache.org
Subject: Why spark.sql.autoBroadcastJoinThreshold
Now I use Spark 1.6.0 in Java.
I wish the following SQL to be executed as a BroadcastJoin:
*select * from sample join feature*
These are my steps:
1. set spark.sql.autoBroadcastJoinThreshold=100M
2. HiveContext.sql("cache lazy table feature as select * from src where
...") which result size is
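A sketch of those steps in one place, assuming `hc` is an existing HiveContext (the WHERE predicate is elided as in the original message). One caveat worth checking: on this Spark version the threshold is parsed as a byte count, so a value like `100M` may not be interpreted as 100 MB; an explicit byte value is unambiguous:

```scala
// Sketch only; assumes `hc` is an existing org.apache.spark.sql.hive.HiveContext
// on Spark 1.6. The threshold is in bytes here: 104857600 = 100 MB.
hc.sql("SET spark.sql.autoBroadcastJoinThreshold=104857600")

// Cache the small side (predicate elided, as in the original message):
hc.sql("CACHE LAZY TABLE feature AS SELECT * FROM src WHERE ...")

// Inspect the plan: a BroadcastHashJoin node means the broadcast happened.
hc.sql("SELECT * FROM sample JOIN feature").explain()
```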
Does this work with spark-sql in 1.0.1 too? I tried it like this:
sqlContext.sql("SET spark.sql.autoBroadcastJoinThreshold=1")
But it still seems to trigger a ShuffleMapTask, and the amount of shuffle is
the same with or without this parameter...
Kindly requesting some help here.
Thanks