On 16 December 2015 at 06:19, Deenar Toraskar <deenar.toras...@thinkreactive.co.uk> wrote:

> Hi
>
> I had the same problem: a query joining a number of small tables (5 of
> them), all below the broadcast threshold, and Spark broadcast all of
> them together without checking whether sufficient memory was available.
>
> I worked around the issue by reducing
> *spark.sql.autoBroadcastJoinThreshold* to stop the bigger tables in the
> query from being broadcast.
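>
> For example, something along these lines (the 10 MB value here is just
> an illustration; pick a threshold based on your table sizes, or use "-1"
> to disable broadcast joins entirely):
>
>   // lower the per-table broadcast threshold to 10 MB so the larger
>   // tables fall back to a shuffle join instead of being broadcast
>   sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold", "10485760")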
>
> This looks like an issue to me. A fix would be to
> a) ensure that, in addition to the per-table threshold, there is a total
> broadcast size limit per query, so that only data up to that limit is
> broadcast, preventing executors from running out of memory (a rough
> sketch below illustrates the idea).
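>
> To make the idea concrete, here is a rough standalone sketch in plain
> Scala (not actual Spark internals; perTableThreshold and perQueryBudget
> are made-up names):
>
>   // decide, per join input, whether to broadcast it, debiting a shared
>   // per-query budget so the combined broadcast size stays bounded
>   def planBroadcasts(tableSizes: Seq[Long],
>                      perTableThreshold: Long,
>                      perQueryBudget: Long): Seq[Boolean] = {
>     var remaining = perQueryBudget
>     tableSizes.map { size =>
>       val broadcast = size <= perTableThreshold && size <= remaining
>       if (broadcast) remaining -= size
>       broadcast // false means fall back to a shuffle join for this input
>     }
>   }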
>
> Shall I raise a JIRA for this?
>
> Regards
> Deenar
>
>
> On 4 November 2015 at 22:55, Shuai Zheng <szheng.c...@gmail.com> wrote:
>
>> An update: this ONLY happens in Spark 1.5. I tried running it under
>> Spark 1.4 and 1.4.1 and there was no issue (the program was originally
>> developed under Spark 1.4, and I just re-tested it; it works). So this
>> proves there is no issue with the logic or the data; it is caused by
>> the new version of Spark.
>>
>>
>>
>> So I want to know: is there any new setting I should use in Spark 1.5
>> to make it work?
>>
>>
>>
>> Regards,
>>
>>
>>
>> Shuai
>>
>>
>>
>> *From:* Shuai Zheng [mailto:szheng.c...@gmail.com]
>> *Sent:* Wednesday, November 04, 2015 3:22 PM
>> *To:* user@spark.apache.org
>> *Subject:* [Spark 1.5]: Exception in thread "broadcast-hash-join-2" java.lang.OutOfMemoryError: Java heap space
>>
>>
>>
>> Hi All,
>>
>>
>>
>> I have a program that runs some fairly complex business logic (joins)
>> in Spark, and I get the exception below:
>>
>>
>>
>> I am running on Spark 1.5 with the following parameters:
>>
>>
>>
>> spark-submit --deploy-mode client --executor-cores=24 --driver-memory=2G
>> --executor-memory=45G --class …
>>
>>
>>
>> Some other settings:
>>
>>
>>
>> sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
>>          .set("spark.kryoserializer.buffer.max", "2047m");
>>
>> sparkConf.set("spark.executor.extraJavaOptions", "-XX:+PrintGCDetails -XX:+PrintGCTimeStamps")
>>          .set("spark.sql.autoBroadcastJoinThreshold", "104857600");
>>
>>
>>
>> This is running on an AWS c3.8xlarge instance. I am not sure what
>> parameters I should set when I hit the OutOfMemoryError below.
>>
>>
>>
>> #
>> # java.lang.OutOfMemoryError: Java heap space
>> # -XX:OnOutOfMemoryError="kill -9 %p"
>> #   Executing /bin/sh -c "kill -9 10181"...
>>
>> Exception in thread "broadcast-hash-join-2" java.lang.OutOfMemoryError: Java heap space
>>         at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
>>         at org.apache.spark.sql.execution.joins.UnsafeHashedRelation$.apply(HashedRelation.scala:380)
>>         at org.apache.spark.sql.execution.joins.HashedRelation$.apply(HashedRelation.scala:123)
>>         at org.apache.spark.sql.execution.joins.BroadcastHashOuterJoin$$anonfun$broadcastFuture$1$$anonfun$apply$1.apply(BroadcastHashOuterJoin.scala:95)
>>         at org.apache.spark.sql.execution.joins.BroadcastHashOuterJoin$$anonfun$broadcastFuture$1$$anonfun$apply$1.apply(BroadcastHashOuterJoin.scala:85)
>>         at org.apache.spark.sql.execution.SQLExecution$.withExecutionId(SQLExecution.scala:100)
>>         at org.apache.spark.sql.execution.joins.BroadcastHashOuterJoin$$anonfun$broadcastFuture$1.apply(BroadcastHashOuterJoin.scala:85)
>>         at org.apache.spark.sql.execution.joins.BroadcastHashOuterJoin$$anonfun$broadcastFuture$1.apply(BroadcastHashOuterJoin.scala:85)
>>         at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>>         at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>         at java.lang.Thread.run(Thread.java:745)
>>
>>
>>
>> Any hint would be very helpful.
>>
>>
>>
>> Regards,
>>
>>
>>
>> Shuai
>>
>
>
