An update: this ONLY happens on Spark 1.5. I ran the same program under Spark 1.4 and 1.4.1 and there is no issue (the program was originally developed under Spark 1.4, and I just re-tested it; it works). So the logic and the data are proven fine; the problem is caused by the new version of Spark.
So I want to know: is there any new setting I should configure in Spark 1.5 to make it work?

Regards,

Shuai

From: Shuai Zheng [mailto:szheng.c...@gmail.com]
Sent: Wednesday, November 04, 2015 3:22 PM
To: user@spark.apache.org
Subject: [Spark 1.5]: Exception in thread "broadcast-hash-join-2" java.lang.OutOfMemoryError: Java heap space

Hi All,

I have a program which runs a fairly complex business job (a join) in Spark, and I get the exception below. I am running on Spark 1.5, submitted with the parameters:

spark-submit --deploy-mode client --executor-cores=24 --driver-memory=2G --executor-memory=45G --class .

Some other setup:

sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
         .set("spark.kryoserializer.buffer.max", "2047m");
sparkConf.set("spark.executor.extraJavaOptions", "-XX:+PrintGCDetails -XX:+PrintGCTimeStamps")
         .set("spark.sql.autoBroadcastJoinThreshold", "104857600");

This is running on an AWS c3.8xlarge instance. I am not sure what parameters I should set given the OutOfMemoryError below.

#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="kill -9 %p"
#   Executing /bin/sh -c "kill -9 10181"...
Exception in thread "broadcast-hash-join-2" java.lang.OutOfMemoryError: Java heap space
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
	at org.apache.spark.sql.execution.joins.UnsafeHashedRelation$.apply(HashedRelation.scala:380)
	at org.apache.spark.sql.execution.joins.HashedRelation$.apply(HashedRelation.scala:123)
	at org.apache.spark.sql.execution.joins.BroadcastHashOuterJoin$$anonfun$broadcastFuture$1$$anonfun$apply$1.apply(BroadcastHashOuterJoin.scala:95)
	at org.apache.spark.sql.execution.joins.BroadcastHashOuterJoin$$anonfun$broadcastFuture$1$$anonfun$apply$1.apply(BroadcastHashOuterJoin.scala:85)
	at org.apache.spark.sql.execution.SQLExecution$.withExecutionId(SQLExecution.scala:100)
	at org.apache.spark.sql.execution.joins.BroadcastHashOuterJoin$$anonfun$broadcastFuture$1.apply(BroadcastHashOuterJoin.scala:85)
	at org.apache.spark.sql.execution.joins.BroadcastHashOuterJoin$$anonfun$broadcastFuture$1.apply(BroadcastHashOuterJoin.scala:85)
	at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
	at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

Any hint will be very helpful.

Regards,

Shuai
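For context on the stack trace above: the OOM happens inside BroadcastHashOuterJoin while Spark materializes the hash table for the broadcast side of the join, which must fit entirely in heap. A commonly tried mitigation, offered here only as a sketch and not confirmed as the fix for this particular job, is to lower the broadcast threshold or set it to -1 to disable automatic broadcast joins entirely, so the planner falls back to a shuffle-based join:

// Sketch of a possible workaround (assumption: the broadcast side of the
// join is too large for the heap). Setting the threshold to -1 disables
// automatic broadcast hash joins, forcing a shuffle-based join instead.
sparkConf.set("spark.sql.autoBroadcastJoinThreshold", "-1");

The same setting can also be passed on the command line via spark-submit --conf spark.sql.autoBroadcastJoinThreshold=-1, which avoids a code change while testing whether the broadcast join is the culprit.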