It is hard to say why the OOM happens without knowing your application's logic and the data size. Without that, I can only guess based on some common experience:

1) Increase "spark.default.parallelism".
2) Increase your executor memory; maybe 6g is just not enough.
3) Your environment is somewhat unbalanced between CPU cores and available memory (8 cores vs. 12G). Each core should have about 3G for Spark.
4) If you cache RDDs, use "MEMORY_ONLY_SER" instead of "MEMORY_ONLY".
5) Since you have many cores relative to your available memory, lower the cores per executor by setting "-Dspark.deploy.defaultCores=". When you do not have enough memory, reducing the concurrency of your executors lowers the memory requirement, at the cost of running more slowly.

Yong
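To make the suggestions above concrete, here is one way they could look in spark-defaults.conf. The numbers are illustrative guesses for a 12G / 8-core node, not tuned values, and spark.deploy.defaultCores only applies to the standalone master:

```
# Illustrative values only -- not tuned for this workload
spark.default.parallelism   200    # 1) more, smaller tasks per stage
spark.executor.memory       3g     # 2)/3) roughly 3g per core in use
spark.deploy.defaultCores   4      # 5) cap cores so memory per core stays ~3g
```

Point 4 is an application-code change rather than a config setting: for example, call rdd.persist(StorageLevel.MEMORY_ONLY_SER) instead of rdd.cache() so cached partitions are stored serialized and use less heap.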
Date: Wed, 8 Apr 2015 04:57:22 +0800
Subject: Re: 'Java heap space' error occurred when querying a 4G data file from HDFS
From: lidali...@gmail.com
To: user@spark.apache.org

Any help, please? Help me get the configuration right.

On Tuesday, April 7, 2015, 李铖 <lidali...@gmail.com> wrote:

In my dev/test environment I have 3 virtual machines; every machine has 12G memory and 8 CPU cores.

Here are spark-defaults.conf and spark-env.sh. Maybe some config is not right.

I run this command:

spark-submit --master yarn-client --driver-memory 7g --executor-memory 6g /home/hadoop/spark/main.py

and the exception below was raised.

spark-defaults.conf:

spark.master                        spark://cloud1:7077
spark.default.parallelism           100
spark.eventLog.enabled              true
spark.serializer                    org.apache.spark.serializer.KryoSerializer
spark.driver.memory                 5g
spark.driver.maxResultSize          6g
spark.kryoserializer.buffer.mb      256
spark.kryoserializer.buffer.max.mb  512
spark.executor.memory               4g
spark.rdd.compress                  true
spark.storage.memoryFraction        0
spark.akka.frameSize                50
spark.shuffle.compress              true
spark.shuffle.spill.compress        false
spark.local.dir                     /home/hadoop/tmp

spark-env.sh:

export SCALA=/home/hadoop/softsetup/scala
export JAVA_HOME=/home/hadoop/softsetup/jdk1.7.0_71
export SPARK_WORKER_CORES=1
export SPARK_WORKER_MEMORY=4g
export HADOOP_CONF_DIR=/opt/cloud/hadoop/etc/hadoop
export SPARK_EXECUTOR_MEMORY=4g
export SPARK_DRIVER_MEMORY=4g

Exception:

15/04/07 18:11:03 INFO BlockManagerInfo: Added taskresult_28 on disk on cloud3:38109 (size: 162.7 MB)
15/04/07 18:11:03 INFO BlockManagerInfo: Added taskresult_28 on disk on cloud3:38109 (size: 162.7 MB)
15/04/07 18:11:03 INFO TaskSetManager: Starting task 31.0 in stage 1.0 (TID 31, cloud3, NODE_LOCAL, 1296 bytes)
15/04/07 18:11:03 INFO BlockManagerInfo: Added taskresult_29 on disk on cloud2:49451 (size: 163.7 MB)
15/04/07 18:11:03 INFO BlockManagerInfo: Added taskresult_29 on disk on cloud2:49451 (size: 163.7 MB)
15/04/07 18:11:03 INFO TaskSetManager: Starting task 30.0 in stage 1.0 (TID 32, cloud2, NODE_LOCAL, 1296 bytes)
15/04/07 18:11:03 ERROR Utils: Uncaught
exception in thread task-result-getter-0
java.lang.OutOfMemoryError: Java heap space
    at org.apache.spark.scheduler.DirectTaskResult$$anonfun$readExternal$1.apply$mcV$sp(TaskResult.scala:61)
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:985)
    at org.apache.spark.scheduler.DirectTaskResult.readExternal(TaskResult.scala:58)
    at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1837)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:81)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:73)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:49)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:49)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1460)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:48)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Exception in thread "task-result-getter-0" java.lang.OutOfMemoryError: Java heap space
    at org.apache.spark.scheduler.DirectTaskResult$$anonfun$readExternal$1.apply$mcV$sp(TaskResult.scala:61)
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:985)
    at org.apache.spark.scheduler.DirectTaskResult.readExternal(TaskResult.scala:58)
    at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1837)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:81)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:73)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:49)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:49)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1460)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:48)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
15/04/07 18:11:03 INFO BlockManagerInfo: Added taskresult_28 on disk on cloud3:38109 (size: 162.7 MB)
15/04/07 18:11:03 INFO BlockManagerInfo: Added taskresult_29 on disk on cloud2:49451 (size: 163.7 MB)
15/04/07 18:11:05 ERROR Utils: Uncaught exception in thread task-result-getter-4
java.lang.OutOfMemoryError: Java heap space
Exception in thread "task-result-getter-4" java.lang.OutOfMemoryError: Java heap space
15/04/07 18:11:07 INFO BlockManagerInfo: Added taskresult_31 on disk on cloud3:38109 (size: 87.9 MB)
15/04/07 18:11:07 INFO BlockManagerInfo: Added taskresult_31 on disk on cloud3:38109 (size: 87.9 MB)
15/04/07 18:11:08 WARN TransportChannelHandler: Exception in connection from cloud3/192.168.0.95:38109
java.lang.OutOfMemoryError: Java heap space
15/04/07 18:11:08 ERROR TransportResponseHandler: Still have 1 requests outstanding when connection from cloud3/192.168.0.95:38109 is closed
15/04/07 18:11:08 ERROR OneForOneBlockFetcher: Failed while starting block fetches
java.lang.OutOfMemoryError: Java heap space
15/04/07 18:11:08 ERROR RetryingBlockFetcher: Failed to fetch block taskresult_31, and will not retry (0 retries)
java.lang.OutOfMemoryError: Java heap space
15/04/07 18:11:08 ERROR TransportClient: Failed to send RPC 7722440433247749491 to cloud3/192.168.0.95:38109: java.nio.channels.ClosedChannelException
java.nio.channels.ClosedChannelException
15/04/07 18:11:08 ERROR OneForOneBlockFetcher: Failed while starting block fetches
java.io.IOException: Failed to send RPC 7722440433247749491 to cloud3/192.168.0.95:38109: java.nio.channels.ClosedChannelException
    at org.apache.spark.network.client.TransportClient$2.operationComplete(TransportClient.java:158)
    at org.apache.spark.network.client.TransportClient$2.operationComplete(TransportClient.java:145)
    at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
    at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:567)
    at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424)
    at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:745)
    at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:646)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1054)
    at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:658)
    at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:716)
    at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:651)
    at io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:112)
    at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:658)
    at io.netty.channel.AbstractChannelHandlerContext.access$2000(AbstractChannelHandlerContext.java:32)
    at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:939)
    at io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:991)
    at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:924)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:380)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.channels.ClosedChannelException
15/04/07 18:11:08 INFO BlockManagerInfo: Added taskresult_30 on disk on cloud1:44029 (size: 163.5 MB)
15/04/07 18:11:08 INFO BlockManagerInfo: Added taskresult_30 on disk on cloud1:44029 (size: 163.5 MB)
15/04/07 18:11:08 ERROR Utils: Uncaught exception in thread task-result-getter-6
java.lang.OutOfMemoryError: Java heap space
Exception in thread "task-result-getter-6" java.lang.OutOfMemoryError: Java heap space
15/04/07 18:11:08 ERROR TaskResultGetter: Exception while getting task result
java.util.concurrent.ExecutionException: Boxed Error
    at scala.concurrent.impl.Promise$.resolver(Promise.scala:55)
    at scala.concurrent.impl.Promise$.scala$concurrent$impl$Promise$$resolveTry(Promise.scala:47)
    at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:244)
    at scala.concurrent.Promise$class.complete(Promise.scala:55)
    at scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:153)
    at scala.concurrent.Promise$class.failure(Promise.scala:107)
    at scala.concurrent.impl.Promise$DefaultPromise.failure(Promise.scala:153)
    at org.apache.spark.network.BlockTransferService$$anon$1.onBlockFetchFailure(BlockTransferService.scala:92)
    at org.apache.spark.network.shuffle.RetryingBlockFetcher$RetryingBlockFetchListener.onBlockFetchFailure(RetryingBlockFetcher.java:230)
    at org.apache.spark.network.shuffle.OneForOneBlockFetcher.failRemainingBlocks(OneForOneBlockFetcher.java:123)
    at org.apache.spark.network.shuffle.OneForOneBlockFetcher.access$300(OneForOneBlockFetcher.java:43)
    at org.apache.spark.network.shuffle.OneForOneBlockFetcher$1.onFailure(OneForOneBlockFetcher.java:114)
    at org.apache.spark.network.client.TransportResponseHandler.failOutstandingRequests(TransportResponseHandler.java:84)
    at org.apache.spark.network.client.TransportResponseHandler.exceptionCaught(TransportResponseHandler.java:108)
    at org.apache.spark.network.server.TransportChannelHandler.exceptionCaught(TransportChannelHandler.java:69)
    at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:271)
    at io.netty.channel.AbstractChannelHandlerContext.notifyHandlerException(AbstractChannelHandlerContext.java:768)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:335)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
    at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:163)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: Java heap space