Also, check out this post: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-program-thows-OutOfMemoryError-td4268.html
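However you raise the heap, it is worth checking that the setting actually reached the JVM that runs the job. Below is a minimal Scala sketch. It assumes you run it with the same environment as the application, for example the same `_JAVA_OPTIONS`. `HeapCheck` and `maxHeapMiB` are hypothetical names, not Spark APIs.

```scala
// Hypothetical helper, not part of Spark: reports the maximum heap the
// current JVM will attempt to use, so you can confirm that an -Xmx setting
// (e.g. one passed via _JAVA_OPTIONS) was actually picked up.
object HeapCheck {
  /** Maximum heap the JVM will attempt to use, in mebibytes. */
  def maxHeapMiB(): Long = Runtime.getRuntime.maxMemory / (1024L * 1024L)

  def main(args: Array[String]): Unit = {
    println(s"Max heap: ${maxHeapMiB()} MiB")
  }
}
```

With `-Xmx10g` in effect, the printed value should be in the neighbourhood of 10 GB. It may be somewhat below the `-Xmx` value, depending on the JVM and garbage collector. If the value is far lower, that JVM did not pick up the option.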
On Mon, Apr 21, 2014 at 11:49 AM, Akhil Das <ak...@mobipulse.in> wrote:
> Hi Chieh,
>
> You can increase the heap size by exporting Java options. For example,
> the following sets the maximum heap size to 10 GB:
>
> export _JAVA_OPTIONS="-Xmx10g"
>
>
> On Mon, Apr 21, 2014 at 11:43 AM, Chieh-Yen <r01944...@csie.ntu.edu.tw> wrote:
>
>> Can anybody help me?
>> Thanks.
>>
>> Chieh-Yen
>>
>>
>> On Wed, Apr 16, 2014 at 5:18 PM, Chieh-Yen <r01944...@csie.ntu.edu.tw> wrote:
>>
>>> Dear all,
>>>
>>> I developed an application in which the communicated messages are
>>> sometimes larger than 10 MB.
>>> It works fine on smaller datasets but fails on larger ones.
>>> Please see the error message below.
>>>
>>> I searched online, and many people said the problem can be solved by
>>> modifying the properties spark.akka.frameSize and
>>> spark.reducer.maxMbInFlight, for example:
>>>
>>> import org.apache.spark.{SparkConf, SparkContext}
>>>
>>> // master and jarPath are defined elsewhere in the application.
>>> val conf = new SparkConf()
>>>   .setMaster(master)
>>>   .setAppName("SparkLR")
>>>   .setSparkHome("/home/user/spark-0.9.0-incubating-bin-hadoop2")
>>>   .setJars(List(jarPath))
>>>   .set("spark.akka.frameSize", "100")        // in MB (default: 10)
>>>   .set("spark.reducer.maxMbInFlight", "100") // in MB (default: 48)
>>> val sc = new SparkContext(conf)
>>>
>>> However, the task still fails with the same error message.
>>> The communicated messages are the weight vectors of each sub-problem,
>>> which may exceed 10 MB for higher-dimensional datasets.
>>>
>>> Can anybody help me?
>>> Thanks a lot.
>>>
>>> ====
>>> [error] (run-main) org.apache.spark.SparkException: Job aborted:
>>> Exception while deserializing and fetching task: java.lang.OutOfMemoryError:
>>> Java heap space
>>> org.apache.spark.SparkException: Job aborted: Exception while
>>> deserializing and fetching task: java.lang.OutOfMemoryError: Java heap space
>>>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)
>>>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1026)
>>>   at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>>   at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1026)
>>>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
>>>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
>>>   at scala.Option.foreach(Option.scala:236)
>>>   at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:619)
>>>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:207)
>>>   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>>>   at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>>>   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>>>   at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>>>   at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>>>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>   at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>>   at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>   at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>> [trace] Stack trace suppressed: run last compile:run for the full output.
>>> ====
>>>
>>> Chieh-Yen
>>>
>>
>>
>
>
> --
> Thanks
> Best Regards
>
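Below is a rough back-of-the-envelope check of the 10 MB figure from the original question, written as a Scala sketch. It makes these assumptions:

* The weight vectors are dense `Array[Double]`. The thread does not say so.
* Serialization overhead is ignored.
* In Spark 0.9, `spark.akka.frameSize` is given in MB (1 MB = 1024 × 1024 bytes), and its default is 10.

`FrameSizeEstimate`, `denseVectorBytes`, and `minFrameSizeMB` are hypothetical helpers, not Spark APIs.

```scala
// Hypothetical estimate, not a Spark API: raw payload of a dense weight
// vector stored as Array[Double] (8 bytes per dimension), ignoring any
// serialization overhead, compared against a frame size given in MB.
object FrameSizeEstimate {
  val BytesPerDouble = 8L
  val BytesPerMB = 1024L * 1024L

  /** Approximate raw payload of a dense Double vector, in bytes. */
  def denseVectorBytes(dims: Long): Long = dims * BytesPerDouble

  /** Smallest whole-MB frame size that fits the raw payload (overhead ignored). */
  def minFrameSizeMB(dims: Long): Long =
    (denseVectorBytes(dims) + BytesPerMB - 1) / BytesPerMB

  def main(args: Array[String]): Unit = {
    // Largest dimensionality whose raw payload fits in the default 10 MB frame.
    val maxDimsAt10MB = 10L * BytesPerMB / BytesPerDouble
    println(s"10 MB holds about $maxDimsAt10MB doubles")                   // 1310720
    println(s"2,000,000 dims needs at least ${minFrameSizeMB(2000000L)} MB") // 16
  }
}
```

By this estimate, a dense vector with more than about 1.3 million dimensions exceeds the default 10 MB frame even before serialization overhead is counted. That is consistent with the failures appearing only on higher-dimensional datasets.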