Can anybody help me?
Thanks.

Chieh-Yen

On Wed, Apr 16, 2014 at 5:18 PM, Chieh-Yen <r01944...@csie.ntu.edu.tw> wrote:

> Dear all,
>
> I developed an application in which the communication messages are
> sometimes larger than 10 MB.
> It works fine for smaller datasets but fails for larger ones.
> Please see the error message below.
>
> I searched online, and many people said the problem can be solved by
> changing the properties spark.akka.frameSize and
> spark.reducer.maxMbInFlight.
> The configuration looks like this:
>
>     val conf = new SparkConf()
>       .setMaster(master)
>       .setAppName("SparkLR")
>       .setSparkHome("/home/user/spark-0.9.0-incubating-bin-hadoop2")
>       .setJars(List(jarPath))
>       .set("spark.akka.frameSize", "100")
>       .set("spark.reducer.maxMbInFlight", "100")
>     val sc = new SparkContext(conf)
>
> However, the task still fails with the same error message.
> The messages being communicated are the weight vectors of each sub-problem,
> which may be larger than 10 MB for higher-dimensional datasets.
>
> Can anybody help me?
> Thanks a lot.
>
> ====
> [error] (run-main) org.apache.spark.SparkException: Job aborted: Exception while deserializing and fetching task: *java.lang.OutOfMemoryError: Java heap space*
> org.apache.spark.SparkException: Job aborted: Exception while deserializing and fetching task: java.lang.OutOfMemoryError: Java heap space
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1026)
>     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1026)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
>     at scala.Option.foreach(Option.scala:236)
>     at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:619)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:207)
>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>     at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>     at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>     at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> [trace] Stack trace suppressed: run last compile:run for the full output.
> ====
>
> Chieh-Yen
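
The quoted message says the weight vectors may exceed 10 MB for higher-dimensional datasets. As a rough sanity check, the sketch below estimates the payload of one weight vector and compares it with a frame size given in MB, the unit used by spark.akka.frameSize. It assumes a dense vector of 8-byte doubles and ignores serialization overhead, so the real message will be somewhat larger. `FrameSizeCheck`, `denseVectorBytes`, and `fitsInFrame` are hypothetical names for illustration, not part of Spark, and the dimensions are made-up examples rather than values from the thread.

```scala
object FrameSizeCheck {
  // Approximate payload of a dense Double vector: 8 bytes per element.
  // Assumption: no sparsity and no serialization overhead.
  def denseVectorBytes(dimension: Long): Long = dimension * 8L

  // True if the estimated payload fits within a frame size given in MB.
  def fitsInFrame(dimension: Long, frameSizeMb: Int): Boolean =
    denseVectorBytes(dimension) <= frameSizeMb.toLong * 1024L * 1024L

  def main(args: Array[String]): Unit = {
    // Hypothetical feature dimensions, chosen only to illustrate the arithmetic.
    val dimensions = Seq(1000000L, 2000000L, 20000000L)
    for (d <- dimensions) {
      val approxMb = "%.1f".format(denseVectorBytes(d) / (1024.0 * 1024.0))
      println(s"dimension=$d approx=$approxMb MB " +
        s"fits 10 MB frame: ${fitsInFrame(d, 10)}, " +
        s"fits 100 MB frame: ${fitsInFrame(d, 100)}")
    }
  }
}
```

This estimate only covers the frame-size limit. The trace above reports a Java heap space error, which is a separate limit that this check does not address.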