how many tasks are there in your job?

Sent from my iPhone
On 2014-04-17 16:24, Qin Wei <wei....@dewmobile.net> wrote:

> Hi Andre, thanks a lot for your reply, but I still get the same exception. The complete exception message is below:
>
> Exception in thread "main" org.apache.spark.SparkException: Job aborted: Task 1.0:9 failed 4 times (most recent failure: Exception failure: java.lang.OutOfMemoryError: Java heap space)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)
>     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1018)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
>     at scala.Option.foreach(Option.scala:236)
>     at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:604)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:190)
>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>     at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>     at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>     at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>
> Following your hints, I added SPARK_DRIVER_MEMORY to my spark-env.sh:
>
> export SPARK_MASTER_IP=192.168.2.184
> export SPARK_MASTER_PORT=7077
> export SPARK_LOCAL_IP=192.168.2.183
> export SPARK_DRIVER_MEMORY=10G
> export SPARK_JAVA_OPTS="-Xms4g -Xmx40g -XX:MaxPermSize=10g"
>
> I also modified my code so that it no longer calls collect. Here is the code now:
>
>     def main(args: Array[String]) {
>       val sc = new SparkContext("spark://192.168.2.184:7077", "Score Calcu Total",
>         "/usr/local/spark-0.9.1-bin-hadoop2", Seq("/home/deployer/myjar.jar"))
>
>       val mongoRDD = sc.textFile("/home/deployer/uris.dat", 200)
>       val jsonRDD = mongoRDD.map(arg => new JSONObject(arg))
>       val newRDD = jsonRDD.map(arg => {
>         var score = 0.5
>         arg.put("score", score)
>         arg
>       })
>
>       val resourceScoresRDD = newRDD.map(arg =>
>         (arg.get("rid").toString.toLong,
>           (arg.get("zid").toString, arg.get("score").asInstanceOf[Number].doubleValue))
>       ).groupByKey()
>
>       val simRDD = resourceScoresRDD.cartesian(resourceScoresRDD)
>         .filter(arg => arg._1._1 > arg._2._1)
>         .map(arg => (arg._1._1, arg._2._1, 0.8))
>
>       simRDD.saveAsTextFile("/home/deployer/sim")
>     }
>
> I ran the program through "java -jar myjar.jar". It crashed quickly, though it succeeds when the data file is small.
>
> Thanks for your help!
>
> qinwei
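A detail worth checking in the message above: spark-env.sh is only sourced by Spark's own launch scripts, so when the driver is started with a plain "java -jar myjar.jar", the SPARK_DRIVER_MEMORY set there most likely never reaches the driver JVM; the driver heap would have to be set on the launching command itself (for example "java -Xmx10g -jar myjar.jar"). Below is a minimal sketch of the same setup expressed through SparkConf (available from Spark 0.9), with per-executor memory set in code. The object name ScoreCalcTotal and the 4g figure are illustrative assumptions, not values taken from the thread.

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch only: same cluster and jar as in the thread, configured via
    // SparkConf instead of constructor arguments plus spark-env.sh.
    object ScoreCalcTotal {
      def main(args: Array[String]) {
        val conf = new SparkConf()
          .setMaster("spark://192.168.2.184:7077")
          .setAppName("Score Calcu Total")
          .setSparkHome("/usr/local/spark-0.9.1-bin-hadoop2")
          .setJars(Seq("/home/deployer/myjar.jar"))
          // Heap for each executor; tasks run inside this JVM, so a
          // task-level OutOfMemoryError usually points at this value
          // (or at a shuffle such as groupByKey), not at the driver.
          .set("spark.executor.memory", "4g")

        val sc = new SparkContext(conf)
        // ... rest of the job unchanged ...
        sc.stop()
      }
    }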
> From: [hidden email]
> Date: 2014-04-16 17:50
> To: [hidden email]
> Subject: Re: Spark program thows OutOfMemoryError
>
> It seems you do not have enough memory on the Spark driver. Hints below:
>
> On 2014-04-15 12:10, Qin Wei wrote:
> > val resourcesRDD = jsonRDD.map(arg => arg.get("rid").toString.toLong).distinct
> >
> > // the program crashes at this line of code
> > val bcResources = sc.broadcast(resourcesRDD.collect.toList)
>
> What is returned by resourcesRDD.count()?
>
> > The data file "/home/deployer/uris.dat" is 2G, with lines like this:
> > { "id" : 1, "a" : { "0" : 1 }, "rid" : 5487628, "zid" : "10550869" }
> >
> > And here is my spark-env.sh:
> > export SCALA_HOME=/usr/local/scala-2.10.3
> > export SPARK_MASTER_IP=192.168.2.184
> > export SPARK_MASTER_PORT=7077
> > export SPARK_LOCAL_IP=192.168.2.182
> > export SPARK_WORKER_MEMORY=20g
> > export SPARK_MEM=10g
> > export SPARK_JAVA_OPTS="-Xms4g -Xmx40g -XX:MaxPermSize=10g -XX:-UseGCOverheadLimit"
>
> Try setting SPARK_DRIVER_MEMORY to a bigger value; the default of 512m is probably too small for the resourcesRDD.collect().
> By the way, are you really sure you need to collect all that?
>
> André Bois-Crettez
> Software Architect / Big Data Developer
> http://www.kelkoo.com/
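A sketch of the alternative André is hinting at with "are you really sure you need to collect all that?": if the collected id list was only used to restrict some other keyed dataset, a join keeps that work on the executors instead of pulling every id into the driver heap. Assumptions here: otherByRid is a hypothetical downstream dataset (the original post does not show how bcResources was used), and JSONObject is assumed to be org.json.JSONObject.

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._  // pair-RDD operations such as join (Spark 0.9)
    import org.json.JSONObject              // assumption: the JSONObject used in the thread

    object CollectAlternative {
      def filterToKnownRids(sc: SparkContext): Unit = {
        val resourcesRDD = sc.textFile("/home/deployer/uris.dat")
          .map(line => new JSONObject(line).get("rid").toString.toLong)
          .distinct()

        // André's sanity check: collect is only reasonable if this is small.
        println("distinct rids: " + resourcesRDD.count())

        val knownRids = resourcesRDD.map(rid => (rid, ()))

        // Hypothetical dataset keyed by rid, standing in for whatever the
        // broadcast list was matched against in the real job.
        val otherByRid = sc.parallelize(Seq((5487628L, "kept"), (42L, "dropped")))

        // "Keep only records whose rid appears in resourcesRDD", without
        // ever materializing the id list on the driver.
        val kept = otherByRid.join(knownRids).mapValues(_._1)
        kept.saveAsTextFile("/home/deployer/known-rids")
      }
    }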