---------- Forwarded message ----------
From: Ritesh Kumar Singh <riteshoneinamill...@gmail.com>
Date: Mon, Nov 10, 2014 at 10:52 PM
Subject: Re: Executor Lost Failure
To: Akhil Das <ak...@sigmoidanalytics.com>
Tasks are now getting submitted, but some tasks don't seem to do anything. For example, after opening the spark-shell, I load a text file from disk and try printing its contents as:

>sc.textFile("/path/to/file").foreach(println)

It does not give me any output. While running this:

>sc.textFile("/path/to/file").count

gives me the right number of lines in the text file. Not sure what the error is, but here is the console output for the print case:

14/11/10 22:48:02 INFO MemoryStore: ensureFreeSpace(215230) called with curMem=709528, maxMem=463837593
14/11/10 22:48:02 INFO MemoryStore: Block broadcast_6 stored as values in memory (estimated size 210.2 KB, free 441.5 MB)
14/11/10 22:48:02 INFO MemoryStore: ensureFreeSpace(17239) called with curMem=924758, maxMem=463837593
14/11/10 22:48:02 INFO MemoryStore: Block broadcast_6_piece0 stored as bytes in memory (estimated size 16.8 KB, free 441.5 MB)
14/11/10 22:48:02 INFO BlockManagerInfo: Added broadcast_6_piece0 in memory on gonephishing.local:42648 (size: 16.8 KB, free: 442.3 MB)
14/11/10 22:48:02 INFO BlockManagerMaster: Updated info of block broadcast_6_piece0
14/11/10 22:48:02 INFO FileInputFormat: Total input paths to process : 1
14/11/10 22:48:02 INFO SparkContext: Starting job: foreach at <console>:13
14/11/10 22:48:02 INFO DAGScheduler: Got job 3 (foreach at <console>:13) with 2 output partitions (allowLocal=false)
14/11/10 22:48:02 INFO DAGScheduler: Final stage: Stage 3(foreach at <console>:13)
14/11/10 22:48:02 INFO DAGScheduler: Parents of final stage: List()
14/11/10 22:48:02 INFO DAGScheduler: Missing parents: List()
14/11/10 22:48:02 INFO DAGScheduler: Submitting Stage 3 (Desktop/mnd.txt MappedRDD[7] at textFile at <console>:13), which has no missing parents
14/11/10 22:48:02 INFO MemoryStore: ensureFreeSpace(2504) called with curMem=941997, maxMem=463837593
14/11/10 22:48:02 INFO MemoryStore: Block broadcast_7 stored as values in memory (estimated size 2.4 KB, free 441.4 MB)
14/11/10 22:48:02 INFO MemoryStore: ensureFreeSpace(1602) called with curMem=944501, maxMem=463837593
14/11/10 22:48:02 INFO MemoryStore: Block broadcast_7_piece0 stored as bytes in memory (estimated size 1602.0 B, free 441.4 MB)
14/11/10 22:48:02 INFO BlockManagerInfo: Added broadcast_7_piece0 in memory on gonephishing.local:42648 (size: 1602.0 B, free: 442.3 MB)
14/11/10 22:48:02 INFO BlockManagerMaster: Updated info of block broadcast_7_piece0
14/11/10 22:48:02 INFO DAGScheduler: Submitting 2 missing tasks from Stage 3 (Desktop/mnd.txt MappedRDD[7] at textFile at <console>:13)
14/11/10 22:48:02 INFO TaskSchedulerImpl: Adding task set 3.0 with 2 tasks
14/11/10 22:48:02 INFO TaskSetManager: Starting task 0.0 in stage 3.0 (TID 6, gonephishing.local, PROCESS_LOCAL, 1216 bytes)
14/11/10 22:48:02 INFO TaskSetManager: Starting task 1.0 in stage 3.0 (TID 7, gonephishing.local, PROCESS_LOCAL, 1216 bytes)
14/11/10 22:48:02 INFO BlockManagerInfo: Added broadcast_7_piece0 in memory on gonephishing.local:48857 (size: 1602.0 B, free: 442.3 MB)
14/11/10 22:48:02 INFO BlockManagerInfo: Added broadcast_6_piece0 in memory on gonephishing.local:48857 (size: 16.8 KB, free: 442.3 MB)
14/11/10 22:48:02 INFO TaskSetManager: Finished task 0.0 in stage 3.0 (TID 6) in 308 ms on gonephishing.local (1/2)
14/11/10 22:48:02 INFO DAGScheduler: Stage 3 (foreach at <console>:13) finished in 0.321 s
14/11/10 22:48:02 INFO TaskSetManager: Finished task 1.0 in stage 3.0 (TID 7) in 315 ms on gonephishing.local (2/2)
14/11/10 22:48:02 INFO SparkContext: Job finished: foreach at <console>:13, took 0.376602079 s
14/11/10 22:48:02 INFO TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool

=======================================================================
On Mon, Nov 10, 2014 at 8:01 PM, Akhil Das <ak...@sigmoidanalytics.com> wrote:
> Try adding the following configurations also, might work.
>
> spark.rdd.compress true
>
> spark.storage.memoryFraction 1
> spark.core.connection.ack.wait.timeout 600
> spark.akka.frameSize 50
>
> Thanks
> Best Regards
>
> On Mon, Nov 10, 2014 at 6:51 PM, Ritesh Kumar Singh <riteshoneinamill...@gmail.com> wrote:
>
>> Hi,
>>
>> I am trying to submit my application using spark-submit, with the following
>> spark-defaults.conf params:
>>
>> spark.master                     spark://<master-ip>:7077
>> spark.eventLog.enabled           true
>> spark.serializer                 org.apache.spark.serializer.KryoSerializer
>> spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
>>
>> ===============================================================
>> But every time I am getting this error:
>>
>> 14/11/10 18:39:17 ERROR TaskSchedulerImpl: Lost executor 1 on aa.local: remote Akka client disassociated
>> 14/11/10 18:39:17 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, aa.local): ExecutorLostFailure (executor lost)
>> 14/11/10 18:39:17 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, aa.local): ExecutorLostFailure (executor lost)
>> 14/11/10 18:39:20 ERROR TaskSchedulerImpl: Lost executor 2 on aa.local: remote Akka client disassociated
>> 14/11/10 18:39:20 WARN TaskSetManager: Lost task 0.1 in stage 0.0 (TID 2, aa.local): ExecutorLostFailure (executor lost)
>> 14/11/10 18:39:20 WARN TaskSetManager: Lost task 1.1 in stage 0.0 (TID 3, aa.local): ExecutorLostFailure (executor lost)
>> 14/11/10 18:39:26 ERROR TaskSchedulerImpl: Lost executor 4 on aa.local: remote Akka client disassociated
>> 14/11/10 18:39:26 WARN TaskSetManager: Lost task 0.2 in stage 0.0 (TID 5, aa.local): ExecutorLostFailure (executor lost)
>> 14/11/10 18:39:26 WARN TaskSetManager: Lost task 1.2 in stage 0.0 (TID 4, aa.local): ExecutorLostFailure (executor lost)
>> 14/11/10 18:39:29 ERROR TaskSchedulerImpl: Lost executor 5 on aa.local: remote Akka client disassociated
>> 14/11/10 18:39:29 WARN TaskSetManager: Lost task 0.3 in stage 0.0 (TID 7, aa.local): ExecutorLostFailure (executor lost)
>> 14/11/10 18:39:29 ERROR TaskSetManager: Task 0 in stage 0.0 failed 4 times; aborting job
>> 14/11/10 18:39:29 WARN TaskSetManager: Lost task 1.3 in stage 0.0 (TID 6, aa.local): ExecutorLostFailure (executor lost)
>> Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 7, gonephishing.local): ExecutorLostFailure (executor lost)
>> Driver stacktrace:
>>     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
>>     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>     at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
>>     at scala.Option.foreach(Option.scala:236)
>>     at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
>>     at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
>>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>>     at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>>     at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>>     at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>>     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>
>> =================================================================
>> Any fixes?
>>
>
>
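A note on the foreach case in the first message: the log shows the job completing successfully, and printing nothing on the console is expected behaviour rather than a failure. The closure passed to foreach runs on the executors, so its println output goes to the executors' stdout logs (visible in the web UI), not to the spark-shell console. A minimal sketch of printing on the driver instead, assuming the file is small enough to bring back:

```scala
// Runs on the executors: output lands in each executor's stdout log,
// not in the spark-shell console.
sc.textFile("/path/to/file").foreach(println)

// Bring the data back to the driver first, then print locally:
sc.textFile("/path/to/file").take(10).foreach(println)   // first 10 lines only
sc.textFile("/path/to/file").collect().foreach(println)  // whole file; small inputs only
```

count behaves differently because it is an action whose single numeric result is returned to the driver, which is why it printed the right answer while foreach(println) appeared to do nothing.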
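On the ExecutorLostFailure itself: "remote Akka client disassociated" usually means the executor JVM exited, commonly because it ran out of memory, so the executor logs on aa.local are the first place to check. One possible spark-defaults.conf combining the settings quoted above with an explicit executor memory setting; the 2g value is an illustrative assumption, not something tested on this cluster:

```
# spark-defaults.conf -- sketch only; tune values for the actual cluster
spark.master                             spark://<master-ip>:7077
spark.eventLog.enabled                   true
spark.serializer                         org.apache.spark.serializer.KryoSerializer
spark.rdd.compress                       true
spark.storage.memoryFraction             1
spark.core.connection.ack.wait.timeout   600
spark.akka.frameSize                     50
spark.executor.memory                    2g
```

One caveat on the suggested settings: spark.storage.memoryFraction 1 dedicates the entire managed heap fraction to cached blocks, which can itself starve task execution of memory, so a lower value may be safer if executors keep dying.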