On Mon, Nov 10, 2014 at 10:52 PM, Ritesh Kumar Singh <riteshoneinamill...@gmail.com> wrote:
> Tasks are now getting submitted, but some of them don't seem to do anything.
> For example, after opening the spark-shell, I load a text file from disk and
> try printing its contents as:
>
> sc.textFile("/path/to/file").foreach(println)
>
> It does not give me any output, while running:
>
> sc.textFile("/path/to/file").count
>
> gives me the right number of lines in the text file.
> I am not sure what the error is, but here is the console output for the
> print case:
>
> 14/11/10 22:48:02 INFO MemoryStore: ensureFreeSpace(215230) called with curMem=709528, maxMem=463837593
> 14/11/10 22:48:02 INFO MemoryStore: Block broadcast_6 stored as values in memory (estimated size 210.2 KB, free 441.5 MB)
> 14/11/10 22:48:02 INFO MemoryStore: ensureFreeSpace(17239) called with curMem=924758, maxMem=463837593
> 14/11/10 22:48:02 INFO MemoryStore: Block broadcast_6_piece0 stored as bytes in memory (estimated size 16.8 KB, free 441.5 MB)
> 14/11/10 22:48:02 INFO BlockManagerInfo: Added broadcast_6_piece0 in memory on gonephishing.local:42648 (size: 16.8 KB, free: 442.3 MB)
> 14/11/10 22:48:02 INFO BlockManagerMaster: Updated info of block broadcast_6_piece0
> 14/11/10 22:48:02 INFO FileInputFormat: Total input paths to process : 1
> 14/11/10 22:48:02 INFO SparkContext: Starting job: foreach at <console>:13
> 14/11/10 22:48:02 INFO DAGScheduler: Got job 3 (foreach at <console>:13) with 2 output partitions (allowLocal=false)
> 14/11/10 22:48:02 INFO DAGScheduler: Final stage: Stage 3(foreach at <console>:13)
> 14/11/10 22:48:02 INFO DAGScheduler: Parents of final stage: List()
> 14/11/10 22:48:02 INFO DAGScheduler: Missing parents: List()
> 14/11/10 22:48:02 INFO DAGScheduler: Submitting Stage 3 (Desktop/mnd.txt MappedRDD[7] at textFile at <console>:13), which has no missing parents
> 14/11/10 22:48:02 INFO MemoryStore: ensureFreeSpace(2504) called with curMem=941997, maxMem=463837593
> 14/11/10 22:48:02 INFO MemoryStore: Block broadcast_7 stored as values in memory (estimated size 2.4 KB, free 441.4 MB)
> 14/11/10 22:48:02 INFO MemoryStore: ensureFreeSpace(1602) called with curMem=944501, maxMem=463837593
> 14/11/10 22:48:02 INFO MemoryStore: Block broadcast_7_piece0 stored as bytes in memory (estimated size 1602.0 B, free 441.4 MB)
> 14/11/10 22:48:02 INFO BlockManagerInfo: Added broadcast_7_piece0 in memory on gonephishing.local:42648 (size: 1602.0 B, free: 442.3 MB)
> 14/11/10 22:48:02 INFO BlockManagerMaster: Updated info of block broadcast_7_piece0
> 14/11/10 22:48:02 INFO DAGScheduler: Submitting 2 missing tasks from Stage 3 (Desktop/mnd.txt MappedRDD[7] at textFile at <console>:13)
> 14/11/10 22:48:02 INFO TaskSchedulerImpl: Adding task set 3.0 with 2 tasks
> 14/11/10 22:48:02 INFO TaskSetManager: Starting task 0.0 in stage 3.0 (TID 6, gonephishing.local, PROCESS_LOCAL, 1216 bytes)
> 14/11/10 22:48:02 INFO TaskSetManager: Starting task 1.0 in stage 3.0 (TID 7, gonephishing.local, PROCESS_LOCAL, 1216 bytes)
> 14/11/10 22:48:02 INFO BlockManagerInfo: Added broadcast_7_piece0 in memory on gonephishing.local:48857 (size: 1602.0 B, free: 442.3 MB)
> 14/11/10 22:48:02 INFO BlockManagerInfo: Added broadcast_6_piece0 in memory on gonephishing.local:48857 (size: 16.8 KB, free: 442.3 MB)
> 14/11/10 22:48:02 INFO TaskSetManager: Finished task 0.0 in stage 3.0 (TID 6) in 308 ms on gonephishing.local (1/2)
> 14/11/10 22:48:02 INFO DAGScheduler: Stage 3 (foreach at <console>:13) finished in 0.321 s
> 14/11/10 22:48:02 INFO TaskSetManager: Finished task 1.0 in stage 3.0 (TID 7) in 315 ms on gonephishing.local (2/2)
> 14/11/10 22:48:02 INFO SparkContext: Job finished: foreach at <console>:13, took 0.376602079 s
> 14/11/10 22:48:02 INFO TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool
>
> =======================================================================
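Note that nothing is actually failing in the log above: both tasks of the foreach job finish. The output is missing because foreach runs its closure on the executors, so println writes to each executor's stdout log on the worker machines rather than to the driver console (the two coincide only in local mode), while count returns its result to the driver. A minimal spark-shell sketch of printing on the driver instead (the path and line count are placeholders):

    val lines = sc.textFile("/path/to/file")

    // Executes on the executors: output lands in the executors' stdout
    // logs, not in the driver's console.
    lines.foreach(println)

    // Bring the data back to the driver first, then print there:
    lines.take(10).foreach(println)     // first 10 lines; safe for large files
    lines.collect().foreach(println)    // entire file; only for small inputs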
>
> On Mon, Nov 10, 2014 at 8:01 PM, Akhil Das <ak...@sigmoidanalytics.com>
> wrote:
>
>> Try adding the following configurations as well; they might help:
>>
>> spark.rdd.compress true
>> spark.storage.memoryFraction 1
>> spark.core.connection.ack.wait.timeout 600
>> spark.akka.frameSize 50
>>
>> Thanks
>> Best Regards
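The entries above are plain spark-defaults.conf properties; the same values can also be set programmatically. A minimal sketch, assuming Spark 1.x semantics (spark.akka.frameSize is in MB, and spark.storage.memoryFraction is the fraction of executor heap reserved for cached blocks, so 1 is aggressive and leaves nothing for execution; the app name is a placeholder):

    import org.apache.spark.{SparkConf, SparkContext}

    // Same settings as suggested above, applied via SparkConf instead of
    // spark-defaults.conf. Treat the values as a starting point to tune.
    val conf = new SparkConf()
      .setAppName("MyApp")                                    // placeholder
      .set("spark.rdd.compress", "true")                      // compress serialized RDD blocks
      .set("spark.storage.memoryFraction", "1")               // heap fraction for the block store
      .set("spark.core.connection.ack.wait.timeout", "600")   // ack timeout, in seconds
      .set("spark.akka.frameSize", "50")                      // max control-message size, in MB
    val sc = new SparkContext(conf)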
>>
>> On Mon, Nov 10, 2014 at 6:51 PM, Ritesh Kumar Singh
>> <riteshoneinamill...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I am trying to submit my application with spark-submit, using the
>>> following spark-defaults.conf params:
>>>
>>> spark.master spark://<master-ip>:7077
>>> spark.eventLog.enabled true
>>> spark.serializer org.apache.spark.serializer.KryoSerializer
>>> spark.executor.extraJavaOptions -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
>>>
>>> ===============================================================
>>> But every time I get this error:
>>>
>>> 14/11/10 18:39:17 ERROR TaskSchedulerImpl: Lost executor 1 on aa.local: remote Akka client disassociated
>>> 14/11/10 18:39:17 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, aa.local): ExecutorLostFailure (executor lost)
>>> 14/11/10 18:39:17 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, aa.local): ExecutorLostFailure (executor lost)
>>> 14/11/10 18:39:20 ERROR TaskSchedulerImpl: Lost executor 2 on aa.local: remote Akka client disassociated
>>> 14/11/10 18:39:20 WARN TaskSetManager: Lost task 0.1 in stage 0.0 (TID 2, aa.local): ExecutorLostFailure (executor lost)
>>> 14/11/10 18:39:20 WARN TaskSetManager: Lost task 1.1 in stage 0.0 (TID 3, aa.local): ExecutorLostFailure (executor lost)
>>> 14/11/10 18:39:26 ERROR TaskSchedulerImpl: Lost executor 4 on aa.local: remote Akka client disassociated
>>> 14/11/10 18:39:26 WARN TaskSetManager: Lost task 0.2 in stage 0.0 (TID 5, aa.local): ExecutorLostFailure (executor lost)
>>> 14/11/10 18:39:26 WARN TaskSetManager: Lost task 1.2 in stage 0.0 (TID 4, aa.local): ExecutorLostFailure (executor lost)
>>> 14/11/10 18:39:29 ERROR TaskSchedulerImpl: Lost executor 5 on aa.local: remote Akka client disassociated
>>> 14/11/10 18:39:29 WARN TaskSetManager: Lost task 0.3 in stage 0.0 (TID 7, aa.local): ExecutorLostFailure (executor lost)
>>> 14/11/10 18:39:29 ERROR TaskSetManager: Task 0 in stage 0.0 failed 4 times; aborting job
>>> 14/11/10 18:39:29 WARN TaskSetManager: Lost task 1.3 in stage 0.0 (TID 6, aa.local): ExecutorLostFailure (executor lost)
>>> Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 7, gonephishing.local): ExecutorLostFailure (executor lost)
>>> Driver stacktrace:
>>> at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
>>> at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>> at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
>>> at scala.Option.foreach(Option.scala:236)
>>> at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
>>> at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
>>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>>> at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>>> at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>>> at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>>> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>> at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>> at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>
>>> =================================================================
>>> Any fixes?
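For reference, a minimal spark-submit invocation matching the setup above; the class and jar names are placeholders, and spark.master, the Kryo serializer, and the other settings are picked up from spark-defaults.conf. In Spark 1.x, "remote Akka client disassociated" usually means the executor JVM died, often from running out of memory, so raising executor memory is a common first experiment:

    # Sketch only: com.example.MyApp and my-app.jar are hypothetical names;
    # spark-defaults.conf supplies spark.master and the remaining settings.
    spark-submit \
      --class com.example.MyApp \
      --executor-memory 2g \
      /path/to/my-app.jar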