Hi, I tried to run the WikipediaPageRank program from the examples directory of Spark, but I am getting the following error. Please help out.
hduser@vm4:~/spark-test/spark-0.7.2$ ./run spark.bagel.examples.WikipediaPageRank pagerank_data.txt 1 3 spark://vm4:7077 true
13/09/05 15:29:11 WARN spark.Utils: Your hostname, vm4 resolves to a loopback address: 127.0.1.1; using 192.168.0.50 instead (on interface eth0)
13/09/05 15:29:11 WARN spark.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
13/09/05 15:29:12 INFO slf4j.Slf4jEventHandler: Slf4jEventHandler started
13/09/05 15:29:13 INFO spark.SparkEnv: Registering BlockManagerMaster
13/09/05 15:29:13 INFO storage.MemoryStore: MemoryStore started with capacity 326.7 MB.
13/09/05 15:29:13 INFO storage.DiskStore: Created local directory at /tmp/spark-local-20130905152913-c316
13/09/05 15:29:13 INFO network.ConnectionManager: Bound socket to port 42229 with id = ConnectionManagerId(vm4,42229)
13/09/05 15:29:13 INFO storage.BlockManagerMaster: Trying to register BlockManager
13/09/05 15:29:13 INFO storage.BlockManagerMaster: Registered BlockManager
13/09/05 15:29:13 INFO server.Server: jetty-7.6.8.v20121106
13/09/05 15:29:13 INFO server.AbstractConnector: Started [email protected]:55371
13/09/05 15:29:13 INFO broadcast.HttpBroadcast: Broadcast server started at http://192.168.0.50:55371
13/09/05 15:29:13 INFO spark.SparkEnv: Registering MapOutputTracker
13/09/05 15:29:13 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-7c765df4-09bf-492e-a4e3-df87e4f3c0bc
13/09/05 15:29:13 INFO server.Server: jetty-7.6.8.v20121106
13/09/05 15:29:13 INFO server.AbstractConnector: Started [email protected]:40848
13/09/05 15:29:13 INFO io.IoWorker: IoWorker thread 'spray-io-worker-0' started
13/09/05 15:29:14 INFO server.HttpServer: akka://spark/user/BlockManagerHTTPServer started on /0.0.0.0:39257
13/09/05 15:29:14 INFO storage.BlockManagerUI: Started BlockManager web UI at http://vm4:39257
13/09/05 15:29:14 INFO client.Client$ClientActor: Connecting to master spark://vm4:7077
13/09/05 15:29:15 INFO cluster.SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20130905152914-0004
13/09/05 15:29:15 INFO client.Client$ClientActor: Executor added: app-20130905152914-0004/0 on worker-20130905152521-vm4-59380 (vm4) with 1 cores
13/09/05 15:29:15 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20130905152914-0004/0 on host vm4 with 1 cores, 512.0 MB RAM
13/09/05 15:29:15 INFO client.Client$ClientActor: Executor updated: app-20130905152914-0004/0 is now RUNNING
13/09/05 15:29:16 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/09/05 15:29:17 INFO storage.MemoryStore: ensureFreeSpace(123002) called with curMem=0, maxMem=342526525
13/09/05 15:29:17 INFO storage.MemoryStore: Block broadcast_0 stored as values to memory (estimated size 120.1 KB, free 326.5 MB)
13/09/05 15:29:18 INFO spark.KryoSerializer: Running user registrator: spark.bagel.examples.PRKryoRegistrator
Counting vertices...
13/09/05 15:29:18 INFO mapred.FileInputFormat: Total input paths to process : 1
13/09/05 15:29:18 INFO spark.SparkContext: Starting job: main at <unknown>:0
13/09/05 15:29:18 INFO scheduler.DAGScheduler: Got job 0 (main at <unknown>:0) with 2 output partitions (allowLocal=false)
13/09/05 15:29:18 INFO scheduler.DAGScheduler: Final stage: Stage 0 (main at <unknown>:0)
13/09/05 15:29:18 INFO scheduler.DAGScheduler: Parents of final stage: List()
13/09/05 15:29:18 INFO scheduler.DAGScheduler: Missing parents: List()
13/09/05 15:29:18 INFO scheduler.DAGScheduler: Submitting Stage 0 (MappedRDD[1] at main at <unknown>:0), which has no missing parents
13/09/05 15:29:18 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 0 (MappedRDD[1] at main at <unknown>:0)
13/09/05 15:29:18 INFO cluster.ClusterScheduler: Adding task set 0.0 with 2 tasks
13/09/05 15:29:19 INFO cluster.SparkDeploySchedulerBackend: Registered executor: Actor[akka://sparkExecutor@vm4:46884/user/Executor] with ID 0
13/09/05 15:29:19 INFO cluster.TaskSetManager: Starting task 0.0:0 as TID 0 on executor 0: vm4 (preferred)
13/09/05 15:29:19 INFO cluster.TaskSetManager: Serialized task 0.0:0 as 1477 bytes in 70 ms
13/09/05 15:29:19 INFO storage.BlockManagerMasterActor$BlockManagerInfo: Registering block manager vm4:59614 with 326.7 MB RAM
13/09/05 15:29:22 INFO cluster.TaskSetManager: Finished TID 0 in 2583 ms (progress: 1/2)
13/09/05 15:29:22 INFO scheduler.DAGScheduler: Completed ResultTask(0, 0)
13/09/05 15:29:22 INFO cluster.TaskSetManager: Starting task 0.0:1 as TID 1 on executor 0: vm4 (preferred)
13/09/05 15:29:22 INFO cluster.TaskSetManager: Serialized task 0.0:1 as 1477 bytes in 0 ms
13/09/05 15:29:22 INFO cluster.TaskSetManager: Finished TID 1 in 120 ms (progress: 2/2)
13/09/05 15:29:22 INFO scheduler.DAGScheduler: Completed ResultTask(0, 1)
13/09/05 15:29:22 INFO scheduler.DAGScheduler: Stage 0 (main at <unknown>:0) finished in 3.750 s
13/09/05 15:29:22 INFO spark.SparkContext: Job finished: main at <unknown>:0, took 3.848063895 s
Done counting vertices.
Parsing input file...
Done parsing input file.
13/09/05 15:29:22 INFO bagel.Bagel: Starting superstep 0.
13/09/05 15:29:22 INFO rdd.CoGroupedRDD: Adding one-to-one dependency with MapPartitionsRDD[7] at main at <unknown>:0
13/09/05 15:29:22 INFO rdd.CoGroupedRDD: Adding shuffle dependency with ShuffledRDD[3] at main at <unknown>:0
13/09/05 15:29:22 INFO spark.SparkContext: Starting job: main at <unknown>:0
13/09/05 15:29:22 INFO scheduler.DAGScheduler: Registering RDD 5 (main at <unknown>:0)
13/09/05 15:29:22 INFO scheduler.DAGScheduler: Registering RDD 11 (apply at TraversableLike.scala:233)
13/09/05 15:29:22 INFO scheduler.DAGScheduler: Registering RDD 2 (main at <unknown>:0)
13/09/05 15:29:22 INFO scheduler.DAGScheduler: Got job 1 (main at <unknown>:0) with 3 output partitions (allowLocal=false)
13/09/05 15:29:22 INFO scheduler.DAGScheduler: Final stage: Stage 1 (main at <unknown>:0)
13/09/05 15:29:22 INFO scheduler.DAGScheduler: Parents of final stage: List(Stage 2, Stage 3)
13/09/05 15:29:22 INFO scheduler.DAGScheduler: Missing parents: List(Stage 2, Stage 3)
13/09/05 15:29:22 INFO scheduler.DAGScheduler: Submitting Stage 2 (MapPartitionsRDD[5] at main at <unknown>:0), which has no missing parents
13/09/05 15:29:22 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 2 (MapPartitionsRDD[5] at main at <unknown>:0)
13/09/05 15:29:22 INFO cluster.ClusterScheduler: Adding task set 2.0 with 2 tasks
13/09/05 15:29:22 INFO cluster.TaskSetManager: Starting task 2.0:0 as TID 2 on executor 0: vm4 (preferred)
13/09/05 15:29:22 INFO scheduler.DAGScheduler: Submitting Stage 4 (MappedRDD[2] at main at <unknown>:0), which has no missing parents
13/09/05 15:29:22 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 4 (MappedRDD[2] at main at <unknown>:0)
13/09/05 15:29:22 INFO cluster.ClusterScheduler: Adding task set 4.0 with 2 tasks
13/09/05 15:29:22 INFO cluster.TaskSetManager: Serialized task 2.0:0 as 1713 bytes in 54 ms
13/09/05 15:29:22 INFO cluster.TaskSetManager: Finished TID 2 in 193 ms (progress: 1/2)
13/09/05 15:29:22 INFO scheduler.DAGScheduler: Completed ShuffleMapTask(2, 0)
13/09/05 15:29:22 INFO cluster.TaskSetManager: Starting task 2.0:1 as TID 3 on executor 0: vm4 (preferred)
13/09/05 15:29:22 INFO cluster.TaskSetManager: Serialized task 2.0:1 as 1713 bytes in 0 ms
13/09/05 15:29:22 INFO cluster.TaskSetManager: Finished TID 3 in 51 ms (progress: 2/2)
13/09/05 15:29:22 INFO cluster.TaskSetManager: Starting task 4.0:0 as TID 4 on executor 0: vm4 (preferred)
13/09/05 15:29:22 INFO scheduler.DAGScheduler: Completed ShuffleMapTask(2, 1)
13/09/05 15:29:22 INFO scheduler.DAGScheduler: Stage 2 (main at <unknown>:0) finished in 0.261 s
13/09/05 15:29:22 INFO scheduler.DAGScheduler: looking for newly runnable stages
13/09/05 15:29:22 INFO scheduler.DAGScheduler: running: Set(Stage 4)
13/09/05 15:29:22 INFO scheduler.DAGScheduler: waiting: Set(Stage 1, Stage 3)
13/09/05 15:29:22 INFO scheduler.DAGScheduler: failed: Set()
13/09/05 15:29:22 INFO cluster.TaskSetManager: Serialized task 4.0:0 as 1676 bytes in 22 ms
13/09/05 15:29:22 INFO scheduler.DAGScheduler: Missing parents for Stage 1: List(Stage 3)
13/09/05 15:29:22 INFO scheduler.DAGScheduler: Missing parents for Stage 3: List(Stage 4)
13/09/05 15:29:22 INFO cluster.TaskSetManager: Lost TID 4 (task 4.0:0)
13/09/05 15:29:22 INFO cluster.TaskSetManager: Loss was due to java.lang.ArrayIndexOutOfBoundsException: 3
    at spark.bagel.examples.WikipediaPageRank$$anonfun$1.apply(WikipediaPageRank.scala:43)
    at spark.bagel.examples.WikipediaPageRank$$anonfun$1.apply(WikipediaPageRank.scala:41)
    at scala.collection.Iterator$$anon$19.next(Iterator.scala:401)
    at scala.collection.Iterator$class.foreach(Iterator.scala:772)
    at scala.collection.Iterator$$anon$19.foreach(Iterator.scala:399)
    at spark.scheduler.ShuffleMapTask.run(ShuffleMapTask.scala:127)
    at spark.scheduler.ShuffleMapTask.run(ShuffleMapTask.scala:75)
    at spark.executor.Executor$TaskRunner.run(Executor.scala:98)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
13/09/05 15:29:22 INFO cluster.TaskSetManager: Starting task 4.0:0 as TID 5 on executor 0: vm4 (preferred)
13/09/05 15:29:22 INFO cluster.TaskSetManager: Serialized task 4.0:0 as 1676 bytes in 0 ms
13/09/05 15:29:22 INFO cluster.TaskSetManager: Lost TID 5 (task 4.0:0)
13/09/05 15:29:22 INFO cluster.TaskSetManager: Loss was due to java.lang.ArrayIndexOutOfBoundsException: 3 [duplicate 1]
13/09/05 15:29:22 INFO cluster.TaskSetManager: Starting task 4.0:0 as TID 6 on executor 0: vm4 (preferred)
13/09/05 15:29:23 INFO cluster.TaskSetManager: Serialized task 4.0:0 as 1676 bytes in 7 ms
13/09/05 15:29:23 INFO cluster.TaskSetManager: Lost TID 6 (task 4.0:0)
13/09/05 15:29:23 INFO cluster.TaskSetManager: Loss was due to java.lang.ArrayIndexOutOfBoundsException: 3 [duplicate 2]
13/09/05 15:29:23 INFO cluster.TaskSetManager: Starting task 4.0:0 as TID 7 on executor 0: vm4 (preferred)
13/09/05 15:29:23 INFO cluster.TaskSetManager: Serialized task 4.0:0 as 1676 bytes in 0 ms
13/09/05 15:29:23 INFO cluster.TaskSetManager: Lost TID 7 (task 4.0:0)
13/09/05 15:29:23 INFO cluster.TaskSetManager: Loss was due to java.lang.ArrayIndexOutOfBoundsException: 3 [duplicate 3]
13/09/05 15:29:23 INFO cluster.TaskSetManager: Starting task 4.0:0 as TID 8 on executor 0: vm4 (preferred)
13/09/05 15:29:23 INFO cluster.TaskSetManager: Serialized task 4.0:0 as 1676 bytes in 0 ms
13/09/05 15:29:23 INFO cluster.TaskSetManager: Lost TID 8 (task 4.0:0)
13/09/05 15:29:23 INFO cluster.TaskSetManager: Loss was due to java.lang.ArrayIndexOutOfBoundsException: 3 [duplicate 4]
13/09/05 15:29:23 ERROR cluster.TaskSetManager: Task 4.0:0 failed more than 4 times; aborting job
13/09/05 15:29:23 INFO scheduler.DAGScheduler: Failed to run main at <unknown>:0
Exception in thread "main" spark.SparkException: Job failed: Task 4.0:0 failed more than 4 times
    at spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:642)
    at spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:640)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:640)
    at spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:303)
    at spark.scheduler.DAGScheduler.spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:364)
    at spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:107)
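From the stack trace, the task dies in the input-parsing closure at WikipediaPageRank.scala:43. As far as I understand, this example expects each input line to be a row from a Wikipedia WEX-style dump, i.e. several tab-separated fields with the article XML in the fourth field, so the parser does something roughly like the sketch below. The sample lines, object name, and field meanings here are only my assumptions to illustrate the failure, not the actual example source or my real input file:

// Rough sketch of the kind of parsing I think fails at WikipediaPageRank.scala:43
// (the sample lines and field layout are assumptions, not the real code).
object WexParseSketch {
  def main(args: Array[String]) {
    // A WEX-style row: at least four tab-separated fields, article XML in field index 3.
    val wexLine    = "12\tAnarchism\t2008-12-04\t<article>...</article>"
    // A row without four tab-separated fields, like a simple PageRank edge list line.
    val simpleLine = "1 2"

    for (line <- Seq(wexLine, simpleLine)) {
      try {
        val fields = line.split("\t")
        // Accessing index 3 on a line with fewer than four tab-separated fields
        // throws java.lang.ArrayIndexOutOfBoundsException: 3, matching the error above.
        val body = fields(3)
        println("parsed article XML of length " + body.length)
      } catch {
        case e: ArrayIndexOutOfBoundsException =>
          println("failed to parse line '" + line + "': " + e)
      }
    }
  }
}

If that reading is right, my pagerank_data.txt simply is not in the format this example expects, but please correct me if I am wrong.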
