Hi All,

  We have set up a 2-node cluster (NODE-DSRV05 and NODE-DSRV02). Each node
has 32 GB of RAM, a 1 TB hard disk, and 8 CPU cores. We have set up HDFS
with 2 TB of capacity and a block size of 256 MB. When we try to process a
1 GB file on Spark, we see the following exception:

14/11/14 17:01:42 INFO scheduler.TaskSetManager: Starting task 0.0 in stage
0.0 (TID 0, NODE-DSRV05.impetus.co.in, NODE_LOCAL, 1667 bytes)
14/11/14 17:01:42 INFO scheduler.TaskSetManager: Starting task 1.0 in stage
0.0 (TID 1, NODE-DSRV05.impetus.co.in, NODE_LOCAL, 1667 bytes)
14/11/14 17:01:42 INFO scheduler.TaskSetManager: Starting task 2.0 in stage
0.0 (TID 2, NODE-DSRV05.impetus.co.in, NODE_LOCAL, 1667 bytes)
14/11/14 17:01:43 INFO cluster.SparkDeploySchedulerBackend: Registered
executor: 
Actor[akka.tcp://sparkExecutor@IMPETUS-DSRV02:41124/user/Executor#539551156]
with ID 0
14/11/14 17:01:43 INFO storage.BlockManagerMasterActor: Registering block
manager NODE-DSRV05.impetus.co.in:60432 with 2.1 GB RAM
14/11/14 17:01:43 INFO storage.BlockManagerMasterActor: Registering block
manager NODE-DSRV02:47844 with 2.1 GB RAM
14/11/14 17:01:43 INFO network.ConnectionManager: Accepted connection from [
NODE-DSRV05.impetus.co.in/192.168.145.195:51447]
14/11/14 17:01:43 INFO network.SendingConnection: Initiating connection to [
NODE-DSRV05.impetus.co.in/192.168.145.195:60432]
14/11/14 17:01:43 INFO network.SendingConnection: Connected to [
NODE-DSRV05.impetus.co.in/192.168.145.195:60432], 1 messages pending
14/11/14 17:01:43 INFO storage.BlockManagerInfo: Added broadcast_1_piece0
in memory on NODE-DSRV05.impetus.co.in:60432 (size: 17.1 KB, free: 2.1 GB)
14/11/14 17:01:43 INFO storage.BlockManagerInfo: Added broadcast_0_piece0
in memory on NODE-DSRV05.impetus.co.in:60432 (size: 14.1 KB, free: 2.1 GB)
14/11/14 17:01:44 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0
(TID 0, NODE-DSRV05.impetus.co.in): java.lang.NullPointerException:
        org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:609)
        org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:609)
        org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
        org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
        org.apache.spark.scheduler.Task.run(Task.scala:54)
        org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        java.lang.Thread.run(Thread.java:722)
14/11/14 17:01:44 INFO scheduler.TaskSetManager: Starting task 0.1 in stage
0.0 (TID 3, NODE-DSRV05.impetus.co.in, NODE_LOCAL, 1667 bytes)
14/11/14 17:01:44 INFO scheduler.TaskSetManager: Lost task 1.0 in stage 0.0
(TID 1) on executor NODE-DSRV05.impetus.co.in:
java.lang.NullPointerException (null) [duplicate 1]
14/11/14 17:01:44 INFO scheduler.TaskSetManager: Lost task 2.0 in stage 0.0
(TID 2) on executor NODE-DSRV05.impetus.co.in:
java.lang.NullPointerException (null) [duplicate 2]
14/11/14 17:01:44 INFO scheduler.TaskSetManager: Starting task 2.1 in stage
0.0 (TID 4, NODE-DSRV05.impetus.co.in, NODE_LOCAL, 1667 bytes)
14/11/14 17:01:44 INFO scheduler.TaskSetManager: Starting task 1.1 in stage
0.0 (TID 5, NODE-DSRV02, NODE_LOCAL, 1667 bytes)
14/11/14 17:01:44 INFO scheduler.TaskSetManager: Lost task 0.1 in stage 0.0
(TID 3) on executor NODE-DSRV05.impetus.co.in:
java.lang.NullPointerException (null) [duplicate 3]
14/11/14 17:01:44 INFO scheduler.TaskSetManager: Starting task 0.2 in stage
0.0 (TID 6, NODE-DSRV02, NODE_LOCAL, 1667 bytes)
14/11/14 17:01:44 INFO scheduler.TaskSetManager: Lost task 2.1 in stage 0.0
(TID 4) on executor NODE-DSRV05.impetus.co.in:
java.lang.NullPointerException (null) [duplicate 4]
14/11/14 17:01:44 INFO scheduler.TaskSetManager: Starting task 2.2 in stage
0.0 (TID 7, NODE-DSRV02, NODE_LOCAL, 1667 bytes)


What I see is that the tasks launched on NODE-DSRV05 fail with a
NullPointerException and are retried on NODE-DSRV02, so the processing ends
up on a single node. When we tried with 360 MB of data, I don't see any
exception, but the entire processing is still done by only one node. I
couldn't figure out where the issue lies.

Any suggestions on what kind of situation might cause such an issue?
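
As a rough sanity check on how many tasks to expect, the sketch below
estimates the number of block-aligned input splits for the two file sizes
mentioned above. It assumes one task per HDFS block, takes "1 GB" to mean
exactly 1024 MB (the exact size is not stated), and ignores any split-size
tuning in the input format. The function name estimated_splits is
hypothetical, not part of Spark or Hadoop.

```python
import math

MB = 1024 * 1024


def estimated_splits(file_size_bytes, block_size_bytes):
    """Approximate number of input splits, assuming one split per HDFS block."""
    if file_size_bytes <= 0:
        return 0
    # Round up: a partial trailing block still needs its own split.
    return math.ceil(file_size_bytes / block_size_bytes)


block_size = 256 * MB  # HDFS block size from the cluster description

# File sizes are assumptions based on the sizes quoted in this post.
for label, size in [("1 GB", 1024 * MB), ("360 MB", 360 * MB)]:
    print(label, "->", estimated_splits(size, block_size), "splits")
```

Under these assumptions the 360 MB file yields only two splits, which a
single 8-core executor could run on its own. That may be why the smaller
job stays on one node, but it does not explain the NullPointerException.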

Thanks,
Padma Ch
