Hi Matei, Thanks! Will look out for long GC pauses. On Oct 3, 2013 3:00 PM, "Matei Zaharia" <matei.zaha...@gmail.com> wrote:
> Hi Ashish, > > Those "removing" messages mean that the node in question didn't > communicate with your application for 45 seconds. Most likely the executor > process on the node died, though there's also a chance that it was doing a > super-long garbage collection or that there was a network problem. Look at > the logs on the node to see if it crashed due to an exception or the OS > killed it for some reason. For garbage collection debugging, you can turn > on -XX:+PrintGCDetails in SPARK_JAVA_OPTS to log the lengths of GC pauses. > > Matei > > On Oct 3, 2013, at 1:10 PM, Ashish Rangole <arang...@gmail.com> wrote: > > > Hi, > > > > Trying to figure out what does it mean when the application (driver > program) logs end with the the lines like the ones below. This is with the > application running on Spark 0.8.0 on EC2. > > > > Any help will be greatly appreciated. > > > > Thanks! > > > > > > 13/10/03 16:17:33 INFO cluster.ClusterTaskSetManager: Finished TID 1744 > in 1183507 ms on ip-10-232-80-206.ec2.internal (progress: 46/60) > > 13/10/03 16:17:33 INFO scheduler.DAGScheduler: Completed ResultTask(4, > 20) > > 13/10/03 16:17:57 WARN storage.BlockManagerMasterActor: Removing > BlockManager BlockManagerId(1, ip-10-170-16-83.ec2.internal, 46907, 0) with > no recent > > heart beats: 68685ms exceeds 45000ms > > 13/10/03 16:18:57 WARN storage.BlockManagerMasterActor: Removing > BlockManager BlockManagerId(0, ip-10-232-27-176.ec2.internal, 55654, 0) > with no recent > > heart beats: 95376ms exceeds 45000ms > > 13/10/03 16:19:13 INFO storage.BlockManagerMasterActor$BlockManagerInfo: > Registering block manager ip-10-232-27-176.ec2.internal:55654 with 19.3 GB > RAM > > 13/10/03 16:21:38 INFO storage.BlockManagerMasterActor$BlockManagerInfo: > Added rdd_7_293 in memory on ip-10-232-27-176.ec2.internal:55654 (size: > 1975.7 > > KB, free: 19.2 GB) > > 13/10/03 16:21:38 INFO storage.BlockManagerMasterActor$BlockManagerInfo: > Added rdd_7_293 in memory on ip-10-232-27-176.ec2.internal:55654 (size: > 1975.7 > > KB, free: 19.2 GB) > > 13/10/03 16:22:57 WARN storage.BlockManagerMasterActor: Removing > BlockManager BlockManagerId(0, ip-10-232-27-176.ec2.internal, 55654, 0) > with no recent > > heart beats: 78761ms exceeds 45000ms > > > >