Hi Ashish,

Those "removing" messages mean that the node in question didn't communicate 
with your application for 45 seconds. Most likely the executor process on the 
node died, though there's also a chance that it was doing a super-long garbage 
collection or that there was a network problem. Look at the logs on the node to 
see if it crashed due to an exception or the OS killed it for some reason. For 
garbage collection debugging, you can turn on -XX:+PrintGCDetails in 
SPARK_JAVA_OPTS to log the lengths of GC pauses.

Matei

On Oct 3, 2013, at 1:10 PM, Ashish Rangole <arang...@gmail.com> wrote:

> Hi,
> 
> Trying to figure out what does it mean when the application (driver program) 
> logs end with the the lines like the ones below. This is with the application 
> running on Spark 0.8.0 on EC2.
> 
> Any help will be greatly appreciated.
> 
> Thanks!
> 
> 
> 13/10/03 16:17:33 INFO cluster.ClusterTaskSetManager: Finished TID 1744 in 
> 1183507 ms on ip-10-232-80-206.ec2.internal (progress: 46/60)
> 13/10/03 16:17:33 INFO scheduler.DAGScheduler: Completed ResultTask(4, 20)
> 13/10/03 16:17:57 WARN storage.BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(1, ip-10-170-16-83.ec2.internal, 46907, 0) with no recent
> heart beats: 68685ms exceeds 45000ms
> 13/10/03 16:18:57 WARN storage.BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(0, ip-10-232-27-176.ec2.internal, 55654, 0) with no recent
>  heart beats: 95376ms exceeds 45000ms
> 13/10/03 16:19:13 INFO storage.BlockManagerMasterActor$BlockManagerInfo: 
> Registering block manager ip-10-232-27-176.ec2.internal:55654 with 19.3 GB RAM
> 13/10/03 16:21:38 INFO storage.BlockManagerMasterActor$BlockManagerInfo: 
> Added rdd_7_293 in memory on ip-10-232-27-176.ec2.internal:55654 (size: 1975.7
>  KB, free: 19.2 GB)
> 13/10/03 16:21:38 INFO storage.BlockManagerMasterActor$BlockManagerInfo: 
> Added rdd_7_293 in memory on ip-10-232-27-176.ec2.internal:55654 (size: 1975.7
>  KB, free: 19.2 GB)
> 13/10/03 16:22:57 WARN storage.BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(0, ip-10-232-27-176.ec2.internal, 55654, 0) with no recent
>  heart beats: 78761ms exceeds 45000ms
> 

Reply via email to