about fault tolerance in Giraph

Yuanyuan Tian Fri, 15 Mar 2013 14:04:35 -0700

Hi,

I was testing the fault tolerance of Giraph on a long-running job. I 
noticed that when one of the workers threw an exception, the whole job 
failed without retrying the task, even though I had turned on 
checkpointing and there were map slots available in my cluster. Why wasn't 
the fault tolerance mechanism working?


I was running a version of Giraph downloaded sometime in June 2012 and I 
used Netty for the communication layer. 

Thanks,

Yuanyuan
