Hi Yuanyuan,

We haven't tested this feature in a while, but it should work. What reason did the job report for the failure?

Avery

On 3/18/13 10:22 AM, Yuanyuan Tian wrote:
Can anyone help me answer this question?

Yuanyuan



From: Yuanyuan Tian/Almaden/IBM@IBMUS
To: user@giraph.apache.org
Date: 03/15/2013 02:05 PM
Subject: about fault tolerance in Giraph
------------------------------------------------------------------------



Hi

I was testing the fault tolerance of Giraph on a long-running job. I noticed that when one of the workers threw an exception, the whole job failed without retrying the task, even though I had turned on checkpointing and there were map slots available in my cluster. Why wasn't the fault tolerance mechanism working?

I was running a version of Giraph downloaded sometime in June 2012 and I used Netty for the communication layer.

Thanks,

Yuanyuan
