Hi Yuanyuan,
We haven't tested this feature in a while, but it should work. What did
the job report as the reason for the failure?
Avery
On 3/18/13 10:22 AM, Yuanyuan Tian wrote:
Can anyone help me answer the question?
Yuanyuan
From: Yuanyuan Tian/Almaden/IBM@IBMUS
To: user@giraph.apache.org
Date: 03/15/2013 02:05 PM
Subject: about fault tolerance in Giraph
------------------------------------------------------------------------
Hi
I was testing the fault tolerance of Giraph on a long-running job. I
noticed that when one of the workers threw an exception, the whole job
failed without retrying the task, even though I had turned on
checkpointing and there were available map slots in my cluster. Why
wasn't the fault tolerance mechanism working?
I was running a version of Giraph downloaded sometime in June 2012, and
I used Netty for the communication layer.
Thanks,
Yuanyuan