Failover Mechanism in Giraph?

Vincentius Martin Thu, 16 Oct 2014 10:37:57 -0700

Hi,

Recently, I tried to learn Giraph by running RandomMessageBenchmark.


Under normal conditions, it works fine. However, when I ran it with a
slow node in the system, the job never finished. The progress dropped
after the map tasks reached 100%, and then the log showed errors like
this:

INFO mapred.JobClient: Task Id : attempt_201410101016_0003_m_000004_0, Status : FAILED
Task attempt_201410101016_0003_m_000004_0 failed to report status for 600 seconds. Killing!

So, I'm curious how the failover mechanism works in Giraph. I believe
it uses checkpoints, but I don't know the details.

Also, I read the source of GiraphJob.java. It states that Giraph doesn't
use speculative execution, so what happens when a node in the cluster is
problematic? Does Hadoop redistribute the task to other workers?

Thanks!

Regards,
Vincentius Martin
