Hi,

Recently, I tried to learn Giraph by running RandomMessageBenchmark.
Under normal conditions, it works fine. However, when I ran it with a slow node in the cluster, the job did not finish. The progress went back down after the map tasks reached 100%, and then the log showed errors like this:

INFO mapred.JobClient: Task Id : attempt_201410101016_0003_m_000004_0, Status : FAILED
Task attempt_201410101016_0003_m_000004_0 failed to report status for 600 seconds. Killing!

So I'm curious how the failover mechanism works in Giraph. I believe it uses checkpoints, but I don't know the details. I also read the source of GiraphJob.java, which states that Giraph doesn't use speculative execution. What happens, then, when a node in the cluster is problematic? Does Hadoop also redistribute the task to other workers?

Thanks!

Regards,
Vincentius Martin