What's your cluster configuration? How do you invoke the job?

> I encountered a critical scaling problem using Giraph. I wrote a very
> simple algorithm to test Giraph on large graphs: a connectivity test. It
> works on relatively large graphs (3,072,441 nodes and 117,185,083 edges)
> but not on a very large graph (52,000,000 nodes and 2,000,000,000 edges).
> When processing the biggest graph, the Giraph core seems to fail after
> superstep 14 (15 on some jobs). The input graph is 30 GB stored as text,
> and the output is also stored as text. 9 workers are used to compute the
> graph.
> Here is the stack trace (it is the same for all 9 workers):
>     java.lang.IllegalStateException: run: Caught an unrecoverable exception exists: Failed to check /_hadoopBsp/job_201307260439_0006/_applicationAttemptsDir/0/_superstepDir/97/_addressesAndPartitions after 3 tries!
>         at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:101)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>         at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Unknown Source)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
>         at org.apache.hadoop.mapred.Child.main(Child.java:249)
>     Caused by: java.lang.IllegalStateException: exists: Failed to check /_hadoopBsp/job_201307260439_0006/_applicationAttemptsDir/0/_superstepDir/97/_addressesAndPartitions after 3 tries!
>         at org.apache.giraph.zk.ZooKeeperExt.exists(ZooKeeperExt.java:369)
>         at org.apache.giraph.worker.BspServiceWorker.startSuperstep(BspServiceWorker.java:678)
>         at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:248)
>         at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:91)
>         ... 7 more
> Could you help me solve this problem?
> If you need the program's code, I can post it here (it is quite
> small).
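Assuming the connectivity test is the usual label-propagation connected-components computation (each vertex adopts the smallest vertex id it hears about and forwards it to its neighbours), the per-superstep logic looks roughly like the sketch below. It is a single-machine simulation in plain Java with no Giraph dependency, so `ConnectedComponentsSketch`, `run`, `deliver` and `Result` are hypothetical names, not Giraph API; the min-merging in `deliver` only mimics what a message combiner would do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical single-machine simulation of BSP "minimum label" connected
// components; it does not use the Giraph API.
final class ConnectedComponentsSketch {

    /** Final component label per vertex and the number of supersteps run. */
    static final class Result {
        final long[] labels;
        final int supersteps;

        Result(long[] labels, int supersteps) {
            this.labels = labels;
            this.supersteps = supersteps;
        }
    }

    /** Vertices are 0..numVertices-1; each edge {u, v} is undirected. */
    static Result run(int numVertices, int[][] edges) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < numVertices; i++) {
            adj.add(new ArrayList<>());
        }
        for (int[] e : edges) {
            adj.get(e[0]).add(e[1]);
            adj.get(e[1]).add(e[0]);
        }

        // Superstep 0: each vertex takes its own id as its label and sends it
        // to all neighbours.
        long[] label = new long[numVertices];
        long[] inbox = new long[numVertices];
        boolean[] hasMsg = new boolean[numVertices];
        for (int v = 0; v < numVertices; v++) {
            label[v] = v;
        }
        for (int v = 0; v < numVertices; v++) {
            for (int n : adj.get(v)) {
                deliver(inbox, hasMsg, n, label[v]);
            }
        }
        int supersteps = 1;

        // Later supersteps: only vertices that received a message wake up. A
        // vertex forwards its label only if it just got smaller; otherwise it
        // stays halted. The run ends when no messages are in flight.
        while (anyTrue(hasMsg)) {
            long[] nextInbox = new long[numVertices];
            boolean[] nextHasMsg = new boolean[numVertices];
            for (int v = 0; v < numVertices; v++) {
                if (hasMsg[v] && inbox[v] < label[v]) {
                    label[v] = inbox[v];
                    for (int n : adj.get(v)) {
                        deliver(nextInbox, nextHasMsg, n, label[v]);
                    }
                }
            }
            inbox = nextInbox;
            hasMsg = nextHasMsg;
            supersteps++;
        }
        return new Result(label, supersteps);
    }

    // Min-combiner: a vertex only needs the smallest label sent to it.
    private static void deliver(long[] inbox, boolean[] hasMsg, int target, long msg) {
        if (!hasMsg[target] || msg < inbox[target]) {
            inbox[target] = msg;
            hasMsg[target] = true;
        }
    }

    private static boolean anyTrue(boolean[] flags) {
        for (boolean f : flags) {
            if (f) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Components {0,1,2,3} (a path) and {4,5}; vertex 6 is isolated.
        int[][] edges = {{0, 1}, {1, 2}, {2, 3}, {4, 5}};
        Result r = run(7, edges);
        System.out.println(Arrays.toString(r.labels) + " after " + r.supersteps + " supersteps");
    }
}
```

Running `main` prints `[0, 0, 0, 0, 4, 4, 6] after 5 supersteps`. The superstep count grows with how far the smallest id has to travel through each component, so on a graph with billions of edges a run of 14 or more supersteps is plausible for this kind of algorithm, and, as the trace shows, each worker checks a ZooKeeper node when it starts a superstep.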
