Hi Folks,

I am running a Giraph job for a connected-component computation over 2B vertices, all of type LongWritable; the edges are of type NullWritable. A quick calculation shows that 2B vertices should account for around 120 GB in total. I have a cluster with 150 mappers, each allocated 4 GB, so the total memory is around 600 GB. In spite of this I am getting out-of-memory errors while running the connected-component computation, although it runs smoothly on a partition of about 1B vertices. I cannot convince myself that Giraph really consumes this much memory. Is there anything I can do here to finish this task?

1. Buffering intermediate results in memory might be one of the reasons. Can they be stored in HDFS instead of in memory, and which flag controls that? The out-of-core option stores them on local disk, and I don't want to store them locally.
2. What is the best way to profile the job and check which objects are the culprits?
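For reference, here is the back-of-envelope arithmetic behind the ~120 GB figure; the ~60 bytes-per-vertex number is my own rough assumption (id, vertex value, and JVM object overhead), not a measured value:

```python
# Rough estimate of in-memory graph size for the job described above.
vertices = 2_000_000_000
bytes_per_vertex = 60  # assumed average incl. JVM overhead, not measured
total_gb = vertices * bytes_per_vertex / 1024**3
print(round(total_gb))  # ~112 GB, consistent with the ~120 GB estimate
```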
Thanks in advance for any response.

-- 
Best Regards,
Jyotirmoy Sundi