Hi Folks,
            I am running a Giraph job for a connected component
computation with 2B vertices, all of type LongWritable. The edges are of
type NullWritable. A quick calculation shows that 2B vertices should
account for around 120 GB in total. I have a cluster with 150 mappers,
each allocated 4 GB, so the total memory is around 600 GB. In spite of
this I am getting out-of-memory errors while running the connected
component computation, although it runs smoothly on a partition of
around 1B vertices. I cannot convince myself that Giraph consumes this
much memory. Is there anything I can do here to finish this task?
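For reference, the quick calculation mentioned above can be sketched as follows. The ~48 bytes/vertex of JVM overhead (object headers, LongWritable wrappers, per-partition structures) is my own assumption, not a measured Giraph figure:

```shell
# Back-of-envelope heap estimate for 2B vertices with long ids.
vertices=2000000000

# Raw payload: one 8-byte long id per vertex.
raw_gb=$(( vertices * 8 / 1024**3 ))

# Assumed ~48 bytes/vertex of JVM overhead (object headers,
# LongWritable wrappers, per-partition maps) -- an assumption,
# not a measured Giraph number.
est_gb=$(( vertices * (8 + 48) / 1024**3 ))

echo "raw ids: ${raw_gb} GB, with overhead: ~${est_gb} GB"
```

Even this estimate ignores message buffers: connected components propagates ids along every edge each superstep, so in-flight messages may well dominate the footprint and could be what pushes the job past the 600 GB total.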
1. Buffering intermediate results in memory might be one of the reasons.
Can they be stored in HDFS instead of memory, and which flag controls
that? The out-of-core option stores data on local disk, and I don't want
to store it locally.
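For reference, these are the out-of-core options I have found so far (option names assumed from Giraph 1.0-era documentation; as far as I can tell they all spill to *local* disk, which is what I am trying to avoid):

```shell
# Hypothetical invocation sketch -- the option names below are assumed
# from Giraph 1.0-era docs, and the "..." stands for the usual job
# arguments. All of these spill to local disk, not HDFS.
hadoop jar giraph-examples.jar org.apache.giraph.GiraphRunner \
  ... \
  -ca giraph.useOutOfCoreGraph=true \
  -ca giraph.maxPartitionsInMemory=10 \
  -ca giraph.useOutOfCoreMessages=true \
  -ca giraph.maxMessagesInMemory=1000000
```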
2. What is the best way to profile the job and find out which objects
are the culprits?
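One approach I am considering (assuming Hadoop MRv1's `mapred.child.java.opts` property and the standard HotSpot flags): make each mapper JVM dump its heap when it dies, then inspect the dominator tree in Eclipse MAT or jhat:

```shell
# Sketch: dump the heap of any mapper JVM that hits OOM, then copy the
# .hprof file off the worker node and open it in Eclipse MAT or jhat.
# The "..." stands for the usual job arguments.
hadoop jar giraph-examples.jar org.apache.giraph.GiraphRunner \
  -D mapred.child.java.opts="-Xmx4g -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp" \
  ...
```

Alternatively, `jmap -histo:live <pid>` against a live mapper process gives a quick per-class histogram without waiting for a crash.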

Thanks in advance for any response.

-- 
Best Regards,
Jyotirmoy Sundi
