Hi JU,

One thing you can try is the out-of-core graph (the giraph.useOutOfCoreGraph 
option), which keeps part of the graph on disk instead of in memory.
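
In case it helps, here is a minimal sketch of one way to pass that option when launching a job. It assumes the job is started through GiraphRunner and that your Giraph version accepts custom arguments as "-ca name=value"; please check that against your release. The class and method names (GiraphOptionArgs, toCustomArguments) are made up for illustration and are not part of Giraph.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GiraphOptionArgs {
    /**
     * Turns option name/value pairs into "-ca name=value" argument pairs.
     * Assumption: the launcher understands "-ca" as a custom-argument flag.
     */
    public static List<String> toCustomArguments(Map<String, String> options) {
        List<String> args = new ArrayList<>();
        for (Map.Entry<String, String> option : options.entrySet()) {
            args.add("-ca");
            args.add(option.getKey() + "=" + option.getValue());
        }
        return args;
    }

    public static void main(String[] argv) {
        // LinkedHashMap keeps the options in insertion order.
        Map<String, String> options = new LinkedHashMap<>();
        options.put("giraph.useOutOfCoreGraph", "true");
        // Prints the extra arguments to append to the job command line.
        System.out.println(String.join(" ", toCustomArguments(options)));
    }
}
```

The printed arguments would be appended to whatever command line you already use to submit the job.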

I don't know your exact use case: is it the graph itself that is huge, or the 
data that your application computes? In the second case, there is the 
'giraph.doOutputDuringComputation' option you might want to try. When it is 
turned on, writeVertex is called for each vertex immediately after compute is 
called for that vertex, in every superstep. This means you can store the data 
you want to write in the vertex, write it, and clear it before moving on to 
the next vertex.
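
To make the store, write, clear pattern concrete, here is a small stand-alone simulation. It does not use the Giraph API: SketchVertex, compute, writeVertex and runSuperstep are hypothetical stand-ins, and the in-memory "sink" list merely plays the role of the HDFS output. It only illustrates that, when writeVertex runs right after compute for the same vertex, the amount of buffered result data is bounded by what a single vertex produces.

```java
import java.util.ArrayList;
import java.util.List;

public class OutputDuringComputationSketch {
    /** Hypothetical stand-in for a vertex; not a Giraph class. */
    static class SketchVertex {
        final int id;
        final List<String> pendingEdges = new ArrayList<>();

        SketchVertex(int id) {
            this.id = id;
        }
    }

    /** Stand-in for compute(): creates result edges and parks them on the vertex. */
    static void compute(SketchVertex v, int vertexCount) {
        for (int target = 0; target < vertexCount; target++) {
            if (target != v.id) {
                v.pendingEdges.add(v.id + "\t" + target);
            }
        }
    }

    /** Stand-in for writeVertex(): emits the parked edges, then frees them. */
    static void writeVertex(SketchVertex v, List<String> sink) {
        sink.addAll(v.pendingEdges);
        v.pendingEdges.clear();
    }

    /** Runs one simulated superstep and returns the peak number of buffered edges. */
    static int runSuperstep(List<SketchVertex> vertices, List<String> sink) {
        int peak = 0;
        for (SketchVertex v : vertices) {
            compute(v, vertices.size());
            // All other vertices are already cleared, so this is the total buffered.
            peak = Math.max(peak, v.pendingEdges.size());
            writeVertex(v, sink); // called right after compute for the same vertex
        }
        return peak;
    }

    public static void main(String[] args) {
        List<SketchVertex> vertices = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            vertices.add(new SketchVertex(i));
        }
        List<String> sink = new ArrayList<>(); // stands in for the HDFS output
        int peak = runSuperstep(vertices, sink);
        System.out.println("edges written: " + sink.size() + ", peak buffered: " + peak);
    }
}
```

With 4 vertices, 12 edges are written in total but never more than 3 are held at once. In a real job the sink would be the output format's writer rather than a list.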

Maja

From: Han JU <ju.han.fe...@gmail.com>
Reply-To: user@giraph.apache.org
Date: Friday, May 17, 2013 8:38 AM
To: user@giraph.apache.org
Subject: What if the resulting graph is larger than the memory?

Hi,

It's me again.
After a day's work I've coded a Giraph solution for my problem at hand. I gave 
it a run on a medium dataset and it's notably faster than other approaches.

However, the goal is to process larger inputs. For example, I have a larger 
dataset whose result graph is about 400GB when stored as a text file in edge 
format. I think the edges that the algorithm creates all reside in the 
cluster's memory. Does that mean that for this big dataset I need a cluster 
with ~400GB of main memory? Is there any way to output "on the go", so that I 
don't need to construct the whole graph: an edge is written to HDFS 
immediately instead of being created in main memory and then output?

Thanks!
--
JU Han

Software Engineer Intern @ KXEN Inc.
UTC   -  Université de Technologie de Compiègne
     GI06 - Fouille de Données et Décisionnel

+33 0619608888
