Hi JU,

One thing you can try is the out-of-core graph (the giraph.useOutOfCoreGraph option).
I don't know your exact use case: is the graph itself huge, or is it the data your application computes? In the second case, there is a 'giraph.doOutputDuringComputation' option you might want to try. When it is turned on, writeVertex is called during each superstep immediately after compute is called for that vertex. This means you can store the data you want to write in the vertex, write it, and clear it before moving on to the next vertex.

Maja

From: Han JU <ju.han.fe...@gmail.com>
Reply-To: "user@giraph.apache.org" <user@giraph.apache.org>
Date: Friday, May 17, 2013 8:38 AM
To: "user@giraph.apache.org" <user@giraph.apache.org>
Subject: What if the resulting graph is larger than the memory?

Hi,

It's me again. After a day's work I've coded a Giraph solution for my problem at hand. I ran it on a medium dataset and it's notably faster than other approaches.

However, the goal is to process larger inputs. For example, I have a larger dataset whose result graph is about 400GB when represented in edge format as a text file. I think the edges that the algorithm creates all reside in the cluster's memory. Does that mean that for this big dataset I need a cluster with ~400GB of main memory? Is there any possibility to output "on the go", so that I don't need to construct the whole graph: an edge would be written to HDFS immediately instead of being created in main memory and then output?

Thanks!

--
JU Han
Software Engineer Intern @ KXEN Inc.
UTC - Université de Technologie de Compiègne
GI06 - Fouille de Données et Décisionnel
+33 0619608888
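
The write-and-clear idea from the reply (compute a vertex, write its data, clear it, move to the next vertex) can be illustrated with a small toy model. This is only a sketch in plain Java and does not use the Giraph API: the class StreamingOutputSketch, the methods compute and runStreaming, and the rule for generating edges are all made up for illustration, and the Consumer sink merely stands in for whatever actually writes to HDFS.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Toy model, NOT Giraph code: edges are written per vertex instead of being kept for the whole graph. */
class StreamingOutputSketch {

    /** Hypothetical per-vertex work: vertex v produces the edges v -> 0 .. v-1. */
    static List<String> compute(int vertexId) {
        List<String> edges = new ArrayList<>();
        for (int target = 0; target < vertexId; target++) {
            edges.add(vertexId + "\t" + target);
        }
        return edges;
    }

    /**
     * Processes vertices one at a time. Each vertex's edges are handed to the
     * sink right after they are computed and are then cleared, so only one
     * vertex's worth of output is held at any moment.
     * Returns the largest number of edges held in memory at once.
     */
    static int runStreaming(int numVertices, Consumer<String> sink) {
        int peak = 0;
        for (int v = 0; v < numVertices; v++) {
            List<String> pending = compute(v);      // analogous to compute for this vertex
            peak = Math.max(peak, pending.size());
            for (String edge : pending) {
                sink.accept(edge);                  // analogous to writing right after compute
            }
            pending.clear();                        // drop the data before the next vertex
        }
        return peak;
    }

    public static void main(String[] args) {
        List<String> written = new ArrayList<>();   // stand-in for the real output
        int peak = runStreaming(5, written::add);
        System.out.println("edges written: " + written.size());
        System.out.println("peak edges held at once: " + peak);
    }
}
```

In this model the peak memory use is bounded by the largest single vertex rather than by the total output, which is the effect the reply is pointing at. Whether that carries over to a real job depends on how the application stores its results in the vertices.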