Hi,

It's me again.
After a day's work I've coded a Giraph solution for the problem at hand. I
ran it on a medium-sized dataset and it's notably faster than the other
approaches.

However, the goal is to process larger inputs. For example, I have a larger
dataset whose result graph is about 400GB when stored as a text file in edge
format. I think all the edges that the algorithm creates reside in the
cluster's memory. Does that mean that for this dataset I need a cluster with
roughly 400GB of main memory? Is there any way to output "on the go", so that
I don't need to construct the whole graph: each edge would be written to HDFS
immediately instead of first being created in main memory and then written
out?
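
To make concrete what I mean by output "on the go", here is a minimal sketch
in plain Java. It is not Giraph API: EdgeSink, emit and writeAllPairs are
hypothetical names I made up for illustration, the all-pairs loop is only a
stand-in for my real algorithm, and a StringWriter stands in for a stream
opened on HDFS. The point is only that each edge is written as soon as it is
produced and never stored in a collection.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

public class StreamingEdgeOutput {

    /** Hypothetical sink: writes one edge per line and keeps only a counter. */
    static final class EdgeSink implements AutoCloseable {
        private final Writer out;
        private long edgesWritten = 0;

        EdgeSink(Writer out) {
            this.out = out;
        }

        /** Writes the edge immediately; nothing is buffered in a collection. */
        void emit(long source, long target) throws IOException {
            out.write(source + "\t" + target + "\n");
            edgesWritten++;
        }

        long edgesWritten() {
            return edgesWritten;
        }

        @Override
        public void close() throws IOException {
            out.close();
        }
    }

    /**
     * Stand-in for the real algorithm: produces every ordered pair of
     * distinct vertices and streams each one to the writer.
     */
    static long writeAllPairs(long[] vertexIds, Writer out) throws IOException {
        try (EdgeSink sink = new EdgeSink(out)) {
            for (long source : vertexIds) {
                for (long target : vertexIds) {
                    if (source != target) {
                        sink.emit(source, target);
                    }
                }
            }
            return sink.edgesWritten();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: in a real job this writer would wrap an HDFS output stream.
        StringWriter buffer = new StringWriter();
        long count = writeAllPairs(new long[] {1, 2, 3}, buffer);
        System.out.println(count + " edges written");
        System.out.print(buffer);
    }
}
```

With this shape the memory needed for the output is independent of the number
of edges, which is the behaviour I am hoping to get inside Giraph.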

Thanks!
-- 
*JU Han*

Software Engineer Intern @ KXEN Inc.
UTC   -  Université de Technologie de Compiègne
GI06 - Fouille de Données et Décisionnel

+33 0619608888
