Hi Sebastian,

It's something like mining frequent item pairs from transaction data. I need every pair whose support reaches a fairly low threshold (say 2), so the result can be very large.
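For illustration only, here is a rough sketch of what "frequent item pairs with low support" means, in plain Python rather than the actual Giraph job. The function name `frequent_pairs`, the `min_support` parameter and the sample baskets are made up for this example.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support=2):
    """Yield ((item_a, item_b), count) for every item pair that occurs
    together in at least `min_support` transactions."""
    counts = Counter()
    for transaction in transactions:
        # Deduplicate and sort so each pair has a single canonical form.
        items = sorted(set(transaction))
        counts.update(combinations(items, 2))
    # Emit pairs one at a time instead of building a result list.
    for pair, count in sorted(counts.items()):
        if count >= min_support:
            yield pair, count


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the real dataset.
    baskets = [
        ["bread", "milk", "eggs"],
        ["bread", "milk"],
        ["milk", "eggs"],
        ["bread", "butter"],
    ]
    for (a, b), n in frequent_pairs(baskets):
        print(a, b, n)
```

A transaction with n distinct items contributes n(n-1)/2 candidate pairs, and a threshold as low as 2 may leave most of them in the output, which is why the result can end up much larger than the input.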
2013/5/21 Sebastian Schelter <ssc.o...@googlemail.com>

> Hello Han,
>
> out of curiosity, what do you compute that has such a big result?
>
> Best,
> Sebastian
>
> On 21.05.2013 11:52, Han JU wrote:
> > Hi Maja,
> >
> > The input graph of my problem is not big, the calculation result is very
> > big.
> > In fact what does out-of-core graph mean? Where can I find some examples of
> > this and for output during computation?
> >
> > Thanks.
> >
> > 2013/5/17 Maja Kabiljo <majakabi...@fb.com>
> >
> >> Hi JU,
> >>
> >> One thing you can try is to use out-of-core graph
> >> (giraph.useOutOfCoreGraph option).
> >>
> >> I don't know what your exact use case is – do you have the graph which
> >> is huge or the data which you calculate in your application is? In the
> >> second case, there is 'giraph.doOutputDuringComputation' option you might
> >> want to try out. When that is turned on, during each superstep writeVertex
> >> will be called immediately after compute for that vertex is called. This
> >> means that you can store data you want to write in vertex, write it and
> >> clear the data before going to the next vertex.
> >>
> >> Maja
> >>
> >> From: Han JU <ju.han.fe...@gmail.com>
> >> Reply-To: "user@giraph.apache.org" <user@giraph.apache.org>
> >> Date: Friday, May 17, 2013 8:38 AM
> >> To: "user@giraph.apache.org" <user@giraph.apache.org>
> >> Subject: What if the resulting graph is larger than the memory?
> >>
> >> Hi,
> >>
> >> It's me again.
> >> After a day's work I've coded a Giraph solution for my problem at hand. I
> >> gave it a run on a medium dataset and it's notably faster than other
> >> approaches.
> >>
> >> However the goal is to process larger inputs, for example I've a larger
> >> dataset that the result graph is about 400GB when represented in edge
> >> format and in text file. And I think the edges that the algorithm created
> >> all reside in the cluster's memory. So it means that for this big dataset,
> >> I need a cluster with ~ 400GB main memory to run? Is there any
> >> possibilities that I can output "on the go" that means I don't need to
> >> construct the whole graph, an edge is outputed to HDFS immediately instead
> >> of being created in main memory then be outputed?
> >>
> >> Thanks!
> >> --
> >> *JU Han*
> >>
> >> Software Engineer Intern @ KXEN Inc.
> >> UTC - Université de Technologie de Compiègne
> >> * **GI06 - Fouille de Données et Décisionnel*
> >>
> >> +33 0619608888
> >>

--
*JU Han*

Software Engineer Intern @ KXEN Inc.
UTC - Université de Technologie de Compiègne
* **GI06 - Fouille de Données et Décisionnel*

+33 0619608888