Thanks, that's a good point. For the moment, though, I just want to try out different solutions on Hadoop and compare them, so I'd like to see how they perform under general conditions.
Do you happen to know what an out-of-core graph is?

Thanks.

2013/5/21 Sebastian Schelter <ssc.o...@googlemail.com>

> Ah, I see. I have worked on similar things in recommender systems. There
> the problem is generally that the result grows quadratically with the
> number of interactions per item. If you have some top sellers in your
> data, they might account for most of the large result. It helps a lot to
> throw out the few most popular items (if your application allows that).
>
> Best,
> Sebastian
>
> On 21.05.2013 12:10, Han JU wrote:
> > Hi Sebastian,
> >
> > It's something like finding frequent item pairs in transaction data.
> > I need all these pairs with a fairly low support (say 2), so the
> > result could be very big.
> >
> > 2013/5/21 Sebastian Schelter <ssc.o...@googlemail.com>
> >
> >> Hello Han,
> >>
> >> Out of curiosity, what do you compute that has such a big result?
> >>
> >> Best,
> >> Sebastian
> >>
> >> On 21.05.2013 11:52, Han JU wrote:
> >>> Hi Maja,
> >>>
> >>> The input graph of my problem is not big, but the computed result is
> >>> very big.
> >>> In fact, what does out-of-core graph mean? Where can I find some
> >>> examples of it, and of output during computation?
> >>>
> >>> Thanks.
> >>>
> >>> 2013/5/17 Maja Kabiljo <majakabi...@fb.com>
> >>>
> >>>> Hi JU,
> >>>>
> >>>> One thing you can try is to use the out-of-core graph
> >>>> (giraph.useOutOfCoreGraph option).
> >>>>
> >>>> I don't know what your exact use case is: is it the graph that is
> >>>> huge, or the data you calculate in your application? In the second
> >>>> case, there is the 'giraph.doOutputDuringComputation' option you
> >>>> might want to try out. When it is turned on, during each superstep
> >>>> writeVertex will be called immediately after compute is called for
> >>>> that vertex. This means that you can store the data you want to write
> >>>> in the vertex, write it, and clear the data before going to the next
> >>>> vertex.
> >>>>
> >>>> Maja
> >>>>
> >>>> From: Han JU <ju.han.fe...@gmail.com>
> >>>> Reply-To: "user@giraph.apache.org" <user@giraph.apache.org>
> >>>> Date: Friday, May 17, 2013 8:38 AM
> >>>> To: "user@giraph.apache.org" <user@giraph.apache.org>
> >>>> Subject: What if the resulting graph is larger than the memory?
> >>>>
> >>>> Hi,
> >>>>
> >>>> It's me again.
> >>>> After a day's work I've coded a Giraph solution for my problem at
> >>>> hand. I gave it a run on a medium dataset and it's notably faster
> >>>> than other approaches.
> >>>>
> >>>> However, the goal is to process larger inputs. For example, I have a
> >>>> larger dataset for which the result graph is about 400 GB when
> >>>> represented in edge format as a text file. And I think the edges that
> >>>> the algorithm creates all reside in the cluster's memory. Does that
> >>>> mean that for this big dataset I need a cluster with ~400 GB of main
> >>>> memory? Is there any way to output "on the go", so that I don't need
> >>>> to construct the whole graph, i.e. each edge is output to HDFS
> >>>> immediately instead of being created in main memory and output
> >>>> afterwards?
> >>>>
> >>>> Thanks!
> >>>> --
> >>>> JU Han
> >>>>
> >>>> Software Engineer Intern @ KXEN Inc.
> >>>> UTC - Université de Technologie de Compiègne
> >>>> GI06 - Fouille de Données et Décisionnel (Data Mining and Decision Support)
> >>>>
> >>>> +33 0619608888
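Sebastian's point about the quadratic blow-up can be made concrete: each transaction with n distinct items contributes n(n-1)/2 candidate pairs, and an item that appears in many transactions takes part in a large share of them. The plain-Java sketch below (no Hadoop or Giraph involved) counts item pairs with a minimum support and can first drop the k most frequent items. The class and method names (`PairCounter`, `mostPopular`, `candidatePairs`, `frequentPairs`) and the small grocery dataset are hypothetical, chosen only for illustration; this is not the algorithm used in the thread.

```java
import java.util.*;
import java.util.stream.Collectors;

// Hypothetical illustration: frequent item-pair counting with optional pruning of popular items.
public class PairCounter {

    /** Counts candidate pairs: each transaction with n distinct, non-excluded items yields n(n-1)/2. */
    static long candidatePairs(List<List<String>> transactions, Set<String> excluded) {
        long total = 0;
        for (List<String> t : transactions) {
            long n = t.stream().filter(item -> !excluded.contains(item)).distinct().count();
            total += n * (n - 1) / 2;
        }
        return total;
    }

    /** Returns the topK items that occur in the most transactions (ties broken alphabetically). */
    static Set<String> mostPopular(List<List<String>> transactions, int topK) {
        Map<String, Integer> freq = new HashMap<>();
        for (List<String> t : transactions) {
            for (String item : new HashSet<>(t)) {
                freq.merge(item, 1, Integer::sum);
            }
        }
        Comparator<Map.Entry<String, Integer>> byCountDesc =
                Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey());
        return freq.entrySet().stream()
                .sorted(byCountDesc)
                .limit(topK)
                .map(Map.Entry::getKey)
                .collect(Collectors.toSet());
    }

    /** Counts co-occurring pairs of non-excluded items; keeps pairs seen in at least minSupport transactions. */
    static Map<String, Integer> frequentPairs(List<List<String>> transactions,
                                             Set<String> excluded, int minSupport) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> t : transactions) {
            // Deduplicate and sort so each unordered pair gets one canonical key "a|b".
            List<String> items = t.stream()
                    .filter(item -> !excluded.contains(item))
                    .distinct()
                    .sorted()
                    .collect(Collectors.toList());
            for (int i = 0; i < items.size(); i++) {
                for (int j = i + 1; j < items.size(); j++) {
                    counts.merge(items.get(i) + "|" + items.get(j), 1, Integer::sum);
                }
            }
        }
        counts.values().removeIf(c -> c < minSupport);
        return counts;
    }

    public static void main(String[] args) {
        List<List<String>> tx = List.of(
                List.of("milk", "bread", "eggs"),
                List.of("milk", "bread", "butter"),
                List.of("milk", "eggs", "butter"),
                List.of("milk", "bread"));
        Set<String> top = mostPopular(tx, 1);
        System.out.println("all items: " + candidatePairs(tx, Set.of()) + " candidate pairs, frequent "
                + new TreeMap<>(frequentPairs(tx, Set.of(), 2)));
        System.out.println("without " + top + ": " + candidatePairs(tx, top) + " candidate pairs, frequent "
                + new TreeMap<>(frequentPairs(tx, top, 2)));
    }
}
```

On this toy data, dropping the single most popular item ("milk") cuts the candidate pairs from 10 to 3, but it also removes every pair involving milk, so with support 2 no frequent pairs remain. Whether that trade-off is acceptable depends on the application, as Sebastian notes.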
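Maja's description of 'giraph.doOutputDuringComputation' amounts to a write-then-clear pattern: each vertex's results are written right after its compute step and then discarded, so memory only has to hold one vertex's output at a time rather than the whole result graph. The sketch below simulates that pattern in plain Java. It is not Giraph's API: `SimVertex`, `compute`, `writeVertex` and `superstep` are hypothetical stand-ins, and in Giraph itself the framework would call the vertex writer when the option is enabled, per Maja's message.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of "write each vertex right after compute, then clear its buffer".
public class OutputDuringComputationDemo {

    /** Hypothetical stand-in for a vertex: an id, its neighbours, and a buffer of results to write. */
    static final class SimVertex {
        final int id;
        final List<Integer> neighbours;
        final List<String> pendingOutput = new ArrayList<>();

        SimVertex(int id, List<Integer> neighbours) {
            this.id = id;
            this.neighbours = neighbours;
        }
    }

    /** Hypothetical compute step: emits one result edge per pair of neighbours (quadratic per vertex). */
    static void compute(SimVertex v) {
        for (int i = 0; i < v.neighbours.size(); i++) {
            for (int j = i + 1; j < v.neighbours.size(); j++) {
                v.pendingOutput.add(v.neighbours.get(i) + "\t" + v.neighbours.get(j));
            }
        }
    }

    /** Hypothetical writer: appends buffered results to the sink (standing in for HDFS), then frees the buffer. */
    static void writeVertex(SimVertex v, List<String> sink) {
        sink.addAll(v.pendingOutput);
        v.pendingOutput.clear();
    }

    /** Runs one simulated superstep; returns the largest number of results buffered at any one time. */
    static int superstep(List<SimVertex> vertices, List<String> sink) {
        int peakBuffered = 0;
        for (SimVertex v : vertices) {
            compute(v);
            // Earlier vertices were already cleared, so only this vertex's results are held in memory.
            peakBuffered = Math.max(peakBuffered, v.pendingOutput.size());
            writeVertex(v, sink);
        }
        return peakBuffered;
    }

    public static void main(String[] args) {
        List<SimVertex> vertices = List.of(
                new SimVertex(1, List.of(10, 11, 12)),
                new SimVertex(2, List.of(10, 11, 12, 13)));
        List<String> sink = new ArrayList<>();
        int peak = superstep(vertices, sink);
        System.out.println("written=" + sink.size() + ", peak buffered=" + peak);
    }
}
```

With the two toy vertices, 9 result edges are written in total, while at most 6 (the output of the largest single vertex) are ever buffered at once; keeping everything until the end of the job would hold all 9.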