It simply means that not all partitions of the graph are held in memory all the time. If you don't have enough memory, some of them may get spilled to disk.
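To make the idea concrete, here is a toy sketch of a partition store that keeps only a bounded number of partitions in memory and spills the least recently used ones to disk. This is not Giraph's implementation: the class name `SpillingPartitionStore`, its methods, and the one-edge-per-line text format are all made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;

/** Hypothetical toy store: at most maxInMemory partitions stay in RAM, the rest live on disk. */
public class SpillingPartitionStore {
    private final int maxInMemory;
    private final Path spillDir;
    // Access-ordered map: iteration starts at the least recently used partition.
    private final LinkedHashMap<Integer, List<String>> inMemory =
            new LinkedHashMap<>(16, 0.75f, true);

    public SpillingPartitionStore(int maxInMemory) throws IOException {
        this.maxInMemory = maxInMemory;
        this.spillDir = Files.createTempDirectory("partitions");
    }

    private Path fileFor(int id) {
        return spillDir.resolve("partition-" + id + ".txt");
    }

    /** Adds or replaces a partition, spilling older ones if the memory budget is exceeded. */
    public void put(int id, List<String> edges) {
        inMemory.put(id, new ArrayList<>(edges));
        evictIfNeeded();
    }

    /** Returns the partition, loading it back from disk if it was spilled; null if unknown. */
    public List<String> get(int id) {
        List<String> partition = inMemory.get(id);
        if (partition != null) {
            return partition;
        }
        Path file = fileFor(id);
        if (!Files.exists(file)) {
            return null;
        }
        try {
            partition = new ArrayList<>(Files.readAllLines(file));
            Files.delete(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        inMemory.put(id, partition);
        evictIfNeeded();
        return partition;
    }

    /** Writes least recently used partitions to disk until the budget is respected. */
    private void evictIfNeeded() {
        while (inMemory.size() > maxInMemory) {
            Integer eldest = inMemory.keySet().iterator().next();
            List<String> edges = inMemory.remove(eldest);
            try {
                Files.write(fileFor(eldest), edges);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    public int inMemoryCount() {
        return inMemory.size();
    }
}
```

The point is only the trade-off: memory use is bounded by the number of resident partitions, and you pay with disk I/O whenever a spilled partition is touched again.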

On 21.05.2013 14:16, Han JU wrote:
> Thanks, that's a good point.
> But for the moment I just want to try out different solutions on hadoop and
> have a comparison of them. So I'd like to see how they perform under
> general conditions.
>
> Do you happen to know what out-of-core graph means?
>
> Thanks.
>
>
> 2013/5/21 Sebastian Schelter <ssc.o...@googlemail.com>
>
>> Ah, I see. I have worked on similar things in recommender systems. Here
>> the problem is generally that you get a result quadratic to the number
>> of interactions per item. If you have some topsellers in your data,
>> those might make up for the large result. It helps very much to throw
>> out the few most popular items (if your application allows that).
>>
>> Best,
>> Sebastian
>>
>>
>> On 21.05.2013 12:10, Han JU wrote:
>>> Hi Sebastian,
>>>
>>> It's something like frequent item pairs out of transaction data.
>>> I need all these pairs with somehow a low support (say 2), so the result
>>> could be very big.
>>>
>>>
>>> 2013/5/21 Sebastian Schelter <ssc.o...@googlemail.com>
>>>
>>>> Hello Han,
>>>>
>>>> out of curiosity, what do you compute that has such a big result?
>>>>
>>>> Best,
>>>> Sebastian
>>>>
>>>> On 21.05.2013 11:52, Han JU wrote:
>>>>> Hi Maja,
>>>>>
>>>>> The input graph of my problem is not big, the calculation result is
>>>>> very big.
>>>>> In fact what does out-of-core graph mean? Where can I find some
>>>>> examples of this and for output during computation?
>>>>>
>>>>> Thanks.
>>>>>
>>>>>
>>>>> 2013/5/17 Maja Kabiljo <majakabi...@fb.com>
>>>>>
>>>>>> Hi JU,
>>>>>>
>>>>>> One thing you can try is to use out-of-core graph
>>>>>> (giraph.useOutOfCoreGraph option).
>>>>>>
>>>>>> I don't know what your exact use case is – do you have the graph
>>>>>> which is huge or the data which you calculate in your application is?
>>>>>> In the second case, there is 'giraph.doOutputDuringComputation' option
>>>>>> you might want to try out. When that is turned on, during each
>>>>>> superstep writeVertex will be called immediately after compute for
>>>>>> that vertex is called. This means that you can store data you want to
>>>>>> write in vertex, write it and clear the data before going to the next
>>>>>> vertex.
>>>>>>
>>>>>> Maja
>>>>>>
>>>>>> From: Han JU <ju.han.fe...@gmail.com>
>>>>>> Reply-To: "user@giraph.apache.org" <user@giraph.apache.org>
>>>>>> Date: Friday, May 17, 2013 8:38 AM
>>>>>> To: "user@giraph.apache.org" <user@giraph.apache.org>
>>>>>> Subject: What if the resulting graph is larger than the memory?
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> It's me again.
>>>>>> After a day's work I've coded a Giraph solution for my problem at
>>>>>> hand. I gave it a run on a medium dataset and it's notably faster than
>>>>>> other approaches.
>>>>>>
>>>>>> However the goal is to process larger inputs, for example I've a
>>>>>> larger dataset that the result graph is about 400GB when represented
>>>>>> in edge format and in text file. And I think the edges that the
>>>>>> algorithm created all reside in the cluster's memory. So it means that
>>>>>> for this big dataset, I need a cluster with ~ 400GB main memory to
>>>>>> run? Is there any possibilities that I can output "on the go" that
>>>>>> means I don't need to construct the whole graph, an edge is outputed
>>>>>> to HDFS immediately instead of being created in main memory then be
>>>>>> outputed?
>>>>>>
>>>>>> Thanks!
>>>>>> --
>>>>>> *JU Han*
>>>>>>
>>>>>> Software Engineer Intern @ KXEN Inc.
>>>>>> UTC - Université de Technologie de Compiègne
>>>>>> * **GI06 - Fouille de Données et Décisionnel*
>>>>>>
>>>>>> +33 0619608888