Re: What if the resulting graph is larger than the memory?

Han JU Tue, 21 May 2013 03:10:46 -0700

Hi Sebastian,

It's something like mining frequent item pairs from transaction data.
I need all of these pairs at a fairly low support threshold (say 2), so the
result could be very big.
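As a rough illustration of why a low support threshold makes the result explode, here is a plain single-machine Java sketch of frequent item-pair counting. It is not the Giraph job discussed in this thread, and the names `PairSupport` and `frequentPairs` are hypothetical. A pair is kept if it co-occurs in at least `minSupport` transactions, and a single transaction with n distinct items already contributes n(n-1)/2 candidate pairs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

public class PairSupport {

    // Counts how many transactions contain each unordered item pair and keeps
    // only pairs whose count reaches minSupport. Keys are "a,b" with a < b,
    // so (a,b) and (b,a) are the same pair (assumes item names contain no commas).
    public static Map<String, Integer> frequentPairs(List<List<String>> transactions, int minSupport) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> tx : transactions) {
            // Deduplicate and sort so each pair is counted at most once per transaction
            List<String> items = new ArrayList<>(new TreeSet<>(tx));
            for (int i = 0; i < items.size(); i++) {
                for (int j = i + 1; j < items.size(); j++) {
                    counts.merge(items.get(i) + "," + items.get(j), 1, Integer::sum);
                }
            }
        }
        // Drop pairs below the support threshold
        counts.values().removeIf(c -> c < minSupport);
        return counts;
    }

    public static void main(String[] args) {
        List<List<String>> transactions = List.of(
                List.of("bread", "milk", "eggs"),
                List.of("bread", "milk"),
                List.of("milk", "eggs", "butter"),
                List.of("bread", "eggs"));
        System.out.println(new TreeMap<>(frequentPairs(transactions, 2)));
    }
}
```

Every pair surviving the threshold has to be written out, so lowering `minSupport` to 2 keeps almost every co-occurring pair.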




2013/5/21 Sebastian Schelter <ssc.o...@googlemail.com>

> Hello Han,
>
> out of curiosity, what do you compute that has such a big result?
>
> Best,
> Sebastian
>
> On 21.05.2013 11:52, Han JU wrote:
> > Hi Maja,
> >
> > The input graph of my problem is not big; it's the result of the
> > calculation that is very big.
> > In fact, what does out-of-core graph mean? Where can I find some examples
> > of this, and of output during computation?
> >
> > Thanks.
> >
> >
> >
> > 2013/5/17 Maja Kabiljo <majakabi...@fb.com>
> >
> >>  Hi JU,
> >>
> >>  One thing you can try is to use out-of-core graph
> >> (giraph.useOutOfCoreGraph option).
> >>
> >>  I don't know what your exact use case is: is the graph itself huge, or
> >> is it the data you calculate in your application? In the second case,
> >> there is the 'giraph.doOutputDuringComputation' option you might want to
> >> try out. When it is turned on, during each superstep writeVertex will be
> >> called immediately after compute is called for that vertex. This means
> >> that you can store the data you want to write in the vertex, write it,
> >> and clear the data before going on to the next vertex.
> >>
> >>  Maja
> >>
> >>   From: Han JU <ju.han.fe...@gmail.com>
> >> Reply-To: "user@giraph.apache.org" <user@giraph.apache.org>
> >> Date: Friday, May 17, 2013 8:38 AM
> >> To: "user@giraph.apache.org" <user@giraph.apache.org>
> >> Subject: What if the resulting graph is larger than the memory?
> >>
> >>   Hi,
> >>
> >>  It's me again.
> >> After a day's work I've coded a Giraph solution for my problem at hand. I
> >> gave it a run on a medium-sized dataset and it's notably faster than
> >> other approaches.
> >>
> >>  However, the goal is to process larger inputs. For example, I have a
> >> larger dataset for which the result graph is about 400GB when written as
> >> a text file in edge format. And I think the edges that the algorithm
> >> creates all reside in the cluster's memory. So does that mean that for
> >> this big dataset I need a cluster with ~400GB of main memory to run? Is
> >> there any possibility to output "on the go", meaning I don't need to
> >> construct the whole graph and each edge is output to HDFS immediately
> >> instead of first being created in main memory and then output?
> >>
> >>  Thanks!
> >> --
> >> *JU Han*
> >>
> >>    Software Engineer Intern @ KXEN Inc.
> >>   UTC - Université de Technologie de Compiègne
> >>   GI06 - Fouille de Données et Décisionnel (Data Mining and Business Intelligence)
> >>
> >>  +33 0619608888
> >>
> >
> >
> >
>
>
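On the question of what "out-of-core" means: in general it describes processing a working set (here, graph partitions) that does not have to fit in RAM, by keeping only some partitions in memory and spilling the rest to disk, reloading them on demand. The Java sketch below illustrates only that general idea; it is not Giraph's out-of-core implementation, and `SpillingPartitionStore`, `maxInMemory`, and the one-text-file-per-partition layout are hypothetical. It keeps at most `maxInMemory` partitions resident and evicts the least recently used one to a temporary file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: keeps at most maxInMemory partitions in RAM and
// spills the rest to local temp files (one line per edge).
public class SpillingPartitionStore {
    private final int maxInMemory;
    private final Path spillDir;
    // accessOrder = true: iteration starts at the least recently used entry
    private final LinkedHashMap<Integer, List<String>> resident =
            new LinkedHashMap<>(16, 0.75f, true);

    public SpillingPartitionStore(int maxInMemory) throws IOException {
        if (maxInMemory < 1) throw new IllegalArgumentException("need room for one partition");
        this.maxInMemory = maxInMemory;
        // Temp files are not cleaned up in this sketch
        this.spillDir = Files.createTempDirectory("partitions");
    }

    private Path fileFor(int partitionId) {
        return spillDir.resolve("partition-" + partitionId + ".txt");
    }

    // Returns the in-memory partition, reading it back from disk if it was spilled
    private List<String> load(int partitionId) {
        List<String> edges = resident.get(partitionId);
        if (edges != null) return edges;
        List<String> loaded = new ArrayList<>();
        Path file = fileFor(partitionId);
        try {
            if (Files.exists(file)) {
                loaded.addAll(Files.readAllLines(file));
                Files.delete(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        resident.put(partitionId, loaded);
        spillIfOverBudget();
        return loaded;
    }

    // Writes least recently used partitions to disk until the budget holds
    private void spillIfOverBudget() {
        while (resident.size() > maxInMemory) {
            Map.Entry<Integer, List<String>> lru = resident.entrySet().iterator().next();
            int id = lru.getKey();
            try {
                Files.write(fileFor(id), lru.getValue());
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            resident.remove(id);
        }
    }

    public void addEdge(int partitionId, String edge) {
        load(partitionId).add(edge);
    }

    public List<String> edges(int partitionId) {
        return new ArrayList<>(load(partitionId));
    }

    public int residentCount() {
        return resident.size();
    }

    public static void main(String[] args) throws IOException {
        SpillingPartitionStore store = new SpillingPartitionStore(1);
        store.addEdge(0, "a b");
        store.addEdge(1, "b c"); // partition 0 is spilled to disk
        store.addEdge(0, "a c"); // partition 0 is reloaded, partition 1 is spilled
        System.out.println(store.edges(0) + " " + store.edges(1));
    }
}
```

The trade-off is extra disk I/O each time a spilled partition is touched, in exchange for a bounded memory footprint.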

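With 'giraph.doOutputDuringComputation' turned on, Maja describes the call order within a superstep as compute for a vertex, then writeVertex for that same vertex. The plain-Java simulation below mimics only that ordering to show why it bounds memory: each vertex buffers its output (hypothetical item pairs) in a field, the writer flushes it, and the buffer is cleared before the next vertex runs. It uses no Giraph classes; `PairVertex`, `pending`, `runSuperstep`, and the `StringWriter` standing in for an HDFS output stream are all hypothetical.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

public class OutputDuringComputation {

    // Hypothetical vertex: an item id plus the items it co-occurs with
    static class PairVertex {
        final String id;
        final List<String> neighbours;
        // Output produced by compute() and not yet written
        final List<String> pending = new ArrayList<>();

        PairVertex(String id, List<String> neighbours) {
            this.id = id;
            this.neighbours = neighbours;
        }

        // Stand-in for compute(): emit one pair per neighbour with a larger id
        void compute() {
            for (String n : neighbours) {
                if (id.compareTo(n) < 0) {
                    pending.add(id + "\t" + n);
                }
            }
        }
    }

    // Stand-in for writeVertex(): flush the buffered output, then clear it
    static void writeVertex(PairVertex v, Writer out) throws IOException {
        for (String line : v.pending) {
            out.write(line);
            out.write('\n');
        }
        v.pending.clear();
    }

    // One simulated superstep: compute and write are interleaved per vertex.
    // Returns the largest number of buffered lines (over all vertices) seen.
    static int runSuperstep(List<PairVertex> vertices, Writer out) throws IOException {
        int maxBuffered = 0;
        for (PairVertex v : vertices) {
            v.compute();
            int buffered = 0;
            for (PairVertex u : vertices) buffered += u.pending.size();
            maxBuffered = Math.max(maxBuffered, buffered);
            writeVertex(v, out);
        }
        return maxBuffered;
    }

    public static void main(String[] args) throws IOException {
        List<PairVertex> vertices = List.of(
                new PairVertex("a", List.of("b", "c")),
                new PairVertex("b", List.of("a", "c")),
                new PairVertex("c", List.of("a", "b")));
        StringWriter out = new StringWriter();
        int maxBuffered = runSuperstep(vertices, out);
        System.out.print(out);
        System.out.println("max buffered lines: " + maxBuffered);
    }
}
```

Under this ordering, peak buffered output is bounded by the largest single vertex's output rather than by the whole result, which is the property that matters for a ~400GB edge list.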

