Re: What if the resulting graph is larger than the memory?

Han JU Tue, 21 May 2013 05:17:23 -0700

Thanks, that's a good point.
But for the moment I just want to try out different solutions on Hadoop and
compare them, so I'd like to see how they perform under general conditions.


Do you happen to know what "out-of-core graph" means?

Thanks.
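
For context on the question above: "out-of-core" generally refers to
processing data that does not fit in main memory by keeping only part of it
in RAM and spilling the rest to disk. The sketch below only illustrates that
general idea. It is not Giraph's implementation of
giraph.useOutOfCoreGraph; the class name SpillingPartitionStore, the
least-recently-used eviction policy and the pickle file layout are all
assumptions made for this example.

```python
import os
import pickle
import tempfile
from collections import OrderedDict


class SpillingPartitionStore:
    """Hypothetical store: at most `max_in_memory` partitions stay in RAM."""

    def __init__(self, max_in_memory, spill_dir):
        self.max_in_memory = max_in_memory
        self.spill_dir = spill_dir
        # partition id -> data, ordered from least to most recently used
        self._in_memory = OrderedDict()
        # ids of partitions whose current copy lives on disk
        self._on_disk = set()

    def _path(self, partition_id):
        return os.path.join(self.spill_dir, f"partition-{partition_id}.pkl")

    def _evict_if_needed(self):
        # Spill least recently used partitions until we are within budget.
        while len(self._in_memory) > self.max_in_memory:
            victim_id, data = self._in_memory.popitem(last=False)
            with open(self._path(victim_id), "wb") as f:
                pickle.dump(data, f)
            self._on_disk.add(victim_id)

    def put(self, partition_id, data):
        # A new in-memory copy supersedes any older copy on disk.
        self._on_disk.discard(partition_id)
        self._in_memory[partition_id] = data
        self._in_memory.move_to_end(partition_id)
        self._evict_if_needed()

    def get(self, partition_id):
        if partition_id in self._in_memory:
            self._in_memory.move_to_end(partition_id)
            return self._in_memory[partition_id]
        if partition_id in self._on_disk:
            # Load the partition back, possibly spilling another one.
            with open(self._path(partition_id), "rb") as f:
                data = pickle.load(f)
            self._on_disk.discard(partition_id)
            self._in_memory[partition_id] = data
            self._evict_if_needed()
            return data
        raise KeyError(partition_id)

    def in_memory_ids(self):
        return list(self._in_memory)


def demo():
    # Four tiny "partitions", but room for only two of them in memory.
    with tempfile.TemporaryDirectory() as spill_dir:
        store = SpillingPartitionStore(max_in_memory=2, spill_dir=spill_dir)
        for pid in range(4):
            store.put(pid, {"edges": [(pid, pid + 1)]})
        resident = store.in_memory_ids()
        reloaded = store.get(0)  # transparently read back from disk
        return resident, reloaded
```

Calling demo() shows which partitions are still resident after the inserts
and that a spilled partition can be read back unchanged. The price of this
scheme is extra disk I/O whenever a spilled partition is touched.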


2013/5/21 Sebastian Schelter <ssc.o...@googlemail.com>

> Ah, I see. I have worked on similar things in recommender systems. There,
> the problem is generally that the result grows quadratically with the
> number of interactions per item. If you have some top sellers in your
> data, those might account for the large result. It helps a lot to throw
> out the few most popular items (if your application allows that).
>
> Best,
> Sebastian
>
>
> On 21.05.2013 12:10, Han JU wrote:
> > Hi Sebastian,
> >
> > It's something like finding frequent item pairs in transaction data.
> > I need all pairs above a fairly low support threshold (say 2), so the
> > result could be very big.
> >
> >
> >
> > 2013/5/21 Sebastian Schelter <ssc.o...@googlemail.com>
> >
> >> Hello Han,
> >>
> >> Out of curiosity, what do you compute that has such a big result?
> >>
> >> Best,
> >> Sebastian
> >>
> >> On 21.05.2013 11:52, Han JU wrote:
> >>> Hi Maja,
> >>>
> >>> The input graph of my problem is not big, but the result of the
> >>> computation is very big.
> >>> What exactly does "out-of-core graph" mean? Where can I find some
> >>> examples of it, and of output during computation?
> >>>
> >>> Thanks.
> >>>
> >>>
> >>>
> >>> 2013/5/17 Maja Kabiljo <majakabi...@fb.com>
> >>>
> >>>>  Hi JU,
> >>>>
> >>>>  One thing you can try is the out-of-core graph
> >>>> (the giraph.useOutOfCoreGraph option).
> >>>>
> >>>>  I don't know what your exact use case is: is the graph itself huge,
> >>>> or is it the data your application computes? In the second case, there
> >>>> is the 'giraph.doOutputDuringComputation' option you might want to try
> >>>> out. When it is turned on, writeVertex is called during each superstep
> >>>> immediately after compute is called for that vertex. This means that
> >>>> you can store the data you want to write in the vertex, write it, and
> >>>> clear it before going on to the next vertex.
> >>>>
> >>>>  Maja
> >>>>
> >>>>   From: Han JU <ju.han.fe...@gmail.com>
> >>>> Reply-To: "user@giraph.apache.org" <user@giraph.apache.org>
> >>>> Date: Friday, May 17, 2013 8:38 AM
> >>>> To: "user@giraph.apache.org" <user@giraph.apache.org>
> >>>> Subject: What if the resulting graph is larger than the memory?
> >>>>
> >>>>   Hi,
> >>>>
> >>>>  It's me again.
> >>>> After a day's work I've coded a Giraph solution for the problem at
> >>>> hand. I ran it on a medium-sized dataset and it's notably faster than
> >>>> the other approaches.
> >>>>
> >>>>  However, the goal is to process larger inputs. For example, I have a
> >>>> larger dataset whose result graph is about 400GB when stored as a text
> >>>> file in edge format. I think the edges that the algorithm creates all
> >>>> reside in the cluster's memory. Does that mean that for this big
> >>>> dataset I need a cluster with ~400GB of main memory? Is there any way
> >>>> to output "on the go", so that I don't need to construct the whole
> >>>> graph: an edge would be written to HDFS immediately instead of being
> >>>> created in main memory and then written out?
> >>>>
> >>>>  Thanks!
> >>>> --
> >>>> *JU Han*
> >>>>
> >>>>    Software Engineer Intern @ KXEN Inc.
> >>>>   UTC   -  Université de Technologie de Compiègne
> >>>>    *     **GI06 - Fouille de Données et Décisionnel*
> >>>>
> >>>>  +33 0619608888
> >>>>
> >>>
> >>>
> >>>
> >>
> >>
> >
> >
>
>
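
To make two points from the quoted thread concrete (writing results "on the
go" instead of holding them in memory, and the effect of dropping the most
popular items), here is a small sketch on toy data. It is plain Python, not
Giraph code, and it is not the algorithm I actually run. The function names
item_popularity and stream_pairs and the example baskets are made up for
illustration. A basket with n distinct items emits n*(n-1)/2 pairs, which is
where the quadratic growth comes from.

```python
import io
from collections import Counter
from itertools import combinations


def item_popularity(transactions):
    """Count in how many baskets each item appears."""
    counts = Counter()
    for basket in transactions:
        counts.update(set(basket))
    return counts


def stream_pairs(transactions, out, excluded=frozenset()):
    """Write each co-occurring item pair to `out` as soon as it is generated.

    Nothing is accumulated in memory except the running count, which is
    returned so the size of the output can be compared between runs.
    """
    written = 0
    for basket in transactions:
        items = sorted(set(basket) - excluded)
        for a, b in combinations(items, 2):
            out.write(f"{a}\t{b}\n")
            written += 1
    return written


def demo():
    # Toy transaction data; "milk" plays the role of the top seller.
    transactions = [
        ["milk", "bread", "eggs"],
        ["milk", "bread"],
        ["milk", "tea"],
        ["milk", "eggs", "tea"],
    ]
    top_item, _ = item_popularity(transactions).most_common(1)[0]
    # io.StringIO stands in for a real output stream (e.g. a file).
    all_pairs = stream_pairs(transactions, io.StringIO())
    without_top = stream_pairs(
        transactions, io.StringIO(), excluded=frozenset([top_item])
    )
    return top_item, all_pairs, without_top
```

demo() returns the most popular item together with the number of pair
records written with and without it, so the reduction can be read off
directly. Counting support per pair would still need an aggregation step
afterwards, which this sketch leaves out.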


-- 
*JU Han*

Software Engineer Intern @ KXEN Inc.
UTC   -  Université de Technologie de Compiègne
*     **GI06 - Fouille de Données et Décisionnel*

+33 0619608888
