Dear Avery,

Regarding this resource-allocation decision, do you have a methodology or rule of thumb that helps you decide which configuration is likely to perform well? For example, given an input of a certain size (number of graph vertices), can you estimate the optimal number of workers and the amount of memory per worker? Or, the other way around: given a pool of resources (cores and memory), what graph size is reasonable?
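To make the question concrete, here is the kind of rough estimate I have in mind; the per-vertex and per-edge byte counts below are placeholders I made up for illustration, not numbers measured from Giraph:

// Back-of-the-envelope sizing sketch (placeholder constants, not Giraph measurements).
public class GiraphSizingSketch {

    // Assumed average in-memory footprint per vertex and per edge, including
    // Java object overhead. These values are guesses for illustration only.
    static final long BYTES_PER_VERTEX = 200;
    static final long BYTES_PER_EDGE = 50;

    /** Rough heap needed per worker if the graph is split evenly across workers. */
    static long heapPerWorkerBytes(long vertices, long edges, int workers) {
        long graphBytes = vertices * BYTES_PER_VERTEX + edges * BYTES_PER_EDGE;
        // Leave headroom (say 2x) for messages, combiners, and GC overhead.
        return 2 * graphBytes / workers;
    }

    public static void main(String[] args) {
        long vertices = 1_000_000_000L;  // 1 billion vertices
        long edges = 10_000_000_000L;    // 10 billion edges
        int workers = 200;
        double gib = heapPerWorkerBytes(vertices, edges, workers) / (1024.0 * 1024 * 1024);
        System.out.printf("~%.1f GiB of heap per worker for %d workers%n", gib, workers);
    }
}

Something along those lines, but based on your experience of what actually drives memory use in practice, is what I am hoping you can share.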
That insight would be really interesting.

Thanks,
Alexandros

On 11 December 2012 19:40, Avery Ching <ach...@apache.org> wrote:
> We are running several Giraph applications in production using our version
> of Hadoop (Corona) at Facebook. The part you have to be careful about is
> ensuring you have enough resources for your job to run. But otherwise, we
> are able to run at FB scale (i.e. 1 billion+ nodes, many more edges).
>
> Avery
>
>
> On 12/11/12 5:58 AM, Gustavo Enrique Salazar Torres wrote:
>
>> Hi:
>>
>> I implemented a graph algorithm to recommend content to our users.
>> Although it is working (the implementation uses Mahout), it is very
>> inefficient because I have to run many iterations in order to perform a
>> breadth-first search on my graph.
>> I would like to use Giraph for that task. I would like to know if it is
>> production ready. I'm running jobs on Amazon EMR.
>>
>> Thanks in advance.
>> Gustavo
>>
>