Hi,
We are looking into some Giraph benchmarks to compare against a similar
programming model and framework we are working on.

As a start, we plan to benchmark the following algorithms on data
sets with more than a billion edges:

1. Single Source Shortest Path from a given source
2. PageRank
3. Connected Components

We have a small cluster of 16 nodes (8 cores/16 GB each) to run the
benchmarks. Given that, we have a few questions to help us get the best out
of Giraph.

1. Which version of Giraph should we use, 1.0 or trunk, to take advantage
of the optimizations (memory usage/caching, multi-threading, etc.)
mentioned here?
https://www.facebook.com/notes/facebook-engineering/scaling-apache-giraph-to-a-trillion-edges/10151617006153920

2. Are the samples included in the Giraph distribution for the above
algorithms a good place to start? How can we take advantage of the different
optimizations, including aggregators and combiners, for these algorithms?

3. Is there a document I can look at to understand the best practices for
implementing optimized vertex-centric code using the latest features, along
with deployment guidelines to maximize cluster utilization?

Looking forward to your help.

Thanks,
Alok Kumbhare
