Hi there,

I have some good and some bad news. I tested PageRankVertex (not the benchmark, but the example implementation o.a.g.examples.PageRankVertex) from trunk, compiled for Hadoop 1.0, on a cluster of 26 machines with 208 cores.
I used the Webbase2001 dataset [1], which has 115M vertices and more than 1B edges, and got some awesome running times: the average superstep takes 15 seconds (!!!). Awesome work, I have to say!

Unfortunately, there seems to be an issue with the convergence detection: the job did not show the correct convergence behavior. I'd like to look into that this week, so that we can ship a performant PageRank implementation that automatically runs an appropriate number of supersteps. I hope this doesn't delay the release too much.

Best,
Sebastian

[1] http://law.di.unimi.it/webdata/webbase-2001/

On 13.04.2013 07:39, Avery Ching wrote:
> Thanks to the quick feedback from Roman and Lewis, we have cut a new RC1
> that addresses the following issues.
>
> * Got rid of .git repo in tarball
> * Fixed issue with not compiling without git repo (GIRAPH-628)
> * Used gnutar in OSX rather than tar to generate the tarball and get rid
>   of warnings
> * Pushed GIRAPH-627 to support the yarn profile better
> * Tarball name changed to the final artifact name (giraph-1.0.tar.gz)
>
> Release notes:
> http://people.apache.org/~aching/giraph-1.0-RC1/RELEASE_NOTES.html
>
> Release artifacts:
> http://people.apache.org/~aching/giraph-1.0-RC1/
>
> Corresponding git tag:
> https://git-wip-us.apache.org/repos/asf?p=giraph.git;a=shortlog;h=refs/tags/release-1.0-RC1
>
> Signing keys:
> http://people.apache.org/keys/group/giraph.asc
>
> The vote runs for 72 hours, until Monday 11pm PST.
>
> Thanks,
>
> Avery
>
> Original message below regarding rc0:
>
> -------------------------------
>
> Fellow Giraphers,
>
> We have our first release candidate since graduating from incubation.
> This is a source release, primarily due to the different versions of
> Hadoop we support with munge (similar to the 0.1 release). Since 0.1,
> we've made A TON of progress on overall performance, optimizing memory
> use, split vertex/edge inputs, easy interoperability with Apache Hive,
> and a bunch of other areas. In many ways, this is an almost totally
> different codebase. Thanks everyone for your hard work!
>
> Apache Giraph has been running in production at Facebook (against
> Facebook's Corona implementation of Hadoop -
> https://github.com/facebook/hadoop-20/tree/master/src/contrib/corona)
> since around last December. It has proven to be very scalable,
> performant, and enables a bunch of new applications. Based on the
> drastic improvements and the use of Giraph in production, it seems
> appropriate to bump up our version to 1.0.
>
> While anyone can vote, the ASF requires majority approval from the PMC
> -- i.e., at least three PMC members must vote affirmatively for release,
> and there must be more positive than negative votes. Releases may not be
> vetoed. Before voting +1 PMC members are required to download the signed
> source code package, compile it as provided, and test the resulting
> executable on their own platform, along with also verifying that the
> package meets the requirements of the ASF policy on releases.
>
> Please test this against many other Hadoop versions and let us know how
> this goes!
>
> Release notes:
> http://people.apache.org/~aching/giraph-1.0-RC0/RELEASE_NOTES.html
>
> Release artifacts:
> http://people.apache.org/~aching/giraph-1.0-RC0/
>
> Corresponding git tag:
> https://git-wip-us.apache.org/repos/asf?p=giraph.git;a=shortlog;h=refs/tags/release-1.0-RC0
>
> Signing keys:
> http://people.apache.org/keys/group/giraph.asc
>
> The vote runs for 72 hours, until Monday 4pm PST.
>
> Thanks everyone for your patience with this release!
>
> Avery