Hi there,

I have some good and bad news. I tested PageRankVertex (not the Benchmark
but the example implementation o.a.g.examples.PageRankVertex) from trunk
compiled for Hadoop 1.0 on a cluster of 26 machines with 208 cores.

I used the Webbase2001 dataset [1], which has 115M vertices and more than
1B edges, and got some awesome running times: the average superstep takes
15 seconds (!!!). Awesome work, I have to say!

Unfortunately, there seems to be an issue with the convergence
detection: the run did not show the correct convergence behavior. I'd like
to have a look into that this week, so we can ship a performant PageRank
implementation which automatically runs an appropriate number of
supersteps. Hope this doesn't delay the release too much.
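
To illustrate what convergence detection means here, a minimal sketch
follows. It is plain Python, not Giraph code and not the
o.a.g.examples.PageRankVertex implementation; the function name, the
damping factor of 0.85, the tolerance, and the superstep cap are all
assumptions chosen for illustration. The idea is to stop once the L1
change between successive rank vectors drops below a threshold, rather
than running a fixed number of supersteps.

```python
def pagerank(out_edges, damping=0.85, tol=1e-6, max_supersteps=100):
    """Power-iteration PageRank that stops once the ranks stop changing.

    out_edges maps each vertex to the list of vertices it links to.
    Returns (ranks, supersteps_run). All parameters are illustrative.
    """
    # Collect every vertex, including ones that only appear as targets.
    vertices = set(out_edges)
    for targets in out_edges.values():
        vertices.update(targets)
    n = len(vertices)
    ranks = {v: 1.0 / n for v in vertices}

    for superstep in range(1, max_supersteps + 1):
        # Rank held by vertices without out-edges is spread evenly.
        dangling = sum(ranks[v] for v in vertices if not out_edges.get(v))
        base = (1.0 - damping) / n + damping * dangling / n
        new_ranks = {v: base for v in vertices}
        for v, targets in out_edges.items():
            if targets:
                share = damping * ranks[v] / len(targets)
                for t in targets:
                    new_ranks[t] += share

        # Convergence check: L1 distance between successive rank vectors.
        delta = sum(abs(new_ranks[v] - ranks[v]) for v in vertices)
        ranks = new_ranks
        if delta < tol:
            return ranks, superstep

    # Cap reached without meeting the tolerance.
    return ranks, max_supersteps


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": []}
    ranks, steps = pagerank(graph)
    print(steps, ranks)
```

In a BSP setting the per-vertex differences would presumably be summed
through some global aggregation step and compared against the threshold
between supersteps; how that maps onto Giraph is not shown here.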

Best,
Sebastian


[1] http://law.di.unimi.it/webdata/webbase-2001/


On 13.04.2013 07:39, Avery Ching wrote:
> Thanks to the quick feedback from Roman and Lewis, we have cut a new RC1
> that addresses the following issues.
> 
> * Got rid of .git repo in tarball
> * Fixed issue with not compiling without git repo (GIRAPH-628)
> * Used gnutar in OSX rather than tar to generate the tarball and get rid
> of warnings
> * Pushed GIRAPH-627 to support the yarn profile better
> * Tarball name changed to the final artifact name (giraph-1.0.tar.gz)
> 
> Release notes:
> http://people.apache.org/~aching/giraph-1.0-RC1/RELEASE_NOTES.html
> 
> Release artifacts:
> http://people.apache.org/~aching/giraph-1.0-RC1/
> 
> Corresponding git tag:
> https://git-wip-us.apache.org/repos/asf?p=giraph.git;a=shortlog;h=refs/tags/release-1.0-RC1
> 
> 
> Signing keys:
> http://people.apache.org/keys/group/giraph.asc
> 
> The vote runs for 72 hours, until Monday 11pm PST.
> 
> Thanks,
> 
> Avery
> 
> Original message below regarding rc0:
> 
> -------------------------------
> 
> Fellow Giraphers,
> 
> We have our first release candidate since graduating from incubation.
>  This is a source release, primarily due to the different versions of
> Hadoop we support with munge (similar to the 0.1 release).  Since 0.1,
> we've made A TON of progress on overall performance, optimizing memory
> use, split vertex/edge inputs, easy interoperability with Apache Hive,
> and a bunch of other areas.  In many ways, this is an almost totally
> different codebase.  Thanks everyone for your hard work!
> 
> Apache Giraph has been running in production at Facebook (against
> Facebook's Corona implementation of Hadoop -
> https://github.com/facebook/hadoop-20/tree/master/src/contrib/corona)
> since around last December.  It has proven to be very scalable and
> performant, and it enables a bunch of new applications.  Based on the
> drastic improvements and the use of Giraph in production, it seems
> appropriate to bump up our version to 1.0.
> 
> While anyone can vote, the ASF requires majority approval from the PMC
> -- i.e., at least three PMC members must vote affirmatively for release,
> and there must be more positive than negative votes. Releases may not be
> vetoed. Before voting +1, PMC members are required to download the signed
> source code package, compile it as provided, and test the resulting
> executable on their own platform, and also to verify that the
> package meets the requirements of the ASF policy on releases.
> 
> Please test this against many other Hadoop versions and let us know how
> this goes!
> 
> Release notes:
> http://people.apache.org/~aching/giraph-1.0-RC0/RELEASE_NOTES.html
> 
> Release artifacts:
> http://people.apache.org/~aching/giraph-1.0-RC0/
> 
> Corresponding git tag:
> https://git-wip-us.apache.org/repos/asf?p=giraph.git;a=shortlog;h=refs/tags/release-1.0-RC0
> 
> 
> Signing keys:
> http://people.apache.org/keys/group/giraph.asc
> 
> The vote runs for 72 hours, until Monday 4pm PST.
> 
> Thanks everyone for your patience with this release!
> 
> Avery
