That's great, Sebastian. I would also recommend taking a look at the
PageRankBenchmark for a performance comparison. It has had a lot of
speed improvements and should be considerably faster than PageRankVertex.
Even that, though, is not fully optimized. Hopefully we'll be adding a
"how to optimize performance" guide in the near future. Should we delay
the release, or simply ship a 1.1, say within the next month, with this
fix and support for YARN's 2.0.4? I'd like to get on a more normal
release cycle rather than once a year =).
Avery
On 4/13/13 3:02 AM, Sebastian Schelter wrote:
Hi there,
I've got some good and some bad news. I tested PageRankVertex (not the
Benchmark but the example implementation o.a.g.examples.PageRankVertex)
from trunk, compiled for Hadoop 1.0, on a cluster of 26 machines with 208
cores. I used the Webbase2001 dataset [1], which has 115M vertices and
more than 1B edges, and got some awesome running times: the average
superstep takes 15 seconds (!!!). Awesome work, I have to say!
Unfortunately, there seems to be an issue with the convergence
detection, as the job didn't show the correct convergence behavior. I'd
like to look into that this week, so we can ship a performant PageRank
implementation that automatically runs an appropriate number of
supersteps. I hope this doesn't delay the release too much.
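To make "convergence detection" concrete, here is a minimal standalone sketch in plain Python. It is not the Giraph code and does not use the Giraph API; the function name, the damping factor of 0.85, the L1 tolerance, and the superstep cap are all assumptions chosen for illustration. The idea is simply to stop iterating once the total change in rank between two consecutive supersteps drops below a threshold.

```python
def pagerank(out_edges, damping=0.85, tolerance=1e-6, max_supersteps=200):
    """Iterate PageRank until the L1 change between supersteps is small.

    out_edges maps each vertex to a list of its target vertices; every
    target is assumed to also appear as a key. All parameter names and
    defaults are illustrative assumptions, not Giraph settings.
    Returns (ranks, supersteps_run, converged).
    """
    vertices = list(out_edges)
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}

    for superstep in range(1, max_supersteps + 1):
        incoming = {v: 0.0 for v in vertices}
        dangling = 0.0  # rank held by vertices without out-edges
        for v, targets in out_edges.items():
            if targets:
                share = rank[v] / len(targets)
                for t in targets:
                    incoming[t] += share
            else:
                dangling += rank[v]

        # Dangling rank is spread evenly so the ranks keep summing to 1.
        new_rank = {
            v: (1.0 - damping) / n + damping * (incoming[v] + dangling / n)
            for v in vertices
        }

        # Convergence check: total absolute change across all vertices.
        delta = sum(abs(new_rank[v] - rank[v]) for v in vertices)
        rank = new_rank
        if delta < tolerance:
            return rank, superstep, True

    return rank, max_supersteps, False


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    ranks, steps, converged = pagerank(graph)
    print(ranks, steps, converged)
```

In a distributed setting the per-vertex changes would have to be summed across workers before the check (for example through a global aggregate), which is where a detection bug could plausibly hide; that part is not modeled here.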
Best,
Sebastian
[1] http://law.di.unimi.it/webdata/webbase-2001/
On 13.04.2013 07:39, Avery Ching wrote:
Thanks to the quick feedback from Roman and Lewis, we have cut a new RC1
that addresses the following issues.
* Got rid of .git repo in tarball
* Fixed issue with not compiling without git repo (GIRAPH-628)
* Used gnutar on OS X rather than tar to generate the tarball and get rid
of warnings
* Pushed GIRAPH-627 to support the yarn profile better
* Tarball name changed to the final artifact name (giraph-1.0.tar.gz)
Release notes:
http://people.apache.org/~aching/giraph-1.0-RC1/RELEASE_NOTES.html
Release artifacts:
http://people.apache.org/~aching/giraph-1.0-RC1/
Corresponding git tag:
https://git-wip-us.apache.org/repos/asf?p=giraph.git;a=shortlog;h=refs/tags/release-1.0-RC1
Signing keys:
http://people.apache.org/keys/group/giraph.asc
The vote runs for 72 hours, until Monday 11pm PST.
Thanks,
Avery
Original message below regarding rc0:
-------------------------------
Fellow Giraphers,
We have our first release candidate since graduating from incubation.
This is a source release, primarily due to the different versions of
Hadoop we support with munge (similar to the 0.1 release). Since 0.1,
we've made A TON of progress on overall performance, optimizing memory
use, split vertex/edge inputs, easy interoperability with Apache Hive,
and a bunch of other areas. In many ways, this is an almost totally
different codebase. Thanks everyone for your hard work!
Apache Giraph has been running in production at Facebook (against
Facebook's Corona implementation of Hadoop -
https://github.com/facebook/hadoop-20/tree/master/src/contrib/corona)
since around last December. It has proven to be very scalable and
performant, and it enables a bunch of new applications. Based on the
drastic improvements and the use of Giraph in production, it seems
appropriate to bump up our version to 1.0.
While anyone can vote, the ASF requires majority approval from the PMC
-- i.e., at least three PMC members must vote affirmatively for release,
and there must be more positive than negative votes. Releases may not be
vetoed. Before voting +1, PMC members are required to download the signed
source code package, compile it as provided, and test the resulting
executable on their own platform, as well as verify that the package
meets the requirements of the ASF policy on releases.
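As a quick illustration of the counting rule described above, here is a small sketch in Python. The function name and arguments are made up for this example; it only encodes the two conditions stated in this mail (at least three affirmative PMC votes, and more positive than negative PMC votes) and is not an official ASF tool.

```python
def release_vote_passes(pmc_plus_ones: int, pmc_minus_ones: int) -> bool:
    """Return True if the PMC votes satisfy the rule stated above.

    Only PMC votes are counted here; the names are hypothetical.
    """
    enough_approvals = pmc_plus_ones >= 3
    majority = pmc_plus_ones > pmc_minus_ones
    return enough_approvals and majority


if __name__ == "__main__":
    # Three +1 votes and one -1 vote: the release is approved.
    print(release_vote_passes(3, 1))
    # Two +1 votes and no -1 votes: not enough affirmative votes.
    print(release_vote_passes(2, 0))
```

Note that a single -1 does not block the release under this rule, which matches "Releases may not be vetoed."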
Please test this against many other Hadoop versions and let us know how
this goes!
Release notes:
http://people.apache.org/~aching/giraph-1.0-RC0/RELEASE_NOTES.html
Release artifacts:
http://people.apache.org/~aching/giraph-1.0-RC0/
Corresponding git tag:
https://git-wip-us.apache.org/repos/asf?p=giraph.git;a=shortlog;h=refs/tags/release-1.0-RC0
Signing keys:
http://people.apache.org/keys/group/giraph.asc
The vote runs for 72 hours, until Monday 4pm PST.
Thanks everyone for your patience with this release!
Avery