In general, my understanding of an RC is that we should not add new features or improvements to it. I agree that we cannot fix every open bug, but the least we can do is get in the issues that already have a working patch, particularly given that we're releasing a 1.0.

On Sun, Apr 14, 2013 at 6:18 PM, Avery Ching <ach...@apache.org> wrote:

> Hi Sebastian,
>
> Thanks for the patch. I'll try to take a look at it.
>
> The only reason I bring the optimizations up is that a lot of folks tend
> to compare PageRank performance. The optimizations I'm referring to are
> Giraph ones, not algorithmic ones. We use ints, floats for ids, messages,
> respectively instead of longs, doubles (1/2 network traffic) and
> IntNullArrayEdges vertex edges (efficient array backed edges) instead of
> ByteArrayEdges. You can see
> https://issues.apache.org/jira/browse/giraph-543 for more details.
>
> Anyway, given that we are going to ship a 1.0.1 release in a few weeks for
> a variety of reasons, should this really hold up the current release? I
> would prefer to not cut any more RCs unless things are totally broken (i.e.
> profiles not compiling, major Giraph bugs, etc.). There are still a lot of
> outstanding issues in JIRA, we can't fix them all for the 1.0 release.
>
> Let me know what you think.
>
> Avery
>
>
> On 4/13/13 10:46 AM, Sebastian Schelter wrote:
>
>> Hi Avery,
>>
>> I found the bug and I can provide a patch today or tomorrow, so
>> hopefully we can include that in the release (to not knowingly ship
>> bugged code). Furthermore I improved the code to protect against
>> rounding errors.
>>
>> I don't really get what you mean with the missing optimization in
>> comparison to the benchmark PageRank implementation.
>>
>> The implementation in o.a.g.examples.PageRankVertex aims to be a robust
>> real-world implementation. As optimization, it dismisses edge weights
>> and reuses objects where possible. Furthermore it is able to handle
>> dangling vertices that are present in almost every real-world network
>> and it automatically detects the number of supersteps to run. With the
>> patch, it should also provide improved numerical stability.
>>
>> If the runtimes don't look good enough when compared to the benchmark
>> implementation, this might also be caused by the dataset which has a
>> skewed degree distribution (like most real-world networks). The
>> benchmark uses a uniform degree distribution AFAIK.
>>
>> Best,
>> Sebastian
>>
>> On 13.04.2013 15:46, Avery Ching wrote:
>>
>>> That's great Sebastian. I would also recommend taking a look at the
>>> PageRankBenchmark for a performance comparison. It has had a lot of
>>> speed improvements that should be a bunch faster than PageRankVertex.
>>> Even that though, is not totally optimized. Hopefully we'll be adding a
>>> "how to optimize performance" guide in the near future. Should we delay
>>> the release or simply just ship a 1.1, say in the next month with this
>>> fix and supporting YARN's 2.0.4? I'd like to get on a more normal
>>> release cycle rather than once a year =).
>>>
>>> Avery
>>>
>>> On 4/13/13 3:02 AM, Sebastian Schelter wrote:
>>>
>>>> Hi there,
>>>>
>>>> I got some good and bad news, I tested PageRankVertex (not the Benchmark
>>>> but the example implementation o.a.g.examples.PageRankVertex) from trunk
>>>> compiled for Hadoop 1.0 on a cluster of 26 machines with 208 cores.
>>>>
>>>> I used the Webbase2001 dataset [1] which has 115M vertices and more than
>>>> 1B edges and got some awesome running times, average superstep takes 15
>>>> seconds (!!!). Awesome work, I have to say!
>>>>
>>>> Unfortunately, there seems to be an issue with the convergence
>>>> detection, as it didn't get the correct convergence behavior. I'd like
>>>> to have a look into that this week, so we can ship a performant PageRank
>>>> implementation which automatically runs an appropriate number of
>>>> supersteps. Hope this doesn't delay the release too much.
>>>>
>>>> Best,
>>>> Sebastian
>>>>
>>>>
>>>> [1]
>>>> http://law.di.unimi.it/webdata/webbase-2001/
>>>>
>>>>
>>>> On 13.04.2013 07:39, Avery Ching wrote:
>>>>
>>>>> Thanks to the quick feedback from Roman and Lewis, we have cut a new
>>>>> RC1 that addresses the following issues.
>>>>>
>>>>> * Got rid of .git repo in tarball
>>>>> * Fixed issue with not compiling without git repo (GIRAPH-628)
>>>>> * Used gnutar in OSX rather than tar to generate the tarball and get
>>>>>   rid of warnings
>>>>> * Pushed GIRAPH-627 to support the yarn profile better
>>>>> * Tarball name changed to the final artifact name (giraph-1.0.tar.gz)
>>>>>
>>>>> Release notes:
>>>>> http://people.apache.org/~aching/giraph-1.0-RC1/RELEASE_NOTES.html
>>>>>
>>>>> Release artifacts:
>>>>> http://people.apache.org/~aching/giraph-1.0-RC1/
>>>>>
>>>>> Corresponding git tag:
>>>>> https://git-wip-us.apache.org/repos/asf?p=giraph.git;a=shortlog;h=refs/tags/release-1.0-RC1
>>>>>
>>>>> Signing keys:
>>>>> http://people.apache.org/keys/group/giraph.asc
>>>>>
>>>>> The vote runs for 72 hours, until Monday 11pm PST.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Avery
>>>>>
>>>>> Original message below regarding rc0:
>>>>>
>>>>> -------------------------------
>>>>>
>>>>> Fellow Giraphers,
>>>>>
>>>>> We have our first release candidate since graduating from incubation.
>>>>> This is a source release, primarily due to the different versions of
>>>>> Hadoop we support with munge (similar to the 0.1 release). Since 0.1,
>>>>> we've made A TON of progress on overall performance, optimizing memory
>>>>> use, split vertex/edge inputs, easy interoperability with Apache Hive,
>>>>> and a bunch of other areas. In many ways, this is an almost totally
>>>>> different codebase. Thanks everyone for your hard work!
>>>>>
>>>>> Apache Giraph has been running in production at Facebook (against
>>>>> Facebook's Corona implementation of Hadoop -
>>>>> https://github.com/facebook/hadoop-20/tree/master/src/contrib/corona )
>>>>> since around last December. It has proven to be very scalable,
>>>>> performant, and enables a bunch of new applications. Based on the
>>>>> drastic improvements and the use of Giraph in production, it seems
>>>>> appropriate to bump up our version to 1.0.
>>>>>
>>>>> While anyone can vote, the ASF requires majority approval from the PMC
>>>>> -- i.e., at least three PMC members must vote affirmatively for
>>>>> release, and there must be more positive than negative votes. Releases
>>>>> may not be vetoed. Before voting +1 PMC members are required to
>>>>> download the signed source code package, compile it as provided, and
>>>>> test the resulting executable on their own platform, along with also
>>>>> verifying that the package meets the requirements of the ASF policy on
>>>>> releases.
>>>>>
>>>>> Please test this against many other Hadoop versions and let us know how
>>>>> this goes!
>>>>>
>>>>> Release notes:
>>>>> http://people.apache.org/~aching/giraph-1.0-RC0/RELEASE_NOTES.html
>>>>>
>>>>> Release artifacts:
>>>>> http://people.apache.org/~aching/giraph-1.0-RC0/
>>>>>
>>>>> Corresponding git tag:
>>>>> https://git-wip-us.apache.org/repos/asf?p=giraph.git;a=shortlog;h=refs/tags/release-1.0-RC0
>>>>>
>>>>> Signing keys:
>>>>> http://people.apache.org/keys/group/giraph.asc
>>>>>
>>>>> The vote runs for 72 hours, until Monday 4pm PST.
>>>>>
>>>>> Thanks everyone for your patience with this release!
>>>>>
>>>>> Avery
>>>>>
>>>>
>

--
Claudio Martella
claudio.marte...@gmail.com