Could anyone tell me whether I'm correct about the possible problem I posted and followed up on in the previous two emails?
On Wed, Apr 17, 2013 at 5:08 PM, Steven van Beelen <[email protected]> wrote:

> Additionally, I found this in the mail archives:
>
> http://mail-archives.apache.org/mod_mbox/hama-user/201210.mbox/%3CCAJ-=ys=W8F5W4aduV+=+yfsvh41xsa22-wnqqrkapadzd+q...@mail.gmail.com%3E
>
> This covers my point exactly. Is calling two different aggregate
> functions in a row still considered a bug?
>
> On Wed, Apr 17, 2013 at 2:35 PM, Steven van Beelen <[email protected]> wrote:
>
>> Hi Thomas,
>>
>> Then I guess I did not explain myself clearly. What you describe is
>> indeed how I expect the AverageAggregator to work, but when I use the
>> AverageAggregator in my own PageRank implementation, it does not return
>> the average of all absolute differences but just the average of the sum
>> of all values.
>>
>> The (very) small example graph I use has only five vertices, where the
>> sum of all the vertex values is always 1.0. When I use the
>> AverageAggregator, calling the getLastAggregatedValue method always
>> returns 0.2. It shouldn't do that, right?
>>
>> On Wed, Apr 17, 2013 at 1:18 PM, Thomas Jungblut <[email protected]> wrote:
>>
>>> Hi Steven,
>>>
>>> the AverageAggregator is used to determine the average of all absolute
>>> differences between the old PageRank and the new PageRank of every
>>> vertex. This behavior is documented in the javadoc of the given
>>> classes and suffices to track whether the PageRank values have
>>> converged yet.
>>>
>>> What you describe is a perfectly valid way to track the PageRank
>>> difference throughout all supersteps, but that is not how (imho) the
>>> AverageAggregator should behave, so you have to write your own.
>>>
>>> 2013/4/17 Steven van Beelen <[email protected]>
>>>
>>>> The values in my case are the DoubleWritable values that each vertex
>>>> has and that the aggregators aggregate on. My tests showed that, when
>>>> the aggregator was set to AverageAggregator, the average of all the
>>>> vertex values from the past compute step was returned. Instead of
>>>> that mean, the AverageAggregator should return the average difference
>>>> over all the old/new value pairs of every vertex. The average
>>>> difference is then used to check whether convergence has been
>>>> reached, which is relevant for all tasks, of course.
>>>>
>>>> Hence, the convergence point, for which the aggregator is used, will
>>>> never be reached. This makes the algorithm always run the maximum
>>>> number of iterations set (30 iterations in the PageRank example). I
>>>> experienced the same with my own PageRank implementation.
>>>>
>>>> I think it has something to do with the finalizeAggregation step.
>>>> Besides that, both the 'aggregate(VERTEX vertex, M value)' and the
>>>> 'aggregate(VERTEX vertex, M oldValue, M newValue)' methods are called
>>>> every time, whereas one would think only the second (with old/new
>>>> values) should suffice. Because of this, the global variable
>>>> 'absoluteDifference' in the AbsDiffAggregator class is
>>>> overwritten/overruled by the first aggregate call. Additionally, when
>>>> I made my own aggregation class in the same fashion as
>>>> AbsDiffAggregator and AverageAggregator but left out
>>>> 'aggregate(VERTEX vertex, M value)', the output turned out to be
>>>> 0.0000 every time.
>>>>
>>>> I hope I made myself clear.
>>>> Regards
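To make the shape of the problem concrete, here is a minimal sketch of an
absolute-difference aggregator in the style described above. It is not the
actual Hama source: the AbstractAggregator base class, its generic
signature, and the assumption that only the old/new overload accumulates
state are inferred from the method names quoted in this thread.

    import org.apache.hadoop.io.DoubleWritable;
    import org.apache.hama.graph.AbstractAggregator;
    import org.apache.hama.graph.Vertex;

    // Sketch only: base class and generics assumed, not the real source.
    public class MyAbsDiffAggregator extends
        AbstractAggregator<DoubleWritable, Vertex<?, ?, DoubleWritable>> {

      private double absoluteDifference = 0.0d;

      // Overload 1: receives only the current value of a vertex. Per the
      // observation above, the framework calls this in addition to
      // overload 2, so any state it modifies can overrule the difference
      // accumulated below.
      @Override
      public void aggregate(Vertex<?, ?, DoubleWritable> vertex,
          DoubleWritable value) {
        // Intentionally left empty in this sketch.
      }

      // Overload 2: receives the old and new value of a vertex; this is
      // the call that should drive the convergence check.
      @Override
      public void aggregate(Vertex<?, ?, DoubleWritable> vertex,
          DoubleWritable oldValue, DoubleWritable newValue) {
        double previous = (oldValue == null) ? 0.0d : oldValue.get();
        absoluteDifference += Math.abs(previous - newValue.get());
      }

      @Override
      public DoubleWritable getValue() {
        return new DoubleWritable(absoluteDifference);
      }
    }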
>>>> On Wed, Apr 17, 2013 at 11:57 AM, Edward J. Yoon <[email protected]> wrote:
>>>>
>>>>> Thanks for your report.
>>>>>
>>>>> What is the meaning of 'all the values'? Please give me more details
>>>>> about your problem.
>>>>>
>>>>> I didn't look closely at the 'dangling links & aggregators' part of
>>>>> the PageRank example, but I think there is no bug. Aggregators are
>>>>> just used for global communication. For example, finding the maximum
>>>>> value [1] can be done in only one iteration using MaxValueAggregator.
>>>>>
>>>>> 1. http://cdn.dejanseo.com.au/wp-content/uploads/2011/06/supersteps.png
>>>>>
>>>>> On Wed, Apr 17, 2013 at 6:27 PM, Steven van Beelen <[email protected]> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I'm creating my own PageRank in Hama for testing, and I think I
>>>>>> found a problem with the AverageAggregator. I'm not sure whether it
>>>>>> is me or the AverageAggregator class in general, but I believe it
>>>>>> just returns the mean of all the values instead of the intended
>>>>>> average difference between the old and new values.
>>>>>>
>>>>>> For testing, I created my own AbsDiffAggregator and
>>>>>> AverageAggregator classes, using FloatWritable instead of
>>>>>> DoubleWritable. The same problem still occurred: I got a mean of all
>>>>>> the values in the graph instead of an average difference.
>>>>>>
>>>>>> Could someone tell me whether I'm doing something wrong, or what I
>>>>>> should provide to better explain my problem?
>>>>>>
>>>>>> Regards,
>>>>>> Steven van Beelen, Vrije Universiteit of Amsterdam
>>>>>
>>>>> --
>>>>> Best Regards, Edward J. Yoon
>>>>> @eddieyoon
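For context, here is a hedged sketch of how a PageRank vertex would consume
the aggregated error. Only the getLastAggregatedValue call is taken from
the thread itself; the Vertex superclass, its generics, and the helper
methods (getSuperstepCount, getNumVertices, voteToHalt,
sendMessageToNeighbors, getEdges) are assumptions modeled on the Hama
0.6-era graph API. If the aggregator returns the mean of the vertex values
(always 0.2 in the five-vertex graph above) instead of the average absolute
difference, the threshold check below never fires and the job runs to the
iteration cap, which matches the symptom reported in this thread.

    import java.io.IOException;
    import java.util.Iterator;

    import org.apache.hadoop.io.DoubleWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hama.graph.Vertex;

    // Sketch only: class name, generics, and helpers are assumed.
    public class PageRankVertex extends
        Vertex<Text, NullWritable, DoubleWritable> {

      // Hypothetical convergence threshold for this sketch.
      private static final double CONVERGENCE_ERROR = 0.001d;

      @Override
      public void compute(Iterator<DoubleWritable> messages)
          throws IOException {
        if (getSuperstepCount() == 0) {
          // Seed a uniform initial rank (a real job may do this in setup).
          setValue(new DoubleWritable(1.0d / getNumVertices()));
        } else {
          // Standard PageRank update with a damping factor of 0.85.
          double sum = 0.0d;
          while (messages.hasNext()) {
            sum += messages.next().get();
          }
          setValue(new DoubleWritable(0.15d / getNumVertices() + 0.85d * sum));
        }

        // Read the globally aggregated error of the previous superstep
        // (aggregator registered at index 0). If that value is the mean of
        // all vertex values rather than the average absolute difference,
        // this check never passes and the iteration limit is always hit.
        DoubleWritable lastDiff = getLastAggregatedValue(0);
        if (getSuperstepCount() > 2 && lastDiff != null
            && lastDiff.get() < CONVERGENCE_ERROR) {
          voteToHalt();
          return;
        }

        // Spread this vertex's rank evenly over its outgoing edges.
        sendMessageToNeighbors(new DoubleWritable(getValue().get()
            / getEdges().size()));
      }
    }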
