Additionally, I found this in the mail archives:
http://mail-archives.apache.org/mod_mbox/hama-user/201210.mbox/%3CCAJ-=ys=W8F5W4aduV+=+yfsvh41xsa22-wnqqrkapadzd+q...@mail.gmail.com%3E

This covers exactly my point. Is calling two different aggregate functions in a row still considered a bug?
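To make the two behaviours concrete, here is a minimal standalone sketch in plain Java. It deliberately does not use any Hama classes; the class and method names (AggregatorSketch, meanOfValues, meanAbsDifference) and the sample rank values are my own assumptions for illustration, not the actual AverageAggregator code.

```java
// Standalone sketch: plain Java, hypothetical names, no Hama API involved.
final class AggregatorSketch {

    /** Mean of the current vertex values (the behaviour I am observing). */
    static double meanOfValues(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /** Mean of |new - old| over all vertices (the convergence measure I expected). */
    static double meanAbsDifference(double[] oldValues, double[] newValues) {
        if (oldValues.length != newValues.length) {
            throw new IllegalArgumentException("old and new arrays must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < oldValues.length; i++) {
            sum += Math.abs(newValues[i] - oldValues[i]);
        }
        return sum / oldValues.length;
    }

    public static void main(String[] args) {
        // Five vertices whose ranks always sum to 1.0 (made-up sample values).
        double[] oldRanks = {0.2, 0.2, 0.2, 0.2, 0.2};
        double[] newRanks = {0.1, 0.3, 0.2, 0.25, 0.15};

        // Approximately 0.2 for any five values summing to 1.0.
        System.out.println(meanOfValues(newRanks));
        // Approximately 0.06 for these sample values; shrinks towards 0 on convergence.
        System.out.println(meanAbsDifference(oldRanks, newRanks));
    }
}
```

With five vertices whose values sum to 1.0, the first method can only ever yield about 0.2, which matches what I see from getLastAggregatedValue, whereas the second depends on how much the ranks changed in the last superstep.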
On Wed, Apr 17, 2013 at 2:35 PM, Steven van Beelen <[email protected]> wrote:

> Hi Thomas,
>
> Then I guess I did not explain myself clearly.
> What you describe is indeed how I expect the AverageAggregator to work,
> but if I use the AverageAggregator in my own PageRank implementation it
> does not return the average of all absolute differences but just the
> average of the sum of all values.
>
> The (very) small example graph I use has only five vertices, where the
> sum of all vertex values is always 1.0.
> When I use the AverageAggregator it will always return 0.2 when calling
> the getLastAggregatedValue method.
> It shouldn't do that, right?
>
>
> On Wed, Apr 17, 2013 at 1:18 PM, Thomas Jungblut <[email protected]> wrote:
>
>> Hi Steven,
>>
>> the AverageAggregator is used to determine the average of all absolute
>> differences between old pagerank and new pagerank for every vertex.
>> This is documented as the intended behaviour in the javadoc of the given
>> classes and suffices to track whether pagerank values have converged or
>> not.
>>
>> What you describe is a perfectly valid way to track the pagerank
>> difference throughout all supersteps. But this is not how (imho) the
>> AverageAggregator should behave, so you have to write your own.
>>
>>
>> 2013/4/17 Steven van Beelen <[email protected]>
>>
>> > The values in my case are the DoubleWritable values each vertex has and
>> > the aggregators aggregate on.
>> > My tests showed that, when the aggregator was set to AverageAggregator,
>> > the average of all the vertex values from the past compute step was
>> > returned.
>> > Actually, AverageAggregator should return the average difference of all
>> > the old-new value pairs of every vertex instead of the mean.
>> > The average difference is then used to check whether convergence is
>> > reached, which is relevant for all tasks, of course.
>> >
>> > Hence, the convergence point, for which the Aggregator is used, will
>> > not be reached.
>> > This means the algorithm will just run the maximum number of
>> > iterations set (30 iterations in the PageRank example) in every case.
>> > I experienced the same with my own PageRank implementation.
>> >
>> > I think it has something to do with the finalizeAggregation step taken.
>> > Next to that, both the 'aggregate(VERTEX vertex, M value)' and
>> > 'aggregate(VERTEX vertex, M oldValue, M newValue)' methods are called
>> > every time, where one would think only the second (with old/new values)
>> > would suffice.
>> > Because of this, the global variable 'absoluteDifference' in the
>> > 'AbsDiffAggregator' class is overwritten by the first aggregate.
>> > Additionally, when I wrote my own aggregator class in the same
>> > fashion as AbsDiffAggregator and AverageAggregator, but left out the
>> > 'aggregate(VERTEX vertex, M value)' method, my output turned out to be
>> > 0.0000 every time.
>> >
>> > I hope I made myself clear.
>> > Regards
>> >
>> >
>> > On Wed, Apr 17, 2013 at 11:57 AM, Edward J. Yoon <[email protected]> wrote:
>> >
>> > > Thanks for your report.
>> > >
>> > > What's the meaning of 'all the values'? Please give me more details
>> > > about your problem.
>> > >
>> > > I didn't look at the 'dangling links & aggregators' part of the
>> > > PageRank example closely, but I think there's no bug. Aggregators are
>> > > just used for global communication. For example, finding the max
>> > > value[1] can be done in only one iteration using MaxValueAggregator.
>> > >
>> > > 1. http://cdn.dejanseo.com.au/wp-content/uploads/2011/06/supersteps.png
>> > >
>> > > On Wed, Apr 17, 2013 at 6:27 PM, Steven van Beelen <[email protected]> wrote:
>> > > > Hello,
>> > > >
>> > > > I'm creating my own pagerank in hama for testing and I think I
>> > > > found a problem with the AverageAggregator. I'm not sure if it is
>> > > > me or the AverageAggregator class in general, but I believe it just
>> > > > returns the mean of all the values instead of the average
>> > > > difference between the old and new value as intended.
>> > > >
>> > > > For testing, I created my own AbsDiffAggregator and
>> > > > AverageAggregator classes, using FloatWritable instead of
>> > > > DoubleWritable. The same problem still occurred: I got a mean of
>> > > > all the values in the graph instead of an average difference.
>> > > >
>> > > > Could someone tell me if I'm doing something wrong or what I should
>> > > > provide to better explain my problem?
>> > > >
>> > > > Regards,
>> > > > Steven van Beelen, Vrije Universiteit of Amsterdam
>> > >
>> > >
>> > > --
>> > > Best Regards, Edward J. Yoon
>> > > @eddieyoon
