Hi Puneet, It's unclear to me what you're wanting in terms of aggregator behavior. Are you saying you want an aggregator such that the final output is the aggregated value just for a particular worker? With an aggregator you should at least make sure the operations you're performing are commutative; that is, the order in which items are aggregated should not matter unless it is explicitly dealt with somehow. Otherwise you'll get unpredictable results.
Best, Matthew Saltz El 08/11/2014 15:05, "Puneet Agarwal" <puagar...@yahoo.com> escribió: > Hi All, > In my algo, I use an Aggregator which takes a Text value. I have written > my custom aggregator class for this, as given below. > > public class MyAgg extends BasicAggregator<Text> { > ... > } > > This works fine when running on my laptop with one worker. > However, when running it on the cluster, sometimes it does not return the > correctly aggregated value. > It seems it is returning the locally aggregated value of one of the > workers. > While it should have used my logic to decide which of the aggregated > values sent by various worker should be chosen as finally aggregated values. > (But in fact I have not written such a code anywhere, it is therefore > doing the best it could) > > Following is how is my analysis about this issue. > a. I guess every worker aggregates the values locally. > b. then there is a global aggregation step, which simply compares the > values sent by various aggregators. > c. For global aggregation it uses Text.compareTo() method. This method > Text.compareTo() is a default Hadoop implementation and does not include > the logic of my program. > d. It seem it is because of the above the value returned by my > aggregator in the cluster is actually not globally aggregated, but the > locally aggregated value of one of the worker gets taken. > > If the above analysis is correct, following is how I think I can solve > this. > I should write my own class that implements Writable interface. In this > class I would also write a compareTo method as a result things will start > working fine. > > If it was using class MyAgg itself, to decide which of the values returned > by various workers should be taken as globally aggregated value then this > problem would not have occurred. > > *I seek your guidance whether my analysis is correct.* > > - Puneet > IIT Delhi, India > >