Hi Puneet,

It's unclear to me what you're wanting in terms of aggregator behavior. Are
you saying you want an aggregator such that the final output is the
aggregated value just for a particular worker? With an aggregator you
should at least make sure the operations you're performing are commutative;
that is, the order in which items are aggregated should not matter unless
it is explicitly dealt with somehow. Otherwise you'll get unpredictable
results.

Best,
Matthew Saltz
El 08/11/2014 15:05, "Puneet Agarwal" <puagar...@yahoo.com> escribió:

> Hi All,
> In my algo, I use an Aggregator which takes a Text value. I have written
> my custom aggregator class for this, as given below.
>
> public class MyAgg extends BasicAggregator<Text> {
> ...
> }
>
> This works fine when running on my laptop with one worker.
> However, when running it on the cluster, sometimes it does not return the
> correctly aggregated value.
> It seems it is returning the locally aggregated value of one of the
> workers.
> While it should have used my logic to decide which of the aggregated
> values sent by various worker should be chosen as finally aggregated values.
> (But in fact I have not written such a code anywhere, it is therefore
> doing the best it could)
>
> Following is how is my analysis about this issue.
> a.    I guess every worker aggregates the values locally.
> b.    then there is a global aggregation step, which simply compares the
> values sent by various aggregators.
> c.    For global aggregation it uses Text.compareTo() method. This method
> Text.compareTo() is a default Hadoop implementation and does not include
> the logic of my program.
> d.    It seem it is because of the above the value returned by my
> aggregator in the cluster is actually not globally aggregated, but the
> locally aggregated value of one of the worker gets taken.
>
> If the above analysis is correct, following is how I think I can solve
> this.
> I should write my own class that implements Writable interface. In this
> class I would also write a compareTo method as a result things will start
> working fine.
>
> If it was using class MyAgg itself, to decide which of the values returned
> by various workers should be taken as globally aggregated value then this
> problem would not have occurred.
>
> *I seek your guidance whether my analysis is correct.*
>
> - Puneet
> IIT Delhi, India
>
>

Reply via email to