Hi Alberto

The iterator you are looping over in your reduce method isn't a
self-contained list of values. What's actually happening is that
you're iterating through *part* of the sorted key/value set that was
sent to that reduce node, and it is the grouping comparator that
decides when to break that loop and call reduce again on the next key.

Moreover, the "key" object is re-used. As you iterate through the
values, the framework updates that same object in place to point at
the key data associated with the current value - which is why you see
it change.

This only happens in the new "mapreduce" API - in the older "mapred"
API you get the first key, and it appears to stay the same during the
loop.

It's sometimes useful behaviour, but it's confusing that the two APIs
don't behave the same way.
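To make the mechanism concrete, here's a self-contained sketch (plain Java, no Hadoop required; MutableKey and the record array are hypothetical stand-ins for your WritableComparable key and the framework's value iterator). It simulates the framework mutating one shared key object per value, and shows the usual workaround: copy the primitive field out of the key before the loop. Note that if your getSecondField() returns a mutable Writable rather than a primitive, saving the reference before the loop doesn't help - you'd need to copy the underlying value or clone the Writable.

```java
import java.util.ArrayList;
import java.util.List;

public class KeyReuseDemo {
    // Hypothetical stand-in for the custom WritableComparable key.
    static class MutableKey {
        int first;
        int tag; // the "second field"
    }

    public static void main(String[] args) {
        MutableKey key = new MutableKey(); // the SINGLE reused instance
        // Grouped records for one reduce call: same first field, tag 1 then 2.
        int[][] records = { {7, 1}, {7, 1}, {7, 2}, {7, 2} };

        List<Integer> readThroughKey = new ArrayList<>();
        List<Integer> copiedUpFront = new ArrayList<>();

        // "Framework" fills the key for the first value, then we take a copy.
        key.first = records[0][0];
        key.tag = records[0][1];
        int tag = key.tag; // workaround: copy the primitive before the loop

        for (int[] rec : records) {
            // The framework mutates the same key object for each value.
            key.first = rec[0];
            key.tag = rec[1];
            readThroughKey.add(key.tag); // changes mid-loop, as you observed
            copiedUpFront.add(tag);      // stable
        }

        System.out.println(readThroughKey); // [1, 1, 2, 2]
        System.out.println(copiedUpFront);  // [1, 1, 1, 1]
    }
}
```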

Hope that helps,
Dave

On 15 October 2012 20:11, Alberto Cordioli <cordioli.albe...@gmail.com> wrote:
> Hi all,
>
> a very strange thing is happening with my hadoop program.
> My map simply emits tuples with a custom object as key (which
> implements WritableComparable).
> The object is made of 2 fields, and I implemented my partitioner and
> grouping class in such a way that only the first field is taken into
> account.
> The second field is just a tag and could be 1 or 2.
>
> This is the reducer's snippet:
>
> tag = key.getSecondField();
> Iterator it1 = values.iterator();
> while(it1.hasNext()){
>         it1.next();
>         collector.emit(new Text("dummy"), tag);
> }
>
> I would expect in my output all the lines with:
> dummy       1
> ...
> dummy       1
>
> but actually the value of tag changes in time and I obtain this type of 
> output:
>
> dummy    1
> ...
> dummy    1
> dummy    2
> ...
> dummy    2
>
>
> Could someone explain why, please?
>
>
> Thanks.
>
>
>
>
>
> --
> Alberto Cordioli
