Oops, sorry. You are using the standard implementations? Then I don't know what's happening. Sorry. But the fact that your input size equals your output size in a "join" process reminded me too much of my own problems. Sorry for any confusion I may have caused.
Best,

On 25.09.2012 at 15:32, Björn-Elmar Macek <ma...@cs.uni-kassel.de> wrote:

> Hi,
>
> I had this problem once too. Did you properly override the reduce method
> with the @Override annotation? Does your reduce method use OutputCollector
> or Context for gathering outputs? If you are using a current version, it
> has to be Context.
>
> The thing is: if you do NOT override it, the standard reduce function
> (identity) is used, and this of course results in the same number of
> tuples as you read as input.
>
> Good luck!
> Elmar
>
> On 25.09.2012 at 11:57, Sigurd Spieckermann <sigurd.spieckerm...@gmail.com> wrote:
>
>> I think I have tracked down the problem to the point that each split only
>> contains one big key-value pair and a combiner is connected to a map task.
>> Please correct me if I'm wrong, but I assume each map task takes one split
>> and the combiner operates only on the key-value pairs within one split.
>> That's why the combiner has no effect in my case. Is there a way to combine
>> the mapper outputs of multiple splits before they are sent off to the
>> reducer?
>>
>> 2012/9/25 Sigurd Spieckermann <sigurd.spieckerm...@gmail.com>
>>
>> Maybe one more note: the combiner and the reducer class are the same, and
>> in the reduce phase the values get aggregated correctly. Why is this not
>> happening in the combiner phase?
>>
>> 2012/9/25 Sigurd Spieckermann <sigurd.spieckerm...@gmail.com>
>>
>> Hi guys,
>>
>> I'm experiencing strange behavior when I use the Hadoop join package.
>> After running a job, the result statistics show that my combiner has an
>> input of 100 records and an output of 100 records. From the task I'm
>> running and the way it's implemented, I know that each key appears
>> multiple times and the values should be combinable before getting passed
>> to the reducer. I'm running my tests in pseudo-distributed mode with one
>> or two map tasks. From using the debugger, I noticed that each key-value
>> pair is processed by the combiner individually, so there's actually no
>> list passed into the combiner that it could aggregate. Can anyone think
>> of a reason for this undesired behavior?
>>
>> Thanks
>> Sigurd
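For what it's worth, the pitfall Elmar describes can be shown without Hadoop at all. The sketch below uses hypothetical stand-in classes (not the real Hadoop `Reducer` API): a base class whose default `reduce` is the identity, a subclass whose `reduce` has a mismatched parameter type and therefore silently fails to override it, and a correct subclass. The `@Override` annotation would turn the silent mismatch into a compile error, which is exactly why it is worth adding.

```java
import java.util.*;

// Hadoop-free sketch of the @Override pitfall. All class names here are
// made up for illustration; only the mechanism matches Hadoop's behavior.
public class OverridePitfall {

    // Stand-in for a framework base class whose default reduce is the
    // identity: one output record per input value.
    static class BaseReducer {
        final List<String> output = new ArrayList<>();
        void reduce(String key, Iterable<Integer> values) {
            for (Integer v : values) output.add(key + "=" + v); // identity
        }
    }

    // BUG: List<Integer> does not match Iterable<Integer>, so this method
    // OVERLOADS rather than overrides reduce. Writing @Override here would
    // be a compile error -- the annotation catches exactly this mistake.
    static class BrokenReducer extends BaseReducer {
        void reduce(String key, List<Integer> values) {
            int sum = 0;
            for (int v : values) sum += v;
            output.add(key + "=" + sum);
        }
    }

    // Correct: the signature matches, and the compiler verifies it.
    static class SumReducer extends BaseReducer {
        @Override
        void reduce(String key, Iterable<Integer> values) {
            int sum = 0;
            for (int v : values) sum += v;
            output.add(key + "=" + sum);
        }
    }

    public static void main(String[] args) {
        Iterable<Integer> values = Arrays.asList(1, 2, 3);

        BaseReducer broken = new BrokenReducer();
        broken.reduce("k", values);          // dispatches to base identity!
        System.out.println(broken.output);   // [k=1, k=2, k=3]

        BaseReducer fixed = new SumReducer();
        fixed.reduce("k", values);
        System.out.println(fixed.output);    // [k=6]
    }
}
```

The broken variant reproduces the symptom from the thread: as many output records as input records, because the identity `reduce` runs. Note this would not explain Sigurd's combiner case if his `reduce` signature is correct; per his own observation, a combiner only sees the output of its own map task's split, so it cannot merge keys across splits.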