Oops, sorry. You are using the standard implementations? Then I don't know what's happening. Sorry. But the fact that your input size equals your output size in a "join" process reminded me too much of my own problems. Sorry for any confusion I may have caused.
Best,

On 25.09.2012 at 15:32, Björn-Elmar Macek <ma...@cs.uni-kassel.de> wrote:

> Hi,
>
> I had this problem once too. Did you properly override the reduce method
> with the @Override annotation? Does your reduce method use OutputCollector
> or Context for gathering outputs? If you are using a current version, it
> has to be Context.
>
> The thing is: if you do NOT override it, the standard reduce function
> (identity) is used, and this of course results in the same number of
> tuples as you read as input.
>
> Good luck!
> Elmar
>
> On 25.09.2012 at 11:57, Sigurd Spieckermann <sigurd.spieckerm...@gmail.com> wrote:
>
>> I think I have tracked down the problem to the point that each split only
>> contains one big key-value pair and a combiner is connected to a map task.
>> Please correct me if I'm wrong, but I assume each map task takes one split
>> and the combiner operates only on the key-value pairs within one split.
>> That's why the combiner has no effect in my case. Is there a way to combine
>> the mapper outputs of multiple splits before they are sent off to the
>> reducer?
>>
>> 2012/9/25 Sigurd Spieckermann <sigurd.spieckerm...@gmail.com>
>>
>> Maybe one more note: the combiner and the reducer class are the same, and
>> in the reduce phase the values get aggregated correctly. Why is this not
>> happening in the combiner phase?
>>
>> 2012/9/25 Sigurd Spieckermann <sigurd.spieckerm...@gmail.com>
>>
>> Hi guys,
>>
>> I'm experiencing strange behavior when I use the Hadoop join package.
>> After running a job, the result statistics show that my combiner has an
>> input of 100 records and an output of 100 records. From the task I'm
>> running and the way it's implemented, I know that each key appears
>> multiple times and the values should be combinable before getting passed
>> to the reducer. I'm running my tests in pseudo-distributed mode with one
>> or two map tasks. From using the debugger, I noticed that each key-value
>> pair is processed by the combiner individually, so there's actually no
>> list passed into the combiner that it could aggregate. Can anyone think
>> of a reason for this undesired behavior?
>>
>> Thanks
>> Sigurd
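For what it's worth, the pitfall Elmar describes can be shown without Hadoop at all. The sketch below uses hypothetical stand-in classes (not the real Hadoop `Reducer` API): a base class whose default `reduce` is the identity, a subclass whose `reduce` has a mismatched parameter type and therefore silently fails to override it, and a correct subclass. The `@Override` annotation would turn the silent mismatch into a compile error, which is exactly why it is worth adding.

```java
import java.util.*;

// Hadoop-free sketch of the @Override pitfall. All class names here are
// made up for illustration; only the mechanism matches Hadoop's behavior.
public class OverridePitfall {

    // Stand-in for a framework base class whose default reduce is the
    // identity: one output record per input value.
    static class BaseReducer {
        final List<String> output = new ArrayList<>();
        void reduce(String key, Iterable<Integer> values) {
            for (Integer v : values) output.add(key + "=" + v); // identity
        }
    }

    // BUG: List<Integer> does not match Iterable<Integer>, so this method
    // OVERLOADS rather than overrides reduce. Writing @Override here would
    // be a compile error -- the annotation catches exactly this mistake.
    static class BrokenReducer extends BaseReducer {
        void reduce(String key, List<Integer> values) {
            int sum = 0;
            for (int v : values) sum += v;
            output.add(key + "=" + sum);
        }
    }

    // Correct: the signature matches, and the compiler verifies it.
    static class SumReducer extends BaseReducer {
        @Override
        void reduce(String key, Iterable<Integer> values) {
            int sum = 0;
            for (int v : values) sum += v;
            output.add(key + "=" + sum);
        }
    }

    public static void main(String[] args) {
        Iterable<Integer> values = Arrays.asList(1, 2, 3);

        BaseReducer broken = new BrokenReducer();
        broken.reduce("k", values);          // dispatches to base identity!
        System.out.println(broken.output);   // [k=1, k=2, k=3]

        BaseReducer fixed = new SumReducer();
        fixed.reduce("k", values);
        System.out.println(fixed.output);    // [k=6]
    }
}
```

The broken variant reproduces the symptom from the thread: as many output records as input records, because the identity `reduce` runs. Note this would not explain Sigurd's combiner case if his `reduce` signature is correct; per his own observation, a combiner only sees the output of its own map task's split, so it cannot merge keys across splits.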