What is the type of the threshold variable? I believe sum is a Java int.

Regards,
Shahab

On Mon, May 11, 2015 at 1:08 PM, Peter Ruch <rutschifen...@gmail.com> wrote:
> Hi,
>
> I am currently playing around with Hadoop and have some problems when
> trying to filter in the Reducer.
>
> I extended the WordCount v1.0 example from the 2.7 MapReduce Tutorial with
> some additional functionality and added the possibility to filter by the
> value of each key, e.g. only output the key-value pairs where
> [[ value > threshold ]].
>
> Filtering Code in Reducer
> #####################################
>
> for (IntWritable val : values) {
>     sum += val.get();
> }
> if ( sum > threshold ) {
>     result.set(sum);
>     context.write(key, result);
> }
>
> #####################################
>
> For a threshold smaller than any value, the above code works as expected
> and the output contains all key-value pairs.
> If I increase the threshold to 1, some pairs are missing from the output
> although their value is larger than the threshold.
>
> I tried to work out the error myself, but I could not get it to work as
> intended. I use the exact Tutorial setup with Oracle JDK 8 on a CentOS 7
> machine.
>
> As far as I understand, the respective Iterable<...> in the Reducer
> already contains all the observed values for a specific key.
> Why is it possible that I am missing some of these key-value pairs then?
> It only fails in very few cases. The input file is pretty large (250 MB),
> so I also tried to increase the memory for the mapping and reduction steps,
> but it did not help (I tried a lot of different things without success).
>
> Maybe someone has already experienced similar problems or is more
> experienced than I am.
>
> Thank you,
>
> Peter
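
One possible cause, assuming the driver is unchanged from the tutorial: WordCount v1.0 also registers the reducer class as the combiner (job.setCombinerClass(IntSumReducer.class)). If the threshold check lives in that class, it would also run on the partial sums of each map task, before the reducer sees all values for a key. The sketch below is a plain-Java simulation of that effect, not Hadoop code; the class name CombinerFilterDemo, its methods, and the sample counts are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CombinerFilterDemo {

    // Same logic as the filtering code in the question: sum the values,
    // then emit only if sum > threshold. Returns null when nothing is emitted.
    public static Integer sumAndFilter(List<Integer> values, int threshold) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum > threshold ? sum : null;
    }

    // Simulates a single key whose counts are spread over several map tasks.
    // With useFilteringCombiner = true, the filter runs once per map task
    // (as a combiner would) and then again in the reducer.
    public static Integer run(List<List<Integer>> perMapTaskValues,
                              int threshold,
                              boolean useFilteringCombiner) {
        List<Integer> reducerInput = new ArrayList<>();
        for (List<Integer> mapOutput : perMapTaskValues) {
            if (useFilteringCombiner) {
                Integer partial = sumAndFilter(mapOutput, threshold);
                if (partial != null) {
                    reducerInput.add(partial);
                }
            } else {
                reducerInput.addAll(mapOutput);
            }
        }
        if (reducerInput.isEmpty()) {
            return null; // the key never reaches the reducer
        }
        return sumAndFilter(reducerInput, threshold);
    }

    public static void main(String[] args) {
        // Hypothetical input: a word seen once in each of three input splits.
        List<List<Integer>> splits = Arrays.asList(
                Arrays.asList(1), Arrays.asList(1), Arrays.asList(1));

        System.out.println(run(splits, 1, true));   // null: each partial sum 1 fails 1 > 1
        System.out.println(run(splits, 1, false));  // 3: the reducer sees the full sum
        System.out.println(run(splits, 0, true));   // 3: threshold below every partial sum
    }
}
```

If this is what is happening, removing the setCombinerClass line from the driver, or using a separate combiner class that only sums and never filters, should keep the threshold check in the reducer alone.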