Hi,

I already skimmed through the logs, but I could not find anything unusual.

I am just really confused about why I am having this problem.

If the Iterable<...> for a specific key contains all of the observed values (and it seems to, since otherwise the program would not work correctly in the standard case with [[ threshold = -1 ]]), it should also work when I only write the key-value pairs that satisfy the condition [[ sum > threshold ]] to the output file.

Did I miss something? Maybe I have to handle these cases in a specific way, but I did not find anything about that online.


Thank you for your help,

Peter



On 12.05.2015 12:35, Drake민영근 wrote:
Hi, Peter

Are the missing records just gone, without any log messages? What do your reduce task logs show?

Thanks

Drake 민영근 Ph.D
kt NexR

On Tue, May 12, 2015 at 5:18 AM, Peter Ruch <rutschifen...@gmail.com> wrote:

    Hello,

    sum and threshold are both integers.
    For the threshold variable, I first add a new resource to the
    configuration (conf.addResource( ... );) and later read the
    threshold value from the configuration in setup():

    Code
    #####################################

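    private int threshold;

    @Override
    public void setup( Context context ) {
        // read the value added via conf.addResource( ... );
        // falls back to -1 if "threshold" is not set
        Configuration conf = context.getConfiguration();
        threshold = conf.getInt( "threshold", -1 );
    }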

    #####################################
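The lookup above returns -1 whenever "threshold" is missing from the loaded configuration, and since every word count is at least 1, sum > -1 then lets every pair through. A plain-Java stand-in, using java.util.Properties in place of Hadoop's Configuration (an assumption purely for illustration; ThresholdLookupSketch and readThreshold are hypothetical names), shows that fallback:

```java
import java.util.Properties;

// Stand-in for conf.getInt("threshold", -1) using java.util.Properties.
// Hadoop's Configuration is NOT used here; this only mirrors the default-value fallback.
class ThresholdLookupSketch {

    static int readThreshold(Properties conf) {
        String raw = conf.getProperty("threshold");
        // Missing key -> fall back to -1, which lets every key through (sum > -1).
        return raw == null ? -1 : Integer.parseInt(raw.trim());
    }

    public static void main(String[] args) {
        Properties loaded = new Properties();
        loaded.setProperty("threshold", "1");
        System.out.println(readThreshold(loaded));           // 1
        System.out.println(readThreshold(new Properties())); // -1
    }
}
```

If there is doubt about which value a reduce task actually used, logging threshold from setup() would make it visible in the task logs.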


    Best,
    Peter



    On 11.05.2015 19:26, Shahab Yunus wrote:
    What is the type of the threshold variable? I believe sum is a
    Java int.

    Regards,
    Shahab

    On Mon, May 11, 2015 at 1:08 PM, Peter Ruch
    <rutschifen...@gmail.com> wrote:

        Hi,

        I am currently playing around with Hadoop and have some
        problems when trying to filter in the Reducer.

        I extended the WordCount v1.0 example from the Hadoop 2.7
        MapReduce Tutorial with some additional functionality and
        added the possibility to filter by the value of each key,
        e.g. only output the key-value pairs where [[
        value > threshold ]].

        Filtering Code in Reducer
        #####################################

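        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;  // must start from 0 for every key
            for (IntWritable val : values) {
                sum += val.get();
            }
            // emit only keys whose total exceeds the threshold
            if ( sum > threshold ) {
                result.set(sum);
                context.write(key, result);
            }
        }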

        #####################################
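A plain-Java stand-in for the filtering step (no Hadoop classes; ReduceFilterSketch and reduceWithThreshold are hypothetical names) shows that when each key really does arrive with all of its values, filtering on the total behaves as intended:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for the reducer's filtering step (no Hadoop classes).
// Each map entry plays the role of one reduce() call: a key plus ALL of its values.
class ReduceFilterSketch {

    static Map<String, Integer> reduceWithThreshold(Map<String, List<Integer>> grouped, int threshold) {
        Map<String, Integer> output = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            int sum = 0;                   // start from 0 for every key
            for (int val : entry.getValue()) {
                sum += val;
            }
            if (sum > threshold) {         // same condition as in the reducer
                output.put(entry.getKey(), sum);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> grouped = new LinkedHashMap<>();
        grouped.put("hadoop", Arrays.asList(1, 1, 1));
        grouped.put("reducer", Arrays.asList(1));
        grouped.put("filter", Arrays.asList(1, 1));

        System.out.println(reduceWithThreshold(grouped, -1)); // {hadoop=3, reducer=1, filter=2}
        System.out.println(reduceWithThreshold(grouped, 1));  // {hadoop=3, filter=2}
    }
}
```

Here "reducer" is correctly dropped at threshold 1 because its total is exactly 1; with this logic, a key whose total is above the threshold can only go missing if reduce() never receives its full set of values.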

        For a threshold smaller than any value, the above code works
        as expected and the output contains all key-value pairs.
        If I increase the threshold to 1, some pairs are missing from
        the output even though their values are larger than the
        threshold.

        I tried to work out the error myself, but I could not get it
        to work as intended. I use the exact tutorial setup with
        Oracle JDK 8 on a CentOS 7 machine.

        As far as I understand, the respective Iterable<...> in the
        Reducer already contains all the observed values for a
        specific key. How, then, can some of these key-value pairs
        go missing? It only fails in very few cases. The input file
        is fairly large (250 MB), so I also tried increasing the
        memory for the map and reduce steps, but it did not help
        (I tried a lot of different things without success).
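One possibility worth ruling out, though nothing in this thread confirms it: the WordCount v1.0 driver in the tutorial also registers the reducer as the combiner (job.setCombinerClass(IntSumReducer.class)). If the filtering reducer is still used that way, the threshold is applied to partial per-map-task sums, so a word whose occurrences are spread across map outputs can be dropped before the reducer ever sees its full count. The simulation below models that data flow in plain Java; it uses no Hadoop classes, and all names in it are hypothetical:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a MapReduce data flow in plain Java (no Hadoop classes):
// two map tasks emit (word, 1), a combiner pre-sums each task's output,
// the shuffle groups the partial sums by key, and a reducer produces totals.
class CombinerFilterSketch {

    // Sum the values of each key; keep only keys whose sum exceeds the threshold.
    // With threshold = Integer.MIN_VALUE this is a plain, non-filtering sum.
    static Map<String, Integer> sumAndFilter(Map<String, List<Integer>> grouped, int threshold) {
        Map<String, Integer> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            if (sum > threshold) {
                out.put(e.getKey(), sum);
            }
        }
        return out;
    }

    // One map task: group its words as word -> [1, 1, ...].
    static Map<String, List<Integer>> mapTask(String... words) {
        Map<String, List<Integer>> grouped = new LinkedHashMap<>();
        for (String w : words) {
            grouped.computeIfAbsent(w, k -> new ArrayList<>()).add(1);
        }
        return grouped;
    }

    // Shuffle: collect every task's (combined) output under its key.
    static Map<String, List<Integer>> shuffle(List<Map<String, Integer>> taskOutputs) {
        Map<String, List<Integer>> grouped = new LinkedHashMap<>();
        for (Map<String, Integer> task : taskOutputs) {
            for (Map.Entry<String, Integer> e : task.entrySet()) {
                grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
            }
        }
        return grouped;
    }

    // Both map tasks see "hadoop" twice and "rare" once, so "rare" has a true total of 2.
    static Map<String, Integer> runJob(int combinerThreshold, int reducerThreshold) {
        List<Map<String, Integer>> combined = new ArrayList<>();
        combined.add(sumAndFilter(mapTask("hadoop", "hadoop", "rare"), combinerThreshold));
        combined.add(sumAndFilter(mapTask("hadoop", "rare", "hadoop"), combinerThreshold));
        return sumAndFilter(shuffle(combined), reducerThreshold);
    }

    public static void main(String[] args) {
        int threshold = 1;
        // Filtering reducer reused as combiner: each task's partial sum for "rare" is 1,
        // which is not > 1, so "rare" is dropped before the reducer sees it.
        System.out.println(runJob(threshold, threshold));         // {hadoop=4}
        // Summing-only combiner, filtering reducer: "rare" totals 2 > 1 and is kept.
        System.out.println(runJob(Integer.MIN_VALUE, threshold)); // {hadoop=4, rare=2}
    }
}
```

If this turns out to be the cause, using a combiner that only sums (such as the tutorial's original IntSumReducer) together with the filtering reducer, or no combiner at all, should keep the totals complete. Hadoop may run a combiner zero or more times, so it should never change the final result, and a filter does.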

        Maybe someone has already run into a similar problem or is
        more experienced with this than I am.


        Thank you,

        Peter




