First, the default Reducer implementation is the identity, so you could
reuse it directly.
Second, to make things clearer, you could use NullWritable instead of
IntWritable.
Third, with regard to the output, you may need to write a custom output
format (and I don't see another way except using Pig, Cascading...
http://www.cascading.org/2012/07/02/cascading-for-the-impatient-part-1/).
Fourth, in Java you have a boolean type, so you might want
your meetConditions function to return one instead of an integer.
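
Putting the first, second and fourth points together, here is a rough
sketch of a map-only filter job. I haven't run it, and the LogFilter
class name and the "contains ERROR" test are only placeholders for your
real condition:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class LogFilter {

        public static class FilterMapper extends
                Mapper<LongWritable, Text, Text, NullWritable> {

            @Override
            public void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                // meetConditions now returns a boolean instead of an int flag.
                if (meetConditions(value)) {
                    // NullWritable carries no payload, so only the log line
                    // itself ends up in the output file.
                    context.write(value, NullWritable.get());
                }
            }

            // Placeholder condition: keep lines containing "ERROR".
            private boolean meetConditions(Text value) {
                return value.toString().contains("ERROR");
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = new Job(conf, "log filter");
            job.setJarByClass(LogFilter.class);
            job.setMapperClass(FilterMapper.class);
            // A pure filter needs no reduce phase: with zero reducers the
            // job is map-only and the mapper output is written directly.
            job.setNumReduceTasks(0);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(NullWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

With zero reducers you don't even need the identity reducer. And if I
remember correctly, the default TextOutputFormat skips NullWritable values
(no separator, no trailing "1"), so each matching line is written back
as-is, which may be enough to avoid a custom output format in this simple
case.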

Regards

Bertrand

On Tue, Sep 25, 2012 at 8:08 PM, Matthieu Labour <matth...@actionx.com> wrote:

> Hi
>
> I am completely new to Hadoop and I am trying to implement the following
> simple application. I apologize if this sounds trivial.
>
> I have multiple log files. I need to read the log files, collect the
> entries that meet some conditions, and write them back to files for further
> processing. (In other words, I need to filter out some events.)
>
> I am using the WordCount example to get going.
>
> public static class Map extends
>             Mapper<LongWritable, Text, Text, IntWritable> {
>         private final static IntWritable one = new IntWritable(1);
>
>         public void map(LongWritable key, Text value, Context context)
>                 throws IOException, InterruptedException {
>             if(-1 != meetConditions(value)) {
>                 context.write(value, one);
>             }
>         }
>     }
>
> public static class Reduce extends
>             Reducer<Text, IntWritable, Text, IntWritable> {
>
>         public void reduce(Text key, Iterable<IntWritable> values,
>                 Context context) throws IOException, InterruptedException {
>             context.write(key, new IntWritable(1));
>         }
>     }
>
> The problem is that it prints the value 1 after each entry.
>
> Hence my question. What is the best trivial implementation of the map and
> reduce functions to address the use case above?
>
> Thank you greatly for your help
>



-- 
Bertrand Dechoux
