Re: How to process only input files containing 100% valid rows

2013-04-19 Thread Matthias Scherer
I have to add that we have 1-2 billion events per day, split across several thousand files. So pre-reading each file in the InputFormat should be avoided. And yes, we could use MultipleOutputs and write out bad rows while processing each input file. But we (our Operations team) think that there is more
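A minimal sketch of the MultipleOutputs approach mentioned above, assuming a tab-separated Text record format; the EventMapper class, the isValid() rule and the "bad" output name are illustrative placeholders, not taken from the thread:

    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

    public class EventMapper extends Mapper<LongWritable, Text, Text, Text> {

        private static final int EXPECTED_FIELDS = 5;  // assumed record width
        private MultipleOutputs<Text, Text> mos;

        @Override
        protected void setup(Context context) {
            mos = new MultipleOutputs<Text, Text>(context);
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            if (isValid(value)) {
                context.write(new Text("event"), value);       // normal path
            } else {
                mos.write("bad", NullWritable.get(), value);   // divert the bad row
            }
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            mos.close();
        }

        private boolean isValid(Text row) {
            // placeholder validation rule
            return row.toString().split("\t").length == EXPECTED_FIELDS;
        }
    }

The driver would also have to register the side output, e.g. MultipleOutputs.addNamedOutput(job, "bad", TextOutputFormat.class, NullWritable.class, Text.class).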

How to process only input files containing 100% valid rows

2013-04-18 Thread Matthias Scherer
Hi all, In my MapReduce job, I would like to process only whole input files that contain nothing but valid rows. If one map task processing an input split of a file detects an invalid row, the whole file should be marked as invalid and not processed at all. This input file will then be cleansed by
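One possible building block for this, sketched under the assumption that the job uses a FileInputFormat-based input: a map task can find out which file its split belongs to via the FileSplit, so a cheap pre-validation pass could emit the names of files containing bad rows and the main job could then exclude them. The ValidationMapper name and the looksValid() rule are made up for illustration:

    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;

    public class ValidationMapper extends Mapper<LongWritable, Text, Text, NullWritable> {

        private Text fileName;

        @Override
        protected void setup(Context context) {
            // works when the input format is FileInputFormat-based
            fileName = new Text(((FileSplit) context.getInputSplit()).getPath().toString());
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            if (!looksValid(value)) {
                // flag the whole file; a reducer can dedupe the file names
                context.write(fileName, NullWritable.get());
            }
        }

        private boolean looksValid(Text row) {
            return !row.toString().isEmpty();   // placeholder rule
        }
    }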

KeyValueTextInputFormat and Hadoop 0.20.1

2009-11-26 Thread Matthias Scherer
Hi, I started my first experimental Hadoop project with Hadoop 0.20.1 and ran into the following problem: Job job = new Job(new Configuration(), "Myjob"); job.setInputFormatClass(KeyValueTextInputFormat.class); The last line throws the following error: The method setInputFormatClass(Class<? extends
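If I recall correctly, Hadoop 0.20.1 ships KeyValueTextInputFormat only in the old org.apache.hadoop.mapred API, while Job.setInputFormatClass expects an InputFormat from the new org.apache.hadoop.mapreduce API, hence the compile error. A sketch of the usual workaround at the time, staying on the old JobConf-based driver (class name and path handling are placeholders):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.KeyValueTextInputFormat;

    public class MyJobDriver {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(MyJobDriver.class);
            conf.setJobName("Myjob");

            // old-API equivalent of job.setInputFormatClass(KeyValueTextInputFormat.class)
            conf.setInputFormat(KeyValueTextInputFormat.class);
            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(Text.class);

            FileInputFormat.addInputPath(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));

            JobClient.runJob(conf);
        }
    }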

Re: KeyValueTextInputFormat and Hadoop 0.20.1

2009-11-26 Thread Matthias Scherer
Zhang On Thu, Nov 26, 2009 at 7:10 AM, Matthias Scherer matthias.sche...@1und1.de wrote: Hi, I started my first experimental Hadoop project with Hadoop 0.20.1 and ran into the following problem: Job job = new Job(new Configuration(), "Myjob"); job.setInputFormatClass

Re: Why does Hadoop need ssh access to master and slaves?

2009-01-21 Thread Matthias Scherer
, 21 January 2009 13:59 To: core-user@hadoop.apache.org Subject: Re: Why does Hadoop need ssh access to master and slaves? Amit k. Saha wrote: On Wed, Jan 21, 2009 at 5:53 PM, Matthias Scherer matthias.sche...@1und1.de wrote: Hi all, we've made our first steps in evaluating Hadoop