I have to add that we have 1-2 billion events per day, split across several
thousand files. So pre-reading each file in the InputFormat should be
avoided.
And yes, we could use MultipleOutputs to write out the bad files while
processing each input file. But we (our Operations team) think that there is more
Hi all,
In my mapreduce job, I would like to process only whole input files containing
only valid rows. If one map task processing an input split of a file detects an
invalid row, the whole file should be marked as invalid and not processed at
all. This input file will then be cleansed by
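A minimal sketch of the whole-file idea, assuming a hypothetical `isValidRow` check (the two-field, tab-separated format is an illustration, not the poster's actual row format). In a real job this logic would sit behind a RecordReader created by an InputFormat whose `isSplitable()` returns false, so that a single map task sees the complete file and can reject it as a unit:

```java
import java.util.Arrays;
import java.util.List;

// Sketch only: validate a whole file before emitting any record.
// In Hadoop, an InputFormat overriding isSplitable() to return false
// guarantees one map task per file, which makes this check possible.
public class WholeFileValidator {

    // Hypothetical row format: exactly two non-empty tab-separated fields.
    static boolean isValidRow(String row) {
        String[] parts = row.split("\t", -1);
        return parts.length == 2 && !parts[0].isEmpty() && !parts[1].isEmpty();
    }

    // Returns true only if every row of the file is valid; the caller
    // would skip (and later cleanse) the file when this returns false.
    static boolean fileIsValid(List<String> lines) {
        for (String line : lines) {
            if (!isValidRow(line)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> good = Arrays.asList("k1\tv1", "k2\tv2");
        List<String> bad = Arrays.asList("k1\tv1", "broken-row");
        System.out.println(fileIsValid(good)); // true
        System.out.println(fileIsValid(bad));  // false
    }
}
```

Note this pre-reads nothing: the validation happens inside the single map task that already owns the file, at the cost of losing split-level parallelism for that file.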
Hi,
I started my first experimental Hadoop project with Hadoop 0.20.1 and ran
into the following problem:
Job job = new Job(new Configuration(), "Myjob");
job.setInputFormatClass(KeyValueTextInputFormat.class);
The last line throws the following error: The method
setInputFormatClass(Class&lt;? extends
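For what it's worth, the usual cause of this compile error on 0.20.1 is that KeyValueTextInputFormat still lived in the old org.apache.hadoop.mapred package, while Job.setInputFormatClass expects a subclass of the new-API org.apache.hadoop.mapreduce.InputFormat. A common workaround is to read with TextInputFormat and split the key from the value in the mapper yourself. A hedged sketch of just that split, done as a plain helper (tab is KeyValueTextInputFormat's default separator; the class and method names here are illustrative):

```java
// Sketch of the split KeyValueTextInputFormat performs, done manually so
// a new-API mapper over TextInputFormat can produce the same key/value
// pairs. Everything before the first separator is the key; the rest is
// the value. A line with no separator becomes a key with an empty value.
public class KeyValueSplit {

    static String[] splitKeyValue(String line, char separator) {
        int idx = line.indexOf(separator);
        if (idx < 0) {
            // No separator: the whole line is the key, the value is empty.
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, idx), line.substring(idx + 1) };
    }

    public static void main(String[] args) {
        String[] kv = splitKeyValue("key1\tsome value", '\t');
        System.out.println(kv[0]); // key1
        System.out.println(kv[1]); // some value
    }
}
```

In the mapper you would call this on `value.toString()` for each input line and emit the two parts as your own key/value pair.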
Zhang
On Thu, Nov 26, 2009 at 7:10 AM, Matthias Scherer
matthias.sche...@1und1.de
wrote:
Hi,
I started my first experimental Hadoop project with Hadoop
0.20.1 and
ran into the following problem:
Job job = new Job(new Configuration(), "Myjob");
job.setInputFormatClass
, 21 January 2009 13:59
To: core-user@hadoop.apache.org
Subject: Re: Why does Hadoop need ssh access to master and slaves?
Amit k. Saha wrote:
On Wed, Jan 21, 2009 at 5:53 PM, Matthias Scherer
matthias.sche...@1und1.de wrote:
Hi all,
we've made our first steps in evaluating hadoop