Hi,
I am using MultipleOutputs in our job. Whenever a reduce task fails,
all of its subsequent task attempts fail with a file-already-exists exception.
Shouldn't the output file name also include the task attempt id? It is
only appending the task id. Is this a bug, or is something wrong on my side?
Are you using the MultipleOutputs class shipped with Apache Hadoop or
one of your own?
If it's the latter, please take a look at the gotchas to take care of,
described at
http://wiki.apache.org/hadoop/FAQ#Can_I_write_create.2Fwrite-to_hdfs_files_directly_from_map.2Freduce_tasks.3F
On Mon, Dec 30,
Thanks Harsh.
@Are you using the MultipleOutputs class shipped with Apache Hadoop or
one of your own?
I am using Apache Hadoop's MultipleOutputs.
But as you can see in the stack trace, the attempt id is not appended to the
file name; it contains only the task id.
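To illustrate why a task-id-only file name collides on retry, here is a conceptual sketch in Python (not Hadoop code; the path patterns are assumptions loosely modeled on Hadoop's default FileOutputCommitter layout):

```python
# Conceptual sketch: why per-attempt paths avoid collisions when a
# failed task is retried, while task-id-only names do not.

def task_scoped_path(output_dir: str, task_id: int) -> str:
    # Name derived only from the task id: every attempt of the same
    # task maps to the same path, so a retry collides with the partial
    # file left behind by the failed attempt.
    return f"{output_dir}/part-r-{task_id:05d}"

def attempt_scoped_path(output_dir: str, task_id: int, attempt: int) -> str:
    # Writing under a per-attempt temporary directory (the strategy
    # Hadoop's FileOutputCommitter uses) keeps attempts isolated; the
    # successful attempt's output is promoted into place on commit.
    return (f"{output_dir}/_temporary/"
            f"attempt_{task_id:05d}_{attempt}/part-r-{task_id:05d}")

# Two attempts of task 3 with task-scoped names point at the same file:
print(task_scoped_path("/out", 3) == task_scoped_path("/out", 3))   # True

# With attempt-scoped paths, each attempt writes to its own directory:
print(attempt_scoped_path("/out", 3, 0) == attempt_scoped_path("/out", 3, 1))  # False
```

This is only a model of the naming scheme, but it shows the failure mode: if MultipleOutputs writes a file whose name carries only the task id directly into the final output directory, the second attempt finds the first attempt's leftover file already there.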
Thanks & Regards,
B Anil Kumar.
On
I think if a task fails, the output related to that task is cleaned up
before the second attempt. My guess is that you are seeing this exception
because two reducers tried to write to the same file. One thing you need
to be aware of is that all data that is supposed to end up in the same file
must be sent to the same reducer.
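The point about routing all data for one file through one reducer can be sketched conceptually (Python, not Hadoop code; this mimics the behavior of Hadoop's default HashPartitioner, whose actual implementation is in Java):

```python
# Conceptual sketch of hash partitioning: a key is routed to a reducer
# by hash(key) mod numReducers, so every record with the same key (and
# hence destined for the same output file) lands on a single reducer.

def partition(key: str, num_reducers: int) -> int:
    # Mask to a non-negative value before taking the modulus, as
    # Hadoop's HashPartitioner does with (hash & Integer.MAX_VALUE).
    return (hash(key) & 0x7FFFFFFF) % num_reducers

records = [("user-a", 1), ("user-b", 2), ("user-a", 3)]

# All "user-a" records map to one reducer, "user-b" to one reducer
# (possibly the same one); no key is split across two reducers.
for key, value in records:
    print(key, "->", partition(key, 4))
```

If two reducers were ever handed records for the same output file (for example, by deriving the MultipleOutputs file name from the value rather than the partitioning key), they would race to create that file and one of them would fail.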