Job fails while re-attempting the task in multiple outputs case

2013-12-30 Thread AnilKumar B
Hi, I am using MultipleOutputs in our job, and whenever any reduce task fails, all of its subsequent task attempts fail with a file-already-exists exception. The output file name should also append the task attempt, right? But it is only appending the task ID. Is this a bug, or something wrong from my
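For context, a minimal sketch of the reducer pattern being described, assuming the new-API org.apache.hadoop.mapreduce.lib.output.MultipleOutputs; the class and named-output identifiers here are illustrative, not taken from the original job:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class ExampleReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
    private MultipleOutputs<Text, LongWritable> mos;

    @Override
    protected void setup(Context context) {
        mos = new MultipleOutputs<>(context);
    }

    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        long sum = 0;
        for (LongWritable v : values) {
            sum += v.get();
        }
        // "stats" must be registered in the driver via
        // MultipleOutputs.addNamedOutput(job, "stats", ...).
        mos.write("stats", key, new LongWritable(sum));
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        mos.close(); // flush and close all named outputs
    }
}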

Re: Job fails while re-attempting the task in multiple outputs case

2013-12-30 Thread Harsh J
Are you using the MultipleOutputs class shipped with Apache Hadoop, or one of your own? If it's the latter, please take a look at the gotchas to watch out for, described at http://wiki.apache.org/hadoop/FAQ#Can_I_write_create.2Fwrite-to_hdfs_files_directly_from_map.2Freduce_tasks.3F On Mon, Dec 30,
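The FAQ entry linked above boils down to: files created directly from a task should go under the per-attempt work directory, which the framework promotes to the real output directory only if that attempt succeeds, so a retried attempt never trips over files left by a failed one. A hedged sketch of that advice (the side-file name is hypothetical):

import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.TaskInputOutputContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public final class SideFiles {
    private SideFiles() {}

    static void writeSideFile(TaskInputOutputContext<?, ?, ?, ?> context)
            throws IOException, InterruptedException {
        // Per-attempt scratch directory; each re-attempt gets a fresh one,
        // so attempts cannot collide on the same path.
        Path workDir = FileOutputFormat.getWorkOutputPath(context);
        Path sideFile = new Path(workDir, "side-data.txt"); // hypothetical name
        FileSystem fs = sideFile.getFileSystem(context.getConfiguration());
        try (FSDataOutputStream out = fs.create(sideFile, false)) {
            out.writeBytes("example payload\n");
        }
    }
}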

Re: Job fails while re-attempting the task in multiple outputs case

2013-12-30 Thread AnilKumar B
Thanks Harsh. @Are you using the MultipleOutputs class shipped with Apache Hadoop or one of your own? I am using Apache Hadoop's MultipleOutputs. But as you can see in the stack trace, it is not appending the attempt ID to the file name; the name consists only of the task ID. Thanks Regards, B Anil Kumar. On
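For reference, a sketch of the driver-side registration the reducer sketch above assumes. With the stock FileOutputFormat committer, each attempt writes under its own _temporary/attempt_* directory before commit, so a leaf file name carrying only the task ID (e.g. stats-r-00000) should not by itself collide across attempts; a collision usually suggests something is writing straight to the final output path. Names below are illustrative:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class Driver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "multiple-outputs-example");
        job.setJarByClass(Driver.class);
        // Mapper configuration elided for brevity.
        job.setReducerClass(ExampleReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Register the named output used by mos.write("stats", ...) in the reducer.
        MultipleOutputs.addNamedOutput(job, "stats", TextOutputFormat.class,
                Text.class, LongWritable.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}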

Re: Job fails while re-attempting the task in multiple outputs case

2013-12-30 Thread Jiayu Ji
I think if the task fails, the output related to that task will be cleaned up before the second attempt. I am guessing you are getting this exception because two reducers tried to write to the same file. One thing you need to be aware of is that all data that is supposed to be in the same file
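A hedged sketch of that last point: if records destined for one output file can land on different reducers, route them with a partitioner keyed on the output name so a single reducer owns each file. The "outputName:recordKey" key scheme here is an assumption for illustration, not from the original thread:

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class OutputNamePartitioner extends Partitioner<Text, LongWritable> {
    @Override
    public int getPartition(Text key, LongWritable value, int numPartitions) {
        // Partition on the output-name prefix only, so every record bound
        // for one output file is handled by the same reducer.
        String outputName = key.toString().split(":", 2)[0];
        return (outputName.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}

It would be wired in with job.setPartitionerClass(OutputNamePartitioner.class).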