With speculative execution enabled, Hadoop can run a task attempt on more than one node. If a mapper uses MultipleOutputs, the second attempt (or sometimes all attempts) fails to create its output file because the file is already being created by another attempt:

attempt_1347286420691_0011_m_000000_0
attempt_1347286420691_0011_m_000000_1
..
fail with
Error: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file /cznewgen/segments/20120907190053/parse_db/-m-00000

In my code I am calling mos.write with 4 arguments. This problem is discussed in the javadoc for FileOutputFormat.getWorkOutputPath(); would it be possible to change MultipleOutputs to take advantage of this function?

Or would it be better to change FileOutputFormat.getUniqueFile() to append the last digit of the attempt ID to the filename, creating unique names such as /cznewgen/segments/20120907190053/parse_db/-m-00000_0?
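To illustrate the second idea, here is a minimal sketch of what such a naming scheme could look like. The method name `uniqueName` is hypothetical and this is not the actual FileOutputFormat.getUniqueFile() implementation; it just shows how appending the attempt number taken from the task attempt ID would keep speculative attempts from racing on the same HDFS path:

```java
public class UniqueFileSketch {
    // Hypothetical helper: append the attempt number (the last
    // underscore-separated field of the attempt ID) to the base name,
    // e.g. ("-m-00000", "attempt_1347286420691_0011_m_000000_1")
    // yields "-m-00000_1".
    static String uniqueName(String baseName, String attemptId) {
        String attemptNo = attemptId.substring(attemptId.lastIndexOf('_') + 1);
        return baseName + "_" + attemptNo;
    }

    public static void main(String[] args) {
        // Two speculative attempts of the same task now get distinct files.
        System.out.println(uniqueName("-m-00000",
                "attempt_1347286420691_0011_m_000000_0")); // -m-00000_0
        System.out.println(uniqueName("-m-00000",
                "attempt_1347286420691_0011_m_000000_1")); // -m-00000_1
    }
}
```

The downside of this approach is that whichever attempt loses the speculative race leaves its file behind, so the losing attempt's output would still need to be cleaned up on abort.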
