With speculative execution enabled, Hadoop can run a task attempt on more
than one node. If a mapper uses MultipleOutputs, the second attempt (or
sometimes even all attempts) fails to create the output file because it is
already being created by another attempt:
attempt_1347286420691_0011_m_000000_0
attempt_1347286420691_0011_m_000000_1
..
fails with
Error: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException:
failed to create file /cznewgen/segments/20120907190053/parse_db/-m-00000
In my code I am calling mos.write with 4 arguments. This problem is
discussed in the javadoc for FileOutputFormat.getWorkOutputPath.
Is it possible to change MultipleOutputs to take advantage of this function?
Or would it be better to change FileOutputFormat.getUniqueFile() to append
the last digit of the attempt ID to the filename, creating unique names such
as /cznewgen/segments/20120907190053/parse_db/-m-00000_0 ?