Hi, I want to merge the multiple files in one HDFS directory into a single file. I am planning to write a map-only job using an input format that creates only one InputSplit per directory. This way my job doesn't need any shuffle/sort (it only reads the data and writes it back to disk). Is there any such input format already implemented? Or is there a better solution for this problem?
Thanks.
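A minimal sketch of the grouping idea follows. It assumes a local filesystem as a stand-in for HDFS so that it runs with only the JDK. Every file is assigned to its parent directory, each directory becomes one "split", and merging a split is a plain sequential copy with no shuffle or sort. The class `DirMergeSketch` and its methods are hypothetical illustrations, not part of Hadoop. In a real job this grouping would live in the `getSplits()` method of a custom `InputFormat`, with `job.setNumReduceTasks(0)` to make the job map-only. One possible starting point is `CombineFileInputFormat`, whose `createPool()` keeps files from different pools out of the same split. It may still produce several splits per directory, though, depending on its split-size settings and block locations.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical local stand-in for the proposed map-only job:
// java.nio replaces HDFS, and each map entry plays the role of one input split.
final class DirMergeSketch {

    // Group files by their parent directory, giving one "split" per directory.
    // Paths are made absolute so every file has a non-null parent.
    static Map<Path, List<Path>> groupByDirectory(List<Path> files) {
        Map<Path, List<Path>> splits = new TreeMap<>();
        for (Path f : files) {
            Path abs = f.toAbsolutePath();
            splits.computeIfAbsent(abs.getParent(), k -> new ArrayList<>()).add(abs);
        }
        // Sort files inside each split so the merged output order is deterministic.
        for (List<Path> group : splits.values()) {
            Collections.sort(group);
        }
        return splits;
    }

    // The "map task" for one split: read each file in order and append its bytes
    // to a single output file. Nothing is shuffled or sorted; it is read then write.
    static void mergeSplit(List<Path> split, Path output) throws IOException {
        try (OutputStream out = Files.newOutputStream(output)) {
            for (Path f : split) {
                Files.copy(f, out);
            }
        }
    }
}
```

The per-split sort only exists to make the merged order predictable. If the order of records in the output file does not matter, it can be dropped.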