Hi,
If I understand your request correctly, you only need to merge some data produced by an HDFS write operation. In that case, I think your best option is to use Hadoop Streaming with the 'cat' command.

Take a look here:
https://hadoop.apache.org/docs/r1.2.1/streaming.html
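
As a rough sketch of what I mean (not tested on a cluster), the small Python helper below only assembles the command line for such a streaming job. The function name, the jar location and the HDFS paths are hypothetical placeholders; the -input, -output, -mapper, -reducer options and the mapred.reduce.tasks property are the ones described on the streaming page above. I am assuming a single reducer here, because a map-only streaming job writes one output file per map task.

```python
import shlex


def streaming_merge_command(input_dir, output_dir, streaming_jar):
    """Build the argv for a Hadoop Streaming job that uses cat as mapper and reducer.

    All three arguments are placeholders supplied by the caller; nothing is
    executed here, the function only returns the argument list.
    """
    return [
        "hadoop", "jar", streaming_jar,
        # Generic options such as -D must come before the streaming options.
        # Assumption: one reducer, so the job writes a single part file.
        "-D", "mapred.reduce.tasks=1",
        "-input", input_dir,
        "-output", output_dir,
        "-mapper", "/bin/cat",
        "-reducer", "/bin/cat",
    ]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    argv = streaming_merge_command(
        "/data/in", "/data/merged", "/opt/hadoop/hadoop-streaming.jar"
    )
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(argv))
```

Note that a single reducer means the data still goes through shuffle and sort, so the line order of the merged file will differ from the inputs, and streaming treats each line as a tab-separated key/value pair, so the output may not be byte-identical. If that matters, this approach may not fit.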

Regards

On 03/11/2016 13:53, Piyush Mukati wrote:
Hi,
I want to merge multiple files in one HDFS directory into a single file. I am planning to write a map-only job using an input format that creates only one InputSplit per directory. This way my job doesn't need to do any shuffle/sort (it only reads the data and writes it back to disk).
Is there any such input format already implemented?
Or is there a better solution for this problem?

thanks.


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
For additional commands, e-mail: user-h...@hadoop.apache.org
