Hi,
if I understand your request correctly, you only need to merge some data
produced by an HDFS write operation.
In that case, I think your best option is to use Hadoop Streaming
with the 'cat' command.
Take a look here:
https://hadoop.apache.org/docs/r1.2.1/streaming.html
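
To make the suggestion a bit more concrete, here is a rough sketch of the kind of invocation I have in mind, written as a small Python helper that only assembles the command line. The jar location and the two HDFS paths are placeholders I made up, so adjust them to your installation. The options themselves (-D mapred.reduce.tasks, -input, -output, -mapper, -reducer) are the ones described on the streaming page above. Note that I am assuming a single reducer here: a map-only run (mapred.reduce.tasks=0) writes one output file per map task, so it would not merge anything by itself.

```python
import shlex
import subprocess
import sys


def build_streaming_merge_cmd(streaming_jar, input_dir, output_dir):
    """Assemble a Hadoop Streaming command that uses 'cat' as mapper and reducer."""
    return [
        "hadoop", "jar", streaming_jar,
        # Generic options (-D) have to come before the streaming options.
        "-D", "mapred.reduce.tasks=1",  # one reducer, hence one output part file
        "-input", input_dir,
        "-output", output_dir,
        "-mapper", "cat",
        "-reducer", "cat",
    ]


if __name__ == "__main__":
    cmd = build_streaming_merge_cmd(
        "/path/to/hadoop-streaming.jar",  # hypothetical: depends on your install
        "/data/parts",                    # hypothetical input directory
        "/data/merged",                   # hypothetical output directory (must not exist yet)
    )
    print(shlex.join(cmd))
    # Only submit the job when explicitly asked to.
    if "--run" in sys.argv:
        subprocess.run(cmd, check=True)
```

Keep in mind that a single reducer means the data does go through shuffle/sort, so the lines of the merged file come out sorted rather than in their original order, and this only makes sense for line-oriented text data.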
Regards
On 03/11/2016 13:53, Piyush Mukati wrote:
Hi,
I want to merge multiple files in one HDFS directory into a single file. I am
planning to write a map-only job using an input format that creates
only one InputSplit per directory.
This way my job doesn't need to do any shuffle/sort (it only reads and writes
back to disk).
Is any such input format already implemented?
Or is there a better solution to the problem?
Thanks.