Can't we use getmerge here? If your requirement is to merge some files in a
particular directory into a single file:

hadoop fs -getmerge <dir_of_input_files> <mergedsinglefile>
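For example (the paths below are just hypothetical). Note that getmerge
writes the merged file to the local filesystem, so if you need the result
back in HDFS you have to -put it there afterwards:

# merge everything under an HDFS dir into one local file
hadoop fs -getmerge /user/piyush/input /tmp/merged.txt
# copy the merged file back into HDFS, if that is where you need it
hadoop fs -put /tmp/merged.txt /user/piyush/merged.txt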

--Senthil
-----Original Message-----
From: Giovanni Mascari [mailto:giovanni.masc...@polito.it] 
Sent: Thursday, November 03, 2016 7:24 PM
To: Piyush Mukati <piyush.muk...@gmail.com>; user@hadoop.apache.org
Subject: Re: merging small files in HDFS

Hi,
if I understand your request correctly, you only need to merge some data
resulting from an HDFS write operation.
In this case, I suppose your best option is to use Hadoop Streaming with the
'cat' command (see the sketch below).

take a look here:
https://hadoop.apache.org/docs/r1.2.1/streaming.html
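
A minimal sketch of that approach: identity 'cat' mappers feeding a single
'cat' reducer, so all the input files come out as one part file. The
streaming jar location and the paths below are assumptions (they vary by
Hadoop version and install):

# run one reducer so all input ends up in a single output file
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
  -D mapreduce.job.reduces=1 \
  -input /user/piyush/input \
  -output /user/piyush/merged \
  -mapper cat \
  -reducer cat

One caveat: funnelling everything through a reducer sorts the lines during
the shuffle, so the original line order is not preserved.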

Regards

On 03/11/2016 13:53, Piyush Mukati wrote:
> Hi,
> I want to merge multiple files in one HDFS dir into one file. I am
> planning to write a map-only job using an input format which will create
> only one InputSplit per dir.
> This way my job doesn't need to do any shuffle/sort (only read and write
> back to disk). Is there any such input format already implemented?
> Or is there a better solution to the problem?
>
> thanks.
>

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
For additional commands, e-mail: user-h...@hadoop.apache.org
