Hi,
Thanks for the suggestion.
"hadoop fs -getmerge" is a good and simple solution for a one-time activity
on a few directories.
But it may have problems at scale, since it copies the data from HDFS to the
local filesystem and then puts it back into HDFS. We also have to take care
of compressing and decompressing separately, and we need to run this merge
every hour for thousands of directories.
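Roughly, for each directory the getmerge route would look something like this
(the paths and the gzip step below are only placeholders to show the extra
round trip through the local filesystem):

  # pull all small files of one directory down to the local disk as one file
  hadoop fs -getmerge /data/events/2016-11-03-00 /tmp/merged
  # compression has to be handled separately, e.g. re-compress locally
  gzip /tmp/merged
  # push the merged file back into HDFS and remove the original small files
  hadoop fs -put /tmp/merged.gz /data/events-merged/2016-11-03-00.gz
  hadoop fs -rm -r /data/events/2016-11-03-00

Doing that local round trip every hour for thousands of directories is what
we are trying to avoid.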



On Thu, Nov 3, 2016 at 7:28 PM, kumar, Senthil(AWF) <senthiku...@ebay.com>
wrote:

> Can't we use getmerge here? If your requirement is to merge some files in
> a particular directory into a single file:
>
> hadoop fs -getmerge <dir_of_input_files> <mergedsinglefile>
>
> --Senthil
> -----Original Message-----
> From: Giovanni Mascari [mailto:giovanni.masc...@polito.it]
> Sent: Thursday, November 03, 2016 7:24 PM
> To: Piyush Mukati <piyush.muk...@gmail.com>; user@hadoop.apache.org
> Subject: Re: merging small files in HDFS
>
> Hi,
> if I understand your request correctly, you only need to merge some data
> resulting from an HDFS write operation.
> In this case, I suppose your best option is to use Hadoop Streaming with
> the 'cat' command.
>
> Take a look here:
> https://hadoop.apache.org/docs/r1.2.1/streaming.html
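>
> For example, something along these lines should produce a single merged
> output file (the streaming jar path depends on your Hadoop version, and
> /data/input, /data/merged are placeholder paths):
>
>   hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
>     -D mapred.reduce.tasks=1 \
>     -input /data/input \
>     -output /data/merged \
>     -mapper cat \
>     -reducer cat
>
> Note that this goes through a shuffle with one reducer, so the records end
> up sorted in the merged output.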
>
> Regards
>
> On 03/11/2016 13:53, Piyush Mukati wrote:
> > Hi,
> > I want to merge multiple files in one HDFS directory into one file. I am
> > planning to write a map-only job using an input format that will create
> > only one InputSplit per directory.
> > This way my job doesn't need to do any shuffle/sort (only read and write
> > back to disk). Is there any such input format already implemented?
> > Or is there a better solution for the problem?
> >
> > thanks.
> >
>
