Re: merging small files in HDFS

2016-11-03 Thread Piyush Mukati
Hi, thanks for the suggestion. "hadoop fs -getmerge" is a good, simple solution for a one-time activity on a few directories. But it may have problems at scale, as this solution copies the data from HDFS to local disk and then puts it back into HDFS. Also, here we have to take care of compressing and

Re: merging small files in HDFS

2016-11-03 Thread dileep kumar
Hi, you need to write a map method that just parses the input file and passes it to the reducer, and use only one reducer, so that all map output goes to that one reducer and a single file is created, which is the merge of the input files. On 03-Nov-2016 8:54 pm, "Piyush Mukati" wrote:
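Dileep's suggestion relies on the reducer count. Hadoop's default HashPartitioner assigns each map-output key to a reducer by hashing it, so with one reducer every record lands in partition 0 and the job writes a single output file. The sketch below re-implements only that partition arithmetic in plain Java to make the point concrete; it does not call Hadoop, and the class and method names are hypothetical. In a real driver the count would be set with job.setNumReduceTasks(1).

```java
// Plain-Java illustration of the partition formula used by Hadoop's default
// HashPartitioner. PartitionDemo and partitionFor are hypothetical names.
class PartitionDemo {

    // Mask off the sign bit so the result is non-negative, then take the
    // remainder modulo the number of reduce tasks.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        String[] keys = {"file-a.txt", "file-b.txt", "line 42", "polygenelubricants"};
        for (String k : keys) {
            // With one reducer every key maps to partition 0 (one output file);
            // with three reducers the same keys spread over partitions 0..2.
            System.out.println(k + " -> 1 reducer: " + partitionFor(k, 1)
                    + ", 3 reducers: " + partitionFor(k, 3));
        }
    }
}
```

The trade-off is that a single reducer funnels all of the data through one task, which limits how well this approach scales.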

How to add custom field to hadoop MR task log?

2016-11-03 Thread Maria
Hi, dear developers, I'm trying to reconfigure $HADOOP/etc/hadoop/log4j.properties. I want to add an ID to the MapReduce log before the log message, like this: "ID:234521 start map logic". My steps are as follows: (1) In my Mapper class: static Logger logger = LoggerFactory.getLogger(Mapper.class); public
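With log4j 1.x, which Hadoop 2 uses, the usual way to get a per-task field into every log line is the MDC: put a value with org.slf4j.MDC.put("id", ...) in the mapper and reference it as %X{id} in the appender's ConversionPattern. To keep the example self-contained, the sketch below swaps in java.util.logging instead of log4j and shows the same idea with a custom Formatter that prepends "ID:" plus a task ID to each message. The class name and the way the ID is supplied are assumptions.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Formatter;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// java.util.logging stand-in (not log4j): prepends a fixed per-task field
// to every message. TaskIdFormatter is a hypothetical name.
class TaskIdFormatter extends Formatter {
    private final String taskId; // hypothetical: e.g. taken from the task's configuration

    TaskIdFormatter(String taskId) {
        this.taskId = taskId;
    }

    @Override
    public String format(LogRecord record) {
        // formatMessage() substitutes {0}-style parameters, if any.
        return "ID:" + taskId + " " + formatMessage(record) + "\n";
    }

    public static void main(String[] args) {
        Logger logger = Logger.getLogger("demo.mapper");
        logger.setUseParentHandlers(false);   // avoid the default console format
        ConsoleHandler handler = new ConsoleHandler();
        handler.setFormatter(new TaskIdFormatter("234521"));
        logger.addHandler(handler);
        logger.info("start map logic");       // printed as: ID:234521 start map logic
    }
}
```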

Re: why the default value of 'yarn.resourcemanager.container.liveness-monitor.interval-ms' in yarn-default.xml is so high?

2016-11-03 Thread Ravi Prakash
Hi Tanvir! Although an application may request that node, a container won't be scheduled until the NodeManager sends a heartbeat. If the application hasn't specified a preference for that node, then whichever node heartbeats next will be used to launch a container. HTH, Ravi On Thu, Nov 3,
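Ravi's point is that placement happens only in response to a NodeManager heartbeat. The toy model below is plain Java, not YARN code, and all names in it are hypothetical: it keeps pending requests in a list and places one only when a node heartbeats and the request either names that node or has no preference. Real schedulers such as the Capacity and Fair schedulers apply more elaborate locality rules than this.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy model (hypothetical, not YARN code): requests wait until a node
// heartbeats; only then can one be placed on that node.
class HeartbeatScheduler {

    // A pending container request; preferredNode == null means "any node".
    static final class Request {
        final String id;
        final String preferredNode;

        Request(String id, String preferredNode) {
            this.id = id;
            this.preferredNode = preferredNode;
        }
    }

    private final List<Request> pending = new ArrayList<>();

    void submit(Request r) {
        pending.add(r);
    }

    // Called when `node` heartbeats. Returns the id of the request placed on
    // that node, or null if none fits. Nothing is placed between heartbeats.
    String onHeartbeat(String node) {
        Iterator<Request> it = pending.iterator();
        while (it.hasNext()) {
            Request r = it.next();
            if (r.preferredNode == null || r.preferredNode.equals(node)) {
                it.remove();
                return r.id;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        HeartbeatScheduler s = new HeartbeatScheduler();
        s.submit(new Request("c1", "nm2")); // wants nm2 specifically
        s.submit(new Request("c2", null));  // any node will do
        System.out.println("nm1 heartbeat -> " + s.onHeartbeat("nm1")); // c2
        System.out.println("nm1 heartbeat -> " + s.onHeartbeat("nm1")); // null
        System.out.println("nm2 heartbeat -> " + s.onHeartbeat("nm2")); // c1
    }
}
```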

Re: merging small files in HDFS

2016-11-03 Thread Madhav Sharan
Would a key/value-based SequenceFile format work for you? You can use the name of each small file as the KEY and its content as the VALUE. Sequence files can be passed as input to other jobs too. [0] is a code reference that converts many small files into one big sequence file in MapReduce fashion. [1] is a
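To show the shape of Madhav's KEY = file name, VALUE = content idea without a Hadoop dependency, the sketch below packs small files into one length-prefixed container and reads them back. This is a plain-Java stand-in, not the SequenceFile on-disk format; in Hadoop the real tool would be a SequenceFile writer, for example with Text keys and BytesWritable values. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java stand-in for the SequenceFile idea: one container holding
// (file name -> file bytes) records. NOT the SequenceFile on-disk format.
class SmallFilePacker {

    // Writes each entry as: UTF-encoded name, int length, raw bytes.
    static byte[] pack(Map<String, byte[]> files) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            for (Map.Entry<String, byte[]> e : files.entrySet()) {
                out.writeUTF(e.getKey());          // KEY: original small-file name
                out.writeInt(e.getValue().length); // length prefix for the value
                out.write(e.getValue());           // VALUE: file content
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return buf.toByteArray();
    }

    // Reads the records back in the order they were written.
    static Map<String, byte[]> unpack(byte[] packed) {
        Map<String, byte[]> files = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(packed))) {
            while (in.available() > 0) {
                String name = in.readUTF();
                byte[] content = new byte[in.readInt()];
                in.readFully(content);
                files.put(name, content);
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return files;
    }

    public static void main(String[] args) {
        Map<String, byte[]> files = new LinkedHashMap<>();
        files.put("part-1.txt", "hello".getBytes(StandardCharsets.UTF_8));
        files.put("part-2.txt", "world".getBytes(StandardCharsets.UTF_8));
        Map<String, byte[]> back = unpack(pack(files));
        back.forEach((k, v) -> System.out.println(k + " = " + new String(v, StandardCharsets.UTF_8)));
    }
}
```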

Re: why the default value of 'yarn.resourcemanager.container.liveness-monitor.interval-ms' in yarn-default.xml is so high?

2016-11-03 Thread Tanvir Rahman
Thank you, Ravi, for your reply. I found one parameter, 'yarn.resourcemanager.nm.liveness-monitor.interval-ms' (default value = 1000 ms), in yarn-default.xml (v2.4.1), which determines how often to check that node managers are still alive. So the RM checks the NM heartbeat every second, but it takes 10 min
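As far as we can tell, the gap Tanvir describes is between how often the check runs and how long a node may stay silent before it is declared lost; yarn-default.xml sets the latter through yarn.nm.liveness-monitor.expiry-interval-ms (600000 ms, i.e. 10 min, by default). The toy monitor below (plain Java, hypothetical names, not YARN code) separates the two: check() may run every second, but a node silent since t = 0 is only declared lost at the first check after 600000 ms have passed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy node-liveness monitor (hypothetical names, not YARN code). Time is
// passed in explicitly so the behaviour is deterministic and easy to test.
class LivenessMonitor {
    private final long expireIntervalMs;                 // how long a node may stay silent
    private final Map<String, Long> lastHeartbeat = new HashMap<>();

    LivenessMonitor(long expireIntervalMs) {
        this.expireIntervalMs = expireIntervalMs;
    }

    // Record a heartbeat from a node at time nowMs.
    void heartbeat(String node, long nowMs) {
        lastHeartbeat.put(node, nowMs);
    }

    // One run of the periodic check (e.g. every 1000 ms). A node expires only
    // once its last heartbeat is older than expireIntervalMs, however often
    // this method runs.
    List<String> check(long nowMs) {
        List<String> expired = new ArrayList<>();
        lastHeartbeat.entrySet().removeIf(e -> {
            boolean dead = nowMs - e.getValue() > expireIntervalMs;
            if (dead) {
                expired.add(e.getKey());
            }
            return dead;
        });
        return expired;
    }

    public static void main(String[] args) {
        long checkIntervalMs = 1_000L;                           // how often check() runs
        LivenessMonitor monitor = new LivenessMonitor(600_000L); // 10-minute expiry
        monitor.heartbeat("nm1", 0L);                            // last heartbeat at t = 0
        for (long t = checkIntervalMs; t <= 700_000L; t += checkIntervalMs) {
            if (!monitor.check(t).isEmpty()) {
                System.out.println("nm1 declared lost at t = " + t + " ms");
                break;
            }
        }
    }
}
```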

RE: merging small files in HDFS

2016-11-03 Thread kumar, Senthil(AWF)
Can't we use getmerge here? If your requirement is to merge some files in a particular directory into a single file: hadoop fs -getmerge --Senthil -Original Message- From: Giovanni Mascari [mailto:giovanni.masc...@polito.it] Sent: Thursday, November 03, 2016 7:24 PM To: Piyush Mukati
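getmerge concatenates the files of a directory one after another into a single destination. The local-filesystem sketch below performs that loop with java.nio, streaming each part into one output; the name ordering and the DirMerger name are assumptions. Against HDFS the same loop could, we assume, be written with FileSystem.open() and FileSystem.create() from org.apache.hadoop.fs, which keeps the data off the client's local disk (the concern raised in the thread), although the bytes still pass through the client.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Local sketch of a getmerge-style concatenation (hypothetical class name):
// every regular file in srcDir, in name order, is streamed into dst.
class DirMerger {

    // dst should live outside srcDir, otherwise it could be merged into itself.
    static void merge(Path srcDir, Path dst) {
        List<Path> parts;
        try (Stream<Path> s = Files.list(srcDir)) {
            parts = s.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try (OutputStream out = Files.newOutputStream(dst)) {
            for (Path p : parts) {
                Files.copy(p, out); // buffered copy; no whole file is held in memory
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Usage: java DirMerger <srcDir> <dstFile>
        merge(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```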

Re: merging small files in HDFS

2016-11-03 Thread Giovanni Mascari
Hi, if I understand your request correctly, you only need to merge some data resulting from an HDFS write operation. In this case, I suppose your best option is to use Hadoop Streaming with the 'cat' command. Take a look here: https://hadoop.apache.org/docs/r1.2.1/streaming.html Regards
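One thing to check before relying on the 'cat' approach: with a reduce phase, MapReduce sorts map output by key during the shuffle, and streaming treats the text up to the first tab as the key (the whole line if there is no tab). With a single reducer the merged file therefore comes out in sorted order rather than in the original file order; with zero reducers there is no sort, but each map task writes its own output file. The plain-Java simulation below (hypothetical names, no Hadoop) models the one-reducer case, assuming tab-free ASCII lines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy simulation (hypothetical, no Hadoop) of "-mapper cat -reducer cat"
// with one reducer, assuming tab-free ASCII lines.
class CatStreamingSim {

    static List<String> run(List<List<String>> inputFiles) {
        List<String> mapOutput = new ArrayList<>();
        for (List<String> file : inputFiles) {
            mapOutput.addAll(file); // mapper "cat": lines pass through unchanged
        }
        // Shuffle/sort: a tab-free line is its own key. For ASCII text, Java's
        // String ordering matches the byte-wise order used for Text keys.
        Collections.sort(mapOutput);
        return mapOutput;           // reducer "cat": one reducer, one sorted output
    }

    public static void main(String[] args) {
        List<List<String>> files = Arrays.asList(
                Arrays.asList("zeta", "alpha"),  // first input file
                Arrays.asList("mid"));           // second input file
        System.out.println(run(files));          // [alpha, mid, zeta]
    }
}
```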

merging small files in HDFS

2016-11-03 Thread Piyush Mukati
Hi, I want to merge multiple files in one HDFS directory into one file. I am planning to write a map-only job using an input format that creates only one InputSplit per directory. This way my job doesn't need to do any shuffle/sort (only read and write back to disk). Is there any such file format already
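Piyush's plan needs an input format whose getSplits() returns one split per directory. The grouping step such an input format would perform is sketched below in plain Java over path strings; the class and method names are hypothetical, and it ignores block locations and split sizes. Hadoop's CombineFileInputFormat, which packs several files into one split, may be worth looking at as a base, and in the driver job.setNumReduceTasks(0) makes the job map-only so there is no shuffle/sort.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch (hypothetical names) of the grouping a custom InputFormat's
// getSplits() would do: one "split" (list of files) per parent directory.
class DirSplitPlanner {

    static Map<String, List<String>> splitsPerDir(List<String> files) {
        Map<String, List<String>> splits = new TreeMap<>(); // sorted by directory
        for (String f : files) {
            // HDFS-style paths always use '/', so plain string handling suffices.
            int slash = f.lastIndexOf('/');
            String dir = slash < 0 ? "" : f.substring(0, slash);
            splits.computeIfAbsent(dir, d -> new ArrayList<>()).add(f);
        }
        return splits;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList(
                "/logs/a/1.txt", "/logs/b/1.txt", "/logs/a/2.txt");
        // Each entry would become one split, i.e. one map task and one output file.
        splitsPerDir(files).forEach((dir, group) -> System.out.println(dir + " -> " + group));
    }
}
```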