Number of Maps running more than expected

2012-08-16 Thread Gaurav Dasgupta
Hi users, I am working on a CDH3 cluster of 12 nodes (Task Trackers running on all the 12 nodes and 1 node running the Job Tracker). In order to perform a WordCount benchmark test, I did the following: - Executed "RandomTextWriter" first to create 100 GB data (Note that I have changed the "

Re: Number of Maps running more than expected

2012-08-16 Thread Anil Gupta
Hi Gaurav, Did you turn off speculative execution? Best Regards, Anil On Aug 16, 2012, at 7:13 AM, Gaurav Dasgupta wrote: > Hi users, > > I am working on a CDH3 cluster of 12 nodes (Task Trackers running on all the > 12 nodes and 1 node running the Job Tracker). > In order to perform a Word

Re: Number of Maps running more than expected

2012-08-16 Thread in.abdul
by > RandomTextWriter? And what is the purpose of these extra number of Maps? > > Regards, > Gaurav Dasgupta > > > -- > If you reply to this email, your message will be added to the discussion > below: > > http://lucene.472066.n3.nabble.c

Re: Number of Maps running more than expected

2012-08-16 Thread Bertrand Dechoux
arding the number > > of Maps for WordCount only when the dataset is generated by > > RandomTextWriter? And what is the purpose of these extra number of Maps? > > > > Regards, > > Gaurav Dasgupta > > > > > > -- > >

Re: Number of Maps running more than expected

2012-08-17 Thread Bejoy Ks
Hi Gaurav, How many input files are there for the WordCount MapReduce job? Do you have input files smaller than the block size? If you are using the default TextInputFormat, there will be one task generated per file for sure, so if you have files smaller than the block size, the calculation specified here fo

Re: Number of Maps running more than expected

2012-08-17 Thread Bejoy Ks
Hi Gaurav, To add more clarity to my previous mail: if you are using the default TextInputFormat, there will be *at least* one task generated per file, even if the file size is less than the block size (assuming your split size is equal to the block size). So the right way to calculate the number of s

Re: Number of Maps running more than expected

2012-08-17 Thread Gaurav Dasgupta
Hi Bejoy, The total number of Maps in the RandomTextWriter execution was 100, and hence the total number of input files for WordCount is 100. My dfs.block.size = 128MB, and I have not changed mapred.max.split.size and could not find it in my Job.xml file. Hence, referring to the formula *max(minspli

Re: Number of Maps running more than expected

2012-08-17 Thread Bejoy Ks
Hi Gaurav, While calculating, you got the number of map tasks per file as 8.12, i.e. 9 map tasks for each file. So for 100 files it is 900 map tasks, and now your numbers match. Doesn't it look right? Regards, Bejoy KS
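The arithmetic in the message above can be sketched in a few lines of Python. This is only an illustration of the per-file rounding, not Hadoop's actual split code: the function name `estimate_map_tasks` is hypothetical, the per-file size is an assumed value chosen to give the 8.12 ratio quoted in the thread, and the split size is assumed equal to the 128 MB block size. Details of the real split logic (such as any tolerance applied to the last split of a file) are ignored here.

```python
import math

def estimate_map_tasks(file_sizes_bytes, split_size_bytes):
    """Rough estimate: each file is split on its own, so round up per file.

    Assumption: every file, however small, yields at least one map task.
    """
    return sum(
        max(1, math.ceil(size / split_size_bytes))
        for size in file_sizes_bytes
    )

# Assumed values for illustration only.
SPLIT_SIZE = 128 * 1024 * 1024          # split size taken equal to dfs.block.size
FILE_SIZE = int(8.12 * SPLIT_SIZE)      # hypothetical size giving 8.12 splits per file

# 100 input files, each needing ceil(8.12) = 9 map tasks.
total_maps = estimate_map_tasks([FILE_SIZE] * 100, SPLIT_SIZE)
print(total_maps)
```

The point of the sketch is that the rounding happens per file before summing: dividing the total data size by the split size (about 812 here) underestimates the task count, because the leftover part of each file gets its own map task.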

Re: Number of Maps running more than expected

2012-08-17 Thread Gaurav Dasgupta
Hi, I have got it. It was my mistake in understanding the calculation. Thanks for the help. Regards, Gaurav Dasgupta On Fri, Aug 17, 2012 at 5:15 PM, Bejoy Ks wrote: > Hi Gaurav > > While calculating you got the number of map tasks per file as 8.12 ie 9 > map tasks for each file. So for 100 files