Hi users,
I am working on a CDH3 cluster of 12 nodes (Task Trackers running on all
the 12 nodes and 1 node running the Job Tracker).
In order to perform a WordCount benchmark test, I did the following:
- Executed "RandomTextWriter" first to create 100 GB data (Note that I
have changed the "
Hi Gaurav,
Did you turn off speculative execution?
Best Regards,
Anil
On Aug 16, 2012, at 7:13 AM, Gaurav Dasgupta wrote:
by
> RandomTextWriter? And what is the purpose of these extra number of Maps?
>
> Regards,
> Gaurav Dasgupta
arding the number
> > of Maps for WordCount only when the dataset is generated by
> > RandomTextWriter? And what is the purpose of these extra number of Maps?
Hi Gaurav
How many input files are there for the WordCount MapReduce job? Do you
have input files smaller than the block size? If you are using the default
TextInputFormat there will be one task generated per file for sure, so if
you have files smaller than the block size the calculation specified here fo
Hi Gaurav
To add some clarity to my previous mail:
If you are using the default TextInputFormat there will be *at least* one
task generated per file, even if the file size is less than
the block size (assuming your split size is equal to the block size).
So the right way to calculate the number of s
Hi Bejoy,
The total number of Maps in the RandomTextWriter execution was 100 and
hence the total number of input files for WordCount is 100.
My dfs.block.size = 128MB and I have not changed
mapred.max.split.size and could not find it in my job.xml file.
Hence, referring to the formula *max(minspli
Hi Gaurav
In your calculation you got the number of map tasks per file as 8.12, i.e. 9
map tasks for each file. So for 100 files it is 900 map tasks, and now your
numbers match. Doesn't it look right?
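
The per-file arithmetic above can be sketched in a few lines of Python. This
is only a simplified model, not Hadoop's actual split code: the function name
estimate_map_tasks is made up for illustration, the file size is assumed to
be 8.12 times the 128MB split size (taken from the ratio quoted above), and
any slack factor Hadoop may apply to the last split is ignored.

```python
def estimate_map_tasks(file_sizes, split_size):
    """Estimate map tasks by rounding up the split count of each file separately.

    Simplified model: every non-empty file yields ceil(size / split_size)
    splits, so a file smaller than one split still gets its own task.
    """
    total = 0
    for size in file_sizes:
        if size > 0:
            total += -(-size // split_size)  # integer ceiling division
    return total


SPLIT_SIZE = 128 * 1024 * 1024        # dfs.block.size = 128MB
FILE_SIZE = int(8.12 * SPLIT_SIZE)    # assumed size of one RandomTextWriter file
FILES = [FILE_SIZE] * 100             # 100 input files

per_file_total = estimate_map_tasks(FILES, SPLIT_SIZE)
# For contrast: treating the whole dataset as one big file rounds up only once.
whole_dataset_total = -(-sum(FILES) // SPLIT_SIZE)

print(per_file_total)       # 900
print(whole_dataset_total)  # 812
```

The gap between the two numbers comes from rounding: each file ends with a
partial split that becomes its own map task, so the rounding up happens 100
times rather than once.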
Regards
Bejoy KS
Hi
I have got it. I had misunderstood the calculation. Thanks for
the help.
Regards,
Gaurav Dasgupta
On Fri, Aug 17, 2012 at 5:15 PM, Bejoy Ks wrote: