Hi Gaurav

To add more clarity to my previous mail:
If you are using the default TextInputFormat, there will be *at least* one
map task generated per file, even if the file size is less than the block
size (assuming your split size equals the block size).

So the right way to calculate the number of splits is per file, not on the
whole input data size. Calculate the number of blocks per file; summing
those values across all files gives the number of mappers.
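To make that concrete, here is a minimal Java sketch of the per-file
calculation. The file sizes and the 128 MB block size are made-up example
values, not taken from your job:

    public class MapperCountSketch {
        public static void main(String[] args) {
            long blockSize = 128L * 1024 * 1024;   // assume a 128 MB HDFS block
            long[] fileSizes = {
                300L * 1024 * 1024,                // spans 3 blocks -> 3 splits
                10L  * 1024 * 1024                 // smaller than a block -> still 1 split
            };
            long mappers = 0;
            for (long size : fileSizes) {
                // ceil(size / blockSize), computed per file
                mappers += (size + blockSize - 1) / blockSize;
            }
            // prints 4 -- summing per file, not ceil(310 MB / 128 MB) = 3
            System.out.println(mappers);
        }
    }

Note the small file still costs a full map task, which is why many small
files inflate the mapper count.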

What is the value of mapred.max.split.size in your job? If it is less than
the HDFS block size, there will be multiple splits even within a single
HDFS block.
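If I remember the FileInputFormat logic right, the split size works out to
max(minSize, min(maxSize, blockSize)), so a max split size below the block
size forces multiple splits per block. A quick sketch with example values
(the 64 MB figure is only for illustration):

    public class SplitSizeSketch {
        // mirrors the split-size formula used by FileInputFormat
        static long computeSplitSize(long blockSize, long minSize, long maxSize) {
            return Math.max(minSize, Math.min(maxSize, blockSize));
        }
        public static void main(String[] args) {
            long blockSize = 128L * 1024 * 1024;  // 128 MB HDFS block
            long minSize   = 1;                   // mapred.min.split.size left at its default
            long maxSize   = 64L * 1024 * 1024;   // example: mapred.max.split.size = 64 MB
            long splitSize = computeSplitSize(blockSize, minSize, maxSize);
            // splitSize is 64 MB, so each 128 MB block yields 2 splits
            System.out.println(splitSize / (1024 * 1024) + " MB");
        }
    }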

Regards
Bejoy KS
