Hi Gaurav,

To add more clarity to my previous mail: if you are using the default TextInputFormat, there will be *at least* one map task generated per file, even if the file size is less than the block size (assuming your split size is equal to the block size).
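This per-file behaviour can be sketched roughly as follows. This is a simplified illustration, not Hadoop's actual FileInputFormat code (it ignores details like the minimum split size and the 10% slop factor Hadoop applies before creating a trailing split); the function name and parameters are hypothetical:

```python
import math

def num_splits(file_sizes, block_size, max_split_size=None):
    """Rough sketch: every file yields at least one split; otherwise
    splits per file = ceil(file size / split size), where the split
    size is capped by mapred.max.split.size when that is smaller
    than the block size."""
    split_size = block_size if max_split_size is None else min(block_size, max_split_size)
    return sum(max(1, math.ceil(size / split_size)) for size in file_sizes)

# Three files on an HDFS with 64 MB blocks: the 10 MB file still
# gets its own mapper, so the total is 1 + 1 + 4 = 6 mappers.
MB = 1024 * 1024
print(num_splits([10 * MB, 64 * MB, 200 * MB], block_size=64 * MB))  # 6
```

Note how a small file never shares a split with another file, which is why many small files inflate the mapper count.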
So the right way to calculate the number of splits is per file, not from the total input data size: calculate the number of blocks per file, and summing those values across all files gives you the number of mappers.

What is the value of mapred.max.split.size in your job? If it is less than the HDFS block size, there will be more than one split even within a single HDFS block.

Regards
Bejoy KS