Hi Anil, I have one hard drive per slave. I have tested with 3 concurrent mappers and with 28 concurrent mappers per slave. Both times the total time was about 1 hour; the only difference was the time each map took, respectively 40min and 1h10min. I have turned off speculative execution.
I'll run a process tomorrow and look at disk I/O to check whether it is the bottleneck. But the test I ran this afternoon with 3 or 28 max map tasks per node makes me doubt it. When I run 28 maps per node, I can load the whole file into the available map slots in one pass, so each map takes about 1h to complete and the whole process takes 1h and some minutes. When I run with 3 maps per node, the whole file is imported through 7 full passes of the available map slots. In this case each map takes around 8-9 minutes to complete, so at 7 passes times 9 minutes the process takes about 1 hour to complete, same as before. I don't understand this situation, and it leads me to believe I have missed a step somewhere. If someone has an idea, I'll gladly look into anything.
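
To make the arithmetic explicit, here is a rough sketch in Python. The function name is only for illustration, and I am assuming 9 minutes per map (the upper end of the 8-9 minutes I saw) and exactly 60 minutes for the single-pass case:

```python
def total_minutes(passes, minutes_per_map):
    """Wall-clock estimate: passes run one after another, maps within a pass run in parallel."""
    return passes * minutes_per_map

# 3 map slots per node: 7 passes, each map takes roughly 9 minutes (assumed value)
few_slots = total_minutes(passes=7, minutes_per_map=9)

# 28 map slots per node: 1 pass, each map takes roughly 60 minutes (assumed value)
many_slots = total_minutes(passes=1, minutes_per_map=60)

print(few_slots, many_slots)  # 63 60
```

Both settings land at roughly one hour, so adding map slots only made each individual map slower without changing the total. That is what I would expect if the maps on a node compete for one shared resource, but I have not confirmed which one yet.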