hi anil

I have one hard drive per slave.
I have tested with 3 concurrent mappers and 28 concurrent mappers per slave. 
And both times the total time was about 1 hour the only difference was
the time each map took aka respectfully 40min and 1h10min 
I have turned of the speculative execution.

I'll run a process tomorrow and look at disk I/O to check if
it is the bottleneck.

But the test I ran this afternoon with 3 or 28 max map tasks per node
makes me doubt.
When I run 28 map per node I can load the whole file in the
available maps in one pass so all maps take 1h to complete
so the whole process takes 1h and some minutes.

When I run with 3 maps per node the whole file is imported through 7
full passes of available maps. 
In this case each map takes around 8-9 minutes to complete.
So 7 passes times 9 minutes, the process takes about 1hour to complete
same as before.

This situation i don't understand and leads me to believe
I have missed a step somwhere. 

If someone has an idea I'll gladly look into anything


Reply via email to