That makes sense. It's worth pointing out that tasks are scheduled on a
pull basis -- tasktrackers ask for more work if they have free slots for
tasks -- so it is not a given that all nodes will receive the same number of
tasks. If some tasks take considerably longer (or some nodes are
Hi all,
We are using hadoop-0.19.1 on about 200 nodes. We find that many slaves
keep Child processes running even after the job is done.
Here is an example; this process has been running since August 09!
1000 24625 1 0 Aug09 ? 00:00:38 (...java... classpath)
org.apache.hadoop.mapred.Child
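
To see how widespread this is on a slave, something like the following rough sketch may help. It is not part of Hadoop: the function names and the one-day age threshold are made up for illustration, and it assumes a Linux `ps` that accepts `-eo pid,etime,args`. It only lists long-running Child JVMs. It does not check whether the owning job has really finished, and it does not kill anything.

```python
import subprocess

# Main class of the task JVM, as shown in the ps output above.
CHILD_CLASS = "org.apache.hadoop.mapred.Child"


def parse_etime(etime):
    """Convert a ps elapsed time of the form [[dd-]hh:]mm:ss to seconds."""
    days = 0
    if "-" in etime:
        day_part, etime = etime.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in etime.split(":")]
    while len(parts) < 3:  # pad a missing hours field with zero
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def find_stale_children(ps_output, min_age_seconds):
    """Return (pid, age_in_seconds) for Child JVMs at least min_age_seconds old.

    ps_output is expected to have the columns pid, etime, args.
    """
    stale = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        if len(fields) < 3 or not fields[0].isdigit():
            continue  # header line or something unexpected
        pid, etime, args = fields
        if CHILD_CLASS not in args:
            continue
        age = parse_etime(etime)
        if age >= min_age_seconds:
            stale.append((int(pid), age))
    return stale


def list_stale_children(min_age_seconds=24 * 3600):
    """Run ps on this node and report old Child JVMs (threshold is arbitrary)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,etime,args"],
        capture_output=True, text=True, check=True,
    )
    return find_stale_children(result.stdout, min_age_seconds)
```

Calling `list_stale_children()` on a slave returns the candidate PIDs and their ages. Whether a given process is actually orphaned still has to be checked against the jobs the tasktracker is currently running.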