Hi,

I want to check whether my understanding of task assignment in Hadoop
is correct.

When a file is scanned by multiple tasktrackers,
I am wondering how tasks are assigned to each tasktracker.
Is it based on the block sequence or on data locality?

Let me explain my question with an example.
Suppose a file is composed of 10 blocks (block1 to block10), where
block1 is the beginning of the file and block10 is its tail.
When the file is scanned with 3 tasktrackers (tt1 to tt3),
I am wondering whether
task assignment is based on the block sequence, like
first tt1 takes block1, tt2 takes block2, tt3 takes block3,
then tt1 takes block4, and so on
or
task assignment is based on task (data) locality, like
first tt1 takes block2 (because it is stored locally), tt2
takes block1 (because it is stored locally),
tt3 takes block4 (because it is stored locally), and so on.
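
To make the two cases concrete, here is a minimal sketch in plain Python.
It is not Hadoop code and does not claim to show what the scheduler
actually does. The block placement in BLOCK_LOCATIONS and both function
names are made up for illustration only.

```python
# Hypothetical placement: which tasktrackers hold a local replica of each block.
# These values are invented so that they match the example above.
BLOCK_LOCATIONS = {
    1: {"tt2"}, 2: {"tt1"}, 3: {"tt1"}, 4: {"tt3"}, 5: {"tt2"},
    6: {"tt3"}, 7: {"tt1"}, 8: {"tt2"}, 9: {"tt3"}, 10: {"tt1"},
}
TRACKERS = ["tt1", "tt2", "tt3"]


def assign_by_sequence(blocks, trackers):
    """First case: hand out blocks in file order, round-robin over trackers."""
    return [(trackers[i % len(trackers)], block)
            for i, block in enumerate(blocks)]


def assign_by_locality(locations, trackers):
    """Second case: each tracker in turn takes the lowest-numbered pending
    block it holds locally, or the lowest-numbered pending block if it
    holds none of the remaining ones."""
    pending = sorted(locations)
    assignments = []
    turn = 0
    while pending:
        tracker = trackers[turn % len(trackers)]
        local = [b for b in pending if tracker in locations[b]]
        block = local[0] if local else pending[0]
        pending.remove(block)
        assignments.append((tracker, block))
        turn += 1
    return assignments


if __name__ == "__main__":
    print(assign_by_sequence(sorted(BLOCK_LOCATIONS), TRACKERS))
    print(assign_by_locality(BLOCK_LOCATIONS, TRACKERS))
```

With this made-up placement, the first function starts with tt1/block1,
tt2/block2, tt3/block3, and the second starts with tt1/block2,
tt2/block1, tt3/block4, which are the two orders I describe above.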

From what I have seen in practice, and from what the Definitive Guide
book says, I think the first case is the task assignment strategy
(and if a block has several replicas, the closest one is picked).

Is this right?

If so, is there any way to get the behavior of the second case
with the current implementation?

Thanks,

Hiroyuki
