Hi everyone,
I am trying to understand how locality is actually implemented in
Hadoop, specifically in the Capacity Scheduler.
I see that an application has to specify the "location" for a request,
that can be a node, a rack, or ANY (*).
For this reason I want to ask if an application that wants a resource
request to be considered at higher levels of locality, has to submit
more than just one request for the same resource (e.g.: if I want my
request to be considered also at rack and off-switch level, I will issue
a request specifying the node, one specifying the rack, and one with
just *, all of them with the relax locality set to true).
This seems to me as a necessity since in the Capacity scheduler code the
rack local requests (and similarly for the off-switch ones) are obtained
in a way like this:
application.getResourceRequest(priority, *node.getRackName()*)
while if I submitted a single request just for a specific node, even
allowing relaxed locality, the scheduler would not be able to process it
at rack level, since it specifically looks for requests made for the
current rack (and at switch level too).
Is this correct?
If it is, what's the point of the relax locality parameter? I don't see
a possible situation when I would have any request with relax locality
set to false (in particular for rack and off-switch levels). Why would
an application issue a rack-level request with relax locality set to false?
Thanks in advance
Fabio