Hi everyone,
I am trying to understand how locality is actually implemented in Hadoop, specifically in the Capacity Scheduler. I see that an application has to specify a "location" for each request, which can be a node, a rack, or ANY (*). My question is whether an application that wants a resource request to be considered at higher (coarser) levels of locality has to submit more than one request for the same resource. For example, if I want my request to be considered at the rack and off-switch levels as well, do I issue one request naming the node, one naming the rack, and one with just *, all with relax locality set to true? This seems necessary to me, since in the Capacity Scheduler code the rack-local requests (and similarly the off-switch ones) are obtained like this:

application.getResourceRequest(priority, node.getRackName())

If instead I submitted a single request for a specific node only, even with relaxed locality allowed, the scheduler would not be able to process it at the rack level, since it specifically looks up requests made for the current rack (and likewise at the off-switch level).
Is this correct?
If it is, what is the point of the relax locality parameter? I cannot think of a situation in which I would set relax locality to false on a request, in particular at the rack and off-switch levels. Why would an application issue a rack-level request with relax locality set to false?
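
To make the first question concrete, here is a minimal, self-contained sketch of the lookup as I understand it. It is not the actual Capacity Scheduler code: the class names (LocalityLookupSketch, Request, RequestTable), the node and rack names, and the table layout are all assumptions made up for illustration. The only things taken from above are the idea of a lookup keyed by priority and location name, and the use of * for ANY.

```java
import java.util.HashMap;
import java.util.Map;

public class LocalityLookupSketch {
    // "*" stands for ANY (off-switch), as in the question above.
    static final String ANY = "*";

    // Hypothetical stand-in for a resource request; not a Hadoop class.
    static final class Request {
        final String resourceName;   // node name, rack name, or "*"
        final int numContainers;
        final boolean relaxLocality;

        Request(String resourceName, int numContainers, boolean relaxLocality) {
            this.resourceName = resourceName;
            this.numContainers = numContainers;
            this.relaxLocality = relaxLocality;
        }
    }

    // Hypothetical per-application table: priority -> (location name -> request).
    static final class RequestTable {
        private final Map<Integer, Map<String, Request>> byPriority = new HashMap<>();

        void add(int priority, Request request) {
            byPriority.computeIfAbsent(priority, p -> new HashMap<>())
                      .put(request.resourceName, request);
        }

        // Exact-match lookup by name, mirroring the shape of the call quoted above.
        Request get(int priority, String resourceName) {
            Map<String, Request> requests = byPriority.get(priority);
            return requests == null ? null : requests.get(resourceName);
        }
    }

    public static void main(String[] args) {
        int priority = 1;

        // Case 1: a single request that names only the node.
        RequestTable nodeOnly = new RequestTable();
        nodeOnly.add(priority, new Request("node1", 1, true));

        // Case 2: one request per locality level for the same container.
        RequestTable allLevels = new RequestTable();
        allLevels.add(priority, new Request("node1", 1, true));
        allLevels.add(priority, new Request("/rack1", 1, true));
        allLevels.add(priority, new Request(ANY, 1, true));

        // With an exact-match lookup, the rack and ANY entries exist only in case 2.
        System.out.println("node-only, rack lookup found: "
                + (nodeOnly.get(priority, "/rack1") != null));
        System.out.println("all-levels, rack lookup found: "
                + (allLevels.get(priority, "/rack1") != null));
        System.out.println("all-levels, ANY lookup found: "
                + (allLevels.get(priority, ANY) != null));
    }
}
```

If the real scheduler behaves like this sketch, a node-only request is invisible to the rack-level and off-switch lookups regardless of its relax locality flag, which is exactly what prompts the question about that flag.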

Thanks in advance

Fabio
