I would like to better understand how YARN schedules containers requested on
named workers with relaxLocality == true. For example, suppose that I have a
three-node cluster with nodes A, B, and C. Each node has the capacity to run
two tasks of the kind I want simultaneously. My AM then requests nine
containers, with the worker name set so that I am requesting three containers
per worker. The cluster starts idle and has no other users. My questions:

* Is it better to issue three ResourceRequests, each with numContainers == 3,
  rather than nine requests with numContainers == 1?

* Initially, I expect the RM to allocate two containers per node, each on the
  worker named in its request. Is this always the case?

* If the first task to complete is running on worker "B", can I rely on the
  pending ResourceRequest for "B" being fulfilled next?

* What techniques should I use to get containers on the expected workers as
  often as possible?

* What techniques, if any, can reduce container allocation latency?
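To make the arithmetic of this scenario concrete, here is a toy model in plain
Java. It is NOT YARN's scheduler and uses no Hadoop APIs; the method
allocateNodeLocal is hypothetical and simply assumes a node-local-only pass in
which each node takes containers only from the request naming it. Under that
assumption, six of the nine containers are placed up front and one container
per worker stays pending.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LocalityToyModel {
    public static void main(String[] args) {
        // Scenario from the post: nodes A, B, C with two free slots each.
        Map<String, Integer> freeSlots = new LinkedHashMap<>();
        // One request per named worker, each asking for three containers.
        Map<String, Integer> pending = new LinkedHashMap<>();
        for (String node : new String[] {"A", "B", "C"}) {
            freeSlots.put(node, 2);
            pending.put(node, 3);
        }

        int placed = allocateNodeLocal(freeSlots, pending);
        System.out.println("placed=" + placed + " pending=" + pending);
        // prints "placed=6 pending={A=1, B=1, C=1}"

        // Toy event: the first container on B finishes, freeing one slot there.
        freeSlots.put("B", 1);
        int placedLater = allocateNodeLocal(freeSlots, pending);
        System.out.println("placedLater=" + placedLater + " pending=" + pending);
        // prints "placedLater=1 pending={A=1, B=0, C=1}"
    }

    // Hypothetical node-local-only pass (an assumption, not the RM's logic):
    // each node's free slots go only to the pending request naming that node.
    static int allocateNodeLocal(Map<String, Integer> freeSlots,
                                 Map<String, Integer> pending) {
        int placed = 0;
        for (Map.Entry<String, Integer> e : freeSlots.entrySet()) {
            String node = e.getKey();
            int n = Math.min(e.getValue(), pending.getOrDefault(node, 0));
            e.setValue(e.getValue() - n);          // consume free slots
            pending.merge(node, -n, Integer::sum); // reduce outstanding count
            placed += n;
        }
        return placed;
    }
}
```

Whether the real RM behaves like this, in particular whether the slot freed on
B goes to B's request, is what the questions above ask; with relaxLocality
set, a real scheduler may also place a pending container on a different node.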
Thanks
John

Reply via email to