I would like to better understand YARN's scheduling with named workers and relaxLocality==true. For example, suppose that I have a three-node cluster with nodes A, B, and C. Each node has the capacity to run two of my tasks simultaneously. My AM then requests nine containers, with the worker name set so that I am requesting three containers per worker. The cluster starts idle and has no other users. My questions:
* Is it optimal to issue three ResourceRequests, each with numContainers==3 (as opposed to nine requests)?
* Initially, I expect the RM to allocate two containers per node, and I expect to have the containers match the named workers. Is this always the case?
* If the first task completes on worker "B", can I rely on the ResourceRequest for "B" to be fulfilled next?
* What techniques should be used to get the containers on the workers I expect most often?
* What techniques should be used to reduce container allocation latency, if possible?

Thanks,
John