Re: Distributed shell with host affinity and relaxLocality
On 17 Mar 2015, at 15:38, Chris Riccomini criccom...@linkedin.com.INVALID wrote:

> Any ideas?

Not from me. I think we've tested more on the Capacity Scheduler, primarily because I know who to ask for help with it. I do know we (Slider) need to use a different priority for placed vs. relaxed requests. (I didn't know about YARN-1974, BTW; that's something worth getting in.)
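The remark above about using a different priority for placed vs. relaxed requests can be illustrated with a small toy model. As far as I know, the YARN AM client refuses to mix strict (relaxLocality=false) and relaxed requests at the same priority, which is one reason to keep them apart. The sketch below is not Slider or YARN code: the `(priority, relax_locality)` tuple, `conflicting_priorities`, and the even/odd `split_priority` scheme are all hypothetical names chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Toy request: (priority, relax_locality). Not a YARN type.
Request = Tuple[int, bool]


def conflicting_priorities(requests: Iterable[Request]) -> List[int]:
    """Return the priorities that carry both strict and relaxed requests."""
    seen: Dict[int, Set[bool]] = defaultdict(set)
    for priority, relax in requests:
        seen[priority].add(relax)
    # A priority conflicts when both True and False were seen for it.
    return sorted(p for p, flags in seen.items() if len(flags) == 2)


def split_priority(base: int, relax_locality: bool) -> int:
    """Hypothetical scheme: even slot for placed, odd slot for relaxed."""
    return base * 2 + (1 if relax_locality else 0)


# One placed and one relaxed request sharing priority 0.
mixed = [(0, False), (0, True)]
# The same two requests after moving each kind to its own priority.
separated = [(split_priority(p, relax), relax) for p, relax in mixed]

print(conflicting_priorities(mixed))
print(conflicting_priorities(separated))
```

Any mapping that keeps the two kinds of request on disjoint priorities would do; the even/odd split is just the simplest one to write down.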
Re: Distributed shell with host affinity and relaxLocality
Any ideas?
Re: Distributed shell with host affinity and relaxLocality
Hey Chris,

What scheduler/version is this?

--
Karthik Kambatla
Software Engineer, Cloudera Inc.
http://five.sentenc.es
Re: Distributed shell with host affinity and relaxLocality
+ Navina

Hey Karthik,

YARN 2.6.0 FairShare.

Cheers,
Chris
Distributed shell with host affinity and relaxLocality
Hey all,

We have been testing YARN with host-specific ContainerRequests. For our tests, we've been using the DistributedShell example. We've applied YARN-1974, which allows us to specify node lists, relax locality, etc.

Everything seems to work as expected when we set relaxLocality to false and request a specific host. When we set relaxLocality to true, things get weird.

We run three nodes: node1, node2, and node3. When we start DistributedShell, we configure it (via CLI params) to use two containers, with a host-level request for node3. What we observe is that the AM and one container both end up on node2, and the other container (the third overall, counting the AM) ends up on node3. There are enough resources on node3 to handle both containers, but the second one doesn't end up there. We also notice that the DistributedShell app wedges because the container on node3 never completes.

What is the expected behavior here? This seems to be broken.

Cheers,
Chris
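For readers following the relaxLocality question above, here is a minimal toy model of the documented contract: with relaxLocality set to true, containers may be assigned on hosts other than the ones explicitly requested, while false restricts placement to the named hosts. This is only a sketch under that assumption. `HostRequest` and `eligible_nodes` are hypothetical names, not YARN classes. Racks are ignored, and the model does not explain the wedged application.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class HostRequest:
    """Toy stand-in for a host-level container request (not the YARN class)."""
    hosts: Tuple[str, ...]  # hosts named in the request, e.g. ("node3",)
    relax_locality: bool    # mirrors the relaxLocality flag


def eligible_nodes(cluster: List[str], request: HostRequest) -> Set[str]:
    """Return the nodes on which this toy model allows the container to run."""
    if request.relax_locality:
        # Relaxed: the named hosts are only a preference, any node may be used.
        return set(cluster)
    # Strict: only named hosts that exist in the cluster qualify.
    return set(cluster) & set(request.hosts)


cluster = ["node1", "node2", "node3"]
strict = eligible_nodes(cluster, HostRequest(("node3",), relax_locality=False))
relaxed = eligible_nodes(cluster, HostRequest(("node3",), relax_locality=True))

print(sorted(strict))
print(sorted(relaxed))
```

Under this reading, a container landing on node2 with relaxLocality=true is permitted by the contract, even though node3 had room for it. Whether the scheduler should prefer node3 in that case, and why the app wedges, are separate questions the model leaves open.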