Re: Distributed shell with host affinity and relaxLocality

2015-03-18 Thread Steve Loughran

 On 17 Mar 2015, at 15:38, Chris Riccomini criccom...@linkedin.com.INVALID wrote:
 
 Any ideas?

Not from me. I think we've tested more on the Capacity Scheduler, primarily 
because I know who to ask for help with it.

I do know we (Slider) need to use a different priority for placed vs. relaxed 
requests. 
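
A minimal sketch of the "different priority for placed vs. relaxed" idea, assuming (this is not confirmed anywhere in the thread) that the client side refuses to mix strict and relaxed requests at one priority. Everything here is hypothetical and for illustration only: `Request`, the two priority constants, and `hasConflict` are plain-Java stand-ins, not YARN's `AMRMClient.ContainerRequest` or Slider's code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PlacementPriorities {
    /** Hypothetical stand-in for a container request; not a YARN class. */
    static final class Request {
        final String host;
        final int priority;
        final boolean relaxLocality;

        Request(String host, int priority, boolean relaxLocality) {
            this.host = host;
            this.priority = priority;
            this.relaxLocality = relaxLocality;
        }
    }

    // Assumed values; any two distinct priorities would do.
    static final int PLACED_PRIORITY = 1;
    static final int RELAXED_PRIORITY = 2;

    /** A placed request: must run on the named host. */
    static Request placed(String host) {
        return new Request(host, PLACED_PRIORITY, false);
    }

    /** A relaxed request: prefers the named host but may run elsewhere. */
    static Request relaxed(String host) {
        return new Request(host, RELAXED_PRIORITY, true);
    }

    /** True if two requests share a priority but disagree on relaxLocality. */
    static boolean hasConflict(List<Request> requests) {
        Map<Integer, Boolean> relaxByPriority = new HashMap<>();
        for (Request r : requests) {
            Boolean previous = relaxByPriority.putIfAbsent(r.priority, r.relaxLocality);
            if (previous != null && previous.booleanValue() != r.relaxLocality) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Request> separated = Arrays.asList(placed("node3"), relaxed("node3"));
        List<Request> mixed = Arrays.asList(
            new Request("node3", 1, false), new Request("node3", 1, true));
        System.out.println("separate priorities conflict: " + hasConflict(separated));
        System.out.println("shared priority conflict: " + hasConflict(mixed));
    }
}
```

Keeping the two kinds of request at separate priorities means a strict and a relaxed request never share a priority, so a check like `hasConflict` cannot fire.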

(Didn't know about YARN-1974, BTW; that's something worth getting in.)

 
 
 



Re: Distributed shell with host affinity and relaxLocality

2015-03-17 Thread Chris Riccomini
Any ideas?





Re: Distributed shell with host affinity and relaxLocality

2015-03-16 Thread Karthik Kambatla
Hey Chris

What scheduler/version is this?





-- 
Karthik Kambatla
Software Engineer, Cloudera Inc.

http://five.sentenc.es


Re: Distributed shell with host affinity and relaxLocality

2015-03-16 Thread Chris Riccomini
+ Navina

Hey Karthik,

YARN 2.6.0, Fair Scheduler.

Cheers,
Chris




Distributed shell with host affinity and relaxLocality

2015-03-16 Thread Chris Riccomini
Hey all,

We have been testing YARN with host-specific ContainerRequests. For our tests, 
we've been using the DistributedShell example. We've applied YARN-1974, which 
allows us to specify node lists, relax locality, etc. Everything seems to work 
as expected when we set relaxLocality to false and request a specific host.

When we set relaxLocality to true, things get weird. We run three nodes: node1, 
node2, and node3. When we start DistributedShell, we configure it (via CLI 
params) to use two containers, with a host-level request for node3. What we 
observe is that the AM and one container both end up on node2, and a third 
container ends up on node3. There are enough resources for node3 to handle both 
containers, but the second one doesn't end up there. We also notice that the 
DistributedShell app wedges because the container on node3 never completes.
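
For reference, a minimal sketch of what the relaxLocality flag means for the host-level request described above. It covers only the placement rule (strict requests may run only on the named host, relaxed ones may fall back to other hosts) and deliberately ignores scheduler details such as how long a scheduler waits before falling back. `HostRequest` and `mayRunOn` are hypothetical names for illustration, not YARN APIs.

```java
public class RelaxLocalitySketch {
    /** Hypothetical, simplified host-level request; not a YARN class. */
    static final class HostRequest {
        final String host;
        final boolean relaxLocality;

        HostRequest(String host, boolean relaxLocality) {
            this.host = host;
            this.relaxLocality = relaxLocality;
        }
    }

    /** True if a container for this request may be placed on the given node. */
    static boolean mayRunOn(HostRequest request, String node) {
        if (request.host.equals(node)) {
            return true;              // the requested host is always acceptable
        }
        return request.relaxLocality; // other hosts only when locality is relaxed
    }

    public static void main(String[] args) {
        HostRequest strict = new HostRequest("node3", false);
        HostRequest relaxed = new HostRequest("node3", true);
        for (String node : new String[] {"node1", "node2", "node3"}) {
            System.out.println(node
                + " strict=" + mayRunOn(strict, node)
                + " relaxed=" + mayRunOn(relaxed, node));
        }
    }
}
```

Under this rule a relaxed request for node3 is allowed to land on node2, so a container off node3 is permitted by the flag itself; the sketch says nothing about the wedged container.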

What is the expected behavior here? This seems to be broken.

Cheers,
Chris