[jira] [Commented] (YARN-1127) reservation exchange and excess reservation is not working for capacity scheduler

2014-03-07 Thread Thomas Graves (JIRA)

[ https://issues.apache.org/jira/browse/YARN-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923897#comment-13923897 ]

Thomas Graves commented on YARN-1127:
-

The capacity scheduler should have eventually looked at the second node even 
with the first one being reserved. There is a formula for this, where it is 
biased against really large requests. What was your minimum allocation size 
and your maximum allocation size? Can you still reproduce this on 2.3.0 or 
newer?
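
For reference, the check behind that formula looks roughly like the sketch 
below. This is a simplified, illustrative version (the class, method, and 
parameter names are mine, not the exact LeafQueue code): re-reservations earn 
the request "starvation" credit toward looking elsewhere, requests close to 
the maximum allocation earn it more slowly, and the min/max allocation sizes 
cap the penalty, which is why I am asking about them.

{code:java}
// Illustrative sketch only; names are mine, not the exact LeafQueue code.
public final class ReservationBiasSketch {

  /**
   * Should the app keep trying to place containers (e.g. on other nodes)
   * beyond its existing reservations? Re-reservations earn "starvation"
   * credit, but requests close to the maximum allocation earn it more
   * slowly -- the bias against really large requests.
   */
  static boolean needContainers(int requiredContainers, int reservedContainers,
      int reReservations, long requiredMb, long maxAllocMb, long minAllocMb) {
    int starvation = 0;
    if (reservedContainers > 0) {
      // Fraction of a full node this request occupies (0..1].
      float nodeFactor = (float) requiredMb / maxAllocMb;
      // Cap the penalty so a full-node request can still make progress.
      float minAllocFactor = (float) (maxAllocMb - minAllocMb) / maxAllocMb;
      starvation = (int) ((reReservations / (float) reservedContainers)
          * (1.0f - Math.min(nodeFactor, minAllocFactor)));
    }
    return (starvation + requiredContainers) - reservedContainers > 0;
  }

  public static void main(String[] args) {
    // min=256 MB, max=2048 MB: after two re-reservations a small 256 MB
    // request earns enough credit to look elsewhere...
    System.out.println(needContainers(1, 1, 2, 256, 2048, 256));  // true
    // ...while a full-node 2048 MB request does not yet, so the min/max
    // sizes determine how long the scheduler camps on one node.
    System.out.println(needContainers(1, 1, 2, 2048, 2048, 256)); // false
  }
}
{code}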

Also note that this should be superseded by 
https://issues.apache.org/jira/browse/YARN-1769, which makes it so that 
reservations will continue to look at other heartbeating nodes.
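
For reference, YARN-1769 puts that continue-looking behavior behind a 
capacity-scheduler property; the property name below is from memory and 
should be verified against your release:

{code:xml}
<!-- capacity-scheduler.xml: let an app with an outstanding reservation keep
     examining other nodes as they heartbeat, instead of waiting only on the
     reserved node. Name per YARN-1769; verify against your release. -->
<property>
  <name>yarn.scheduler.capacity.reservations-continue-look-all-nodes</name>
  <value>true</value>
</property>
{code}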

 reservation exchange and excess reservation is not working for capacity 
 scheduler
 -

 Key: YARN-1127
 URL: https://issues.apache.org/jira/browse/YARN-1127
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.1-beta
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi
Priority: Blocker

 I have 2 node managers:
 * one with 1024 MB memory (nm1)
 * a second with 2048 MB memory (nm2)
 I am submitting a simple MapReduce application with 1 mapper and 1 reducer 
 of 1024 MB each. The steps to reproduce this are:
 * stop nm2, the one with 2048 MB memory (this is to make sure that this 
 node's heartbeat doesn't reach the RM first).
 * now submit the application. As soon as the RM receives the first node's 
 (nm1) heartbeat, it will try to reserve memory for the AM container 
 (2048 MB). However, nm1 has only 1024 MB of memory.
 * now start nm2 with 2048 MB memory.
 It hangs forever. There are two potential issues here (see the sketch 
 after this list):
 * Say 2048 MB is reserved on nm1 but nm2 comes back with 2048 MB of 
 available memory. In this case, if the original request was made without 
 any locality, the scheduler should unreserve the memory on nm1 and 
 allocate the requested 2048 MB container on nm2.
 * We support a notion of excess reservation: say we have 5 nodes of 8 GB 
 each and 4 AMs of 2 GB each, and each AM is requesting an 8 GB container. 
 To avoid deadlock, an AM will make an extra reservation; by doing this we 
 should never hit the deadlock situation.
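
A minimal sketch of the exchange behavior the first bullet expects, as a toy 
model rather than actual scheduler code (the Node class and onHeartbeat 
method are illustrative inventions):

{code:java}
import java.util.ArrayList;
import java.util.List;

// Toy model of the expected "reservation exchange": when a node heartbeats
// with enough free memory for a request that is currently reserved elsewhere
// (and the request has no locality constraint), drop the old reservation and
// allocate on the new node.
public final class ReservationExchangeSketch {

  static final class Node {
    final String name;
    final long totalMb;
    long usedMb;
    Long reservedMb; // null if no reservation outstanding

    Node(String name, long totalMb) { this.name = name; this.totalMb = totalMb; }
    long freeMb() { return totalMb - usedMb; }
  }

  /** Called on each node heartbeat for an off-switch (any-node) request. */
  static String onHeartbeat(Node node, long requestMb, List<Node> allNodes) {
    if (node.freeMb() >= requestMb) {
      // Unreserve anywhere else first: this is the exchange step.
      for (Node other : allNodes) {
        if (other != node && other.reservedMb != null
            && other.reservedMb == requestMb) {
          other.reservedMb = null;
        }
      }
      node.usedMb += requestMb;
      return "allocated " + requestMb + "MB on " + node.name;
    }
    node.reservedMb = requestMb; // can't fit now; reserve and wait
    return "reserved " + requestMb + "MB on " + node.name;
  }

  public static void main(String[] args) {
    Node nm1 = new Node("nm1", 1024), nm2 = new Node("nm2", 2048);
    List<Node> nodes = new ArrayList<>(List.of(nm1, nm2));
    long amMb = 2048; // the AM container from the reproduction steps
    System.out.println(onHeartbeat(nm1, amMb, nodes)); // reserved on nm1
    System.out.println(onHeartbeat(nm2, amMb, nodes)); // exchanged: on nm2
    System.out.println("nm1 reservation cleared: " + (nm1.reservedMb == null));
  }
}
{code}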





[jira] [Commented] (YARN-1127) reservation exchange and excess reservation is not working for capacity scheduler

2013-08-30 Thread Bikas Saha (JIRA)

[ https://issues.apache.org/jira/browse/YARN-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755090#comment-13755090 ]

Bikas Saha commented on YARN-1127:
--

Isn't this similar to a jira opened by you already? The issue being that the 
scheduler puts a reservation on a node whose total capacity is smaller than the 
reservation resource size. In this case, nm1 has capacity=1024 but the 
scheduler is putting a reservation of 2048 on it, and that can never be 
satisfied. So it does not make sense to make that reservation at all.
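
In other words, a guard along the lines of the sketch below (illustrative 
only, not the actual YARN-957 patch) would refuse such a reservation up front:

{code:java}
// Illustrative guard, not the actual YARN-957 patch.
public final class ReservationGuardSketch {
  /** Never reserve on a node whose *total* capacity (not just its current
   *  free space) is smaller than the request: it can never be satisfied. */
  static boolean canEverSatisfy(long requestMb, long nodeTotalMb) {
    return requestMb <= nodeTotalMb;
  }

  public static void main(String[] args) {
    System.out.println(canEverSatisfy(2048, 1024)); // nm1 here: false
    System.out.println(canEverSatisfy(2048, 2048)); // nm2: true
  }
}
{code}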



[jira] [Commented] (YARN-1127) reservation exchange and excess reservation is not working for capacity scheduler

2013-08-30 Thread Bikas Saha (JIRA)

[ https://issues.apache.org/jira/browse/YARN-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755095#comment-13755095 ]

Bikas Saha commented on YARN-1127:
--

How is this different from YARN-957?



[jira] [Commented] (YARN-1127) reservation exchange and excess reservation is not working for capacity scheduler

2013-08-30 Thread Omkar Vinit Joshi (JIRA)

[ https://issues.apache.org/jira/browse/YARN-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755103#comment-13755103 ]

Omkar Vinit Joshi commented on YARN-1127:
-

No. As per Arun, I am separating out the issues that are causing this failure.
* YARN-957: fix the case where a container gets reserved on a node manager 
whose total memory the request exceeds.
* this jira: ideally, the reservation should have been switched from one node 
manager to the other once the other node manager had sufficient memory. 
However, that did not happen, either because excess reservation did not work 
or because the reservation exchange did not occur. We need to find the root 
cause and fix this.



[jira] [Commented] (YARN-1127) reservation exchange and excess reservation is not working for capacity scheduler

2013-08-30 Thread Bikas Saha (JIRA)

[ https://issues.apache.org/jira/browse/YARN-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755110#comment-13755110 ]

Bikas Saha commented on YARN-1127:
--

Then please clarify this in the description or a comment; otherwise it looked 
like an exact duplicate. So the purpose of this jira is to fix the following 
situation:
1) NM1 has 2048 capacity in total but only 512 free. A reservation of 1024 
is placed on it.
2) NM2 now reports 1024 of free space. At this point, the reservation above 
should be removed from NM1 and the container should be assigned to NM2.
Step 2 is not happening, and this jira intends to fix it.
