[ https://issues.apache.org/jira/browse/YARN-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755090#comment-13755090 ]
Bikas Saha commented on YARN-1127:
----------------------------------

Isn't this similar to a JIRA you have already opened? The issue is that the scheduler puts a reservation on a node whose total capacity is smaller than the reservation resource size. In this case, nm1 has capacity=1024 but the scheduler is putting a reservation of 2048 on it, and that can never be satisfied. So it does not make sense to make that reservation at all.

> reservation exchange and excess reservation is not working for capacity scheduler
> ---------------------------------------------------------------------------------
>
>                 Key: YARN-1127
>                 URL: https://issues.apache.org/jira/browse/YARN-1127
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.1.1-beta
>            Reporter: Omkar Vinit Joshi
>            Assignee: Omkar Vinit Joshi
>            Priority: Blocker
>
> I have 2 node managers.
> * one with 1024 MB memory (nm1).
> * a second with 2048 MB memory (nm2).
> I am submitting a simple map reduce application with 1 mapper and 1 reducer
> with 1024 MB each. The steps to reproduce this are
> * stop nm2 (2048 MB memory). I am doing this to make sure that this node's
> heartbeat doesn't reach the RM first.
> * now submit the application. As soon as the RM receives the first node's (nm1)
> heartbeat it will try to reserve memory for the AM container (2048 MB).
> However, nm1 has only 1024 MB of memory.
> * now start nm2 with 2048 MB memory.
> It hangs forever... Ideally this has two potential issues.
> * Say 2048 MB is reserved on nm1 but nm2 comes back with 2048 MB of available
> memory. In this case, if the original request was made without any locality,
> then the scheduler should unreserve the memory on nm1 and allocate the
> requested 2048 MB container on nm2.
> * We support a notion where, say, we have 5 nodes with 4 AMs, all node
> managers have 8 GB each and each AM takes 2 GB. Each AM is requesting 8 GB.
> Now, to avoid deadlock, an AM will make an extra reservation. By doing this we
> would never hit the deadlock situation.
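The check suggested in the comment above (never reserve on a node whose total capacity is smaller than the request) can be sketched roughly as follows. This is only an illustration under stated assumptions: ReservationSketch, Node, Decision and decide are hypothetical names invented for this example and are not the actual CapacityScheduler classes or methods, and memory is the only resource modelled.

```java
// Hypothetical, simplified model for illustration only.
// None of these types are the real Hadoop YARN scheduler classes.
final class ReservationSketch {

    /** A node manager with its total memory and the memory currently in use (MB). */
    static final class Node {
        final String name;
        final int totalMb;
        final int usedMb;

        Node(String name, int totalMb, int usedMb) {
            this.name = name;
            this.totalMb = totalMb;
            this.usedMb = usedMb;
        }

        int availableMb() {
            return totalMb - usedMb;
        }
    }

    /** What the scheduler does with a request when a node heartbeats. */
    enum Decision { ALLOCATE, RESERVE, SKIP }

    /**
     * Decide how to handle a request of requestMb on the given node.
     * A reservation is made only if the node could satisfy the request
     * once its running containers finish, i.e. requestMb <= totalMb.
     */
    static Decision decide(Node node, int requestMb) {
        if (requestMb > node.totalMb) {
            // The request can never fit on this node, so do not reserve.
            return Decision.SKIP;
        }
        if (requestMb <= node.availableMb()) {
            // Enough free memory right now: allocate immediately.
            return Decision.ALLOCATE;
        }
        // Fits in principle but not at the moment: reserve and wait.
        return Decision.RESERVE;
    }
}
```

With the numbers from this issue, decide(new Node("nm1", 1024, 0), 2048) gives SKIP, so the 2048 MB request is not pinned to nm1 and can be allocated once nm2 (2048 MB, idle) heartbeats.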