[ https://issues.apache.org/jira/browse/YARN-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609278#comment-14609278 ]
Arun Suresh commented on YARN-3633:
-----------------------------------

I guess the line that was introduced needs to be synchronized (and I guess we need to do the same for {{removeApp}}, where we are subtracting), given that you are adding to and subtracting from "totalAmResourceUsage" defined in the {{FairScheduler}}, and considering that {{Resources#addTo/subtractFrom}} actually performs a get and a set. The value can change in between if some other AM is added or removed, possibly during a concurrently running continuous scheduling attempt.

> With Fair Scheduler, cluster can logjam when there are too many queues
> ----------------------------------------------------------------------
>
>                 Key: YARN-3633
>                 URL: https://issues.apache.org/jira/browse/YARN-3633
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.6.0
>            Reporter: Rohit Agarwal
>            Assignee: Rohit Agarwal
>            Priority: Critical
>         Attachments: YARN-3633-1.patch, YARN-3633.patch
>
>
> It's possible to logjam a cluster by submitting many applications at once in
> different queues.
> For example, let's say there is a cluster with 20GB of total memory. Let's
> say 4 users submit applications at the same time. The fair share of each
> queue is 5GB. Let's say that maxAMShare is 0.5. So, each queue has at most
> 2.5GB memory for AMs. If all the users requested AMs of size 3GB - the
> cluster logjams. Nothing gets scheduled even when 20GB of resources are
> available.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
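
The synchronization concern raised in the comment can be illustrated with a small, self-contained Java sketch. This is not the actual patch or the real Hadoop code: the class {{AmUsageSketch}}, the nested {{Usage}} counter, the {{lock}} object, the {{addApp}} method name and the {{runConcurrently}} helper are all hypothetical stand-ins, and memory is modelled as a single long in MB. The sketch only shows why a get-then-set update on a shared total should be guarded when several threads add and remove AMs at once.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AmUsageSketch {
    /** Hypothetical stand-in for a mutable resource value (memory in MB only). */
    static final class Usage {
        private long memoryMb;

        // Each update is a read followed by a write, so it is not atomic on its own.
        void addTo(long mb) { memoryMb = memoryMb + mb; }
        void subtractFrom(long mb) { memoryMb = memoryMb - mb; }
        long get() { return memoryMb; }
    }

    // Name borrowed from the comment; the type and the lock are assumptions.
    private final Usage totalAmResourceUsage = new Usage();
    private final Object lock = new Object();

    // Guard the read-modify-write so a concurrent add/remove cannot interleave.
    void addApp(long amMb) {
        synchronized (lock) { totalAmResourceUsage.addTo(amMb); }
    }

    void removeApp(long amMb) {
        synchronized (lock) { totalAmResourceUsage.subtractFrom(amMb); }
    }

    long totalAmMb() {
        synchronized (lock) { return totalAmResourceUsage.get(); }
    }

    /** Adds and removes the same AM size from many threads; returns the final total. */
    static long runConcurrently(int threads, int opsPerThread, long amMb) {
        AmUsageSketch scheduler = new AmUsageSketch();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < opsPerThread; i++) {
                    scheduler.addApp(amMb);
                    scheduler.removeApp(amMb);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("worker threads did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return scheduler.totalAmMb();
    }

    public static void main(String[] args) {
        // Every add is paired with a remove, so the guarded total ends at 0.
        System.out.println(runConcurrently(8, 100_000, 3072));
    }
}
```

With the {{synchronized}} blocks in place, the paired adds and removes always cancel out. Without them, two threads could read the same old value and one update could be lost, which is the drift in the tracked AM usage that the comment warns about.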