[ https://issues.apache.org/jira/browse/YARN-2263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14058132#comment-14058132 ]
Jason Lowe commented on YARN-2263:
----------------------------------

1 is an appropriate lower bound, since we never want the maximum number of applications for a user to be zero or less. (That would be a worthless queue: we could submit jobs to it, but no jobs would ever activate.)

I'm assuming this only causes a deadlock when the active job submits other jobs and waits for them to complete? If it simply submits jobs and exits, then even if the queue is so tiny that only one active job per user is allowed, the jobs should eventually complete (assuming sufficient resources to launch an AM _and_ at least one task simultaneously, if this is MapReduce).

If the concern is that a queue can be too small to allow a user to run more than one application simultaneously, and some app frameworks might not tolerate that, then yes, that could be an issue. However, I'm not sure that is YARN's problem to solve. I could have an application framework that, for whatever reason, requires 10 jobs to be running simultaneously to work. There could definitely be a queue config that will not allow that to run properly because the queue is too small to support 10 simultaneous applications by a single user. Should YARN handle this scenario? If so, how would it detect it, and what should it do to mitigate it? I would argue the same applies to the simpler job-launching-a-job-and-waiting scenario: some queues are going to be too small to support that.

Users can work around issues like this with smarter queue setups. This is touched upon in MAPREDUCE-4304 and elsewhere for the Oozie case, which is a similar scenario. We can set up a separate queue for the launcher jobs, distinct from the queue where the spawned jobs run. That way we can't accidentally fill the cluster/queue with just launcher jobs and deadlock.
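A minimal sketch of that queue split in capacity-scheduler.xml (the queue names "launcher" and "work" and the capacity percentages are hypothetical, chosen only to illustrate the idea; they are not from this issue):

```xml
<!-- Illustrative fragment: keep launcher jobs in their own small queue so
     they cannot occupy all the active-application slots of the queue that
     runs the jobs they spawn. Queue names and capacities are hypothetical. -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>launcher,work</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.launcher.capacity</name>
  <value>10</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.work.capacity</name>
  <value>90</value>
</property>
```

With a split like this, even if the launcher queue is saturated with waiting launcher jobs, the spawned jobs still activate in the other queue, so the circular wait cannot form.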
> CSQueueUtils.computeMaxActiveApplicationsPerUser may cause deadlock for
> nested MapReduce jobs
> ---------------------------------------------------------------------------------------------
>
>                 Key: YARN-2263
>                 URL: https://issues.apache.org/jira/browse/YARN-2263
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 0.23.10, 2.4.1
>            Reporter: Chen He
>
> computeMaxActiveApplicationsPerUser() has a lower bound of "1". For a nested
> MapReduce job that fires new MapReduce jobs from its mappers/reducers, this
> can cause the job to get stuck.
>
> public static int computeMaxActiveApplicationsPerUser(
>     int maxActiveApplications, int userLimit, float userLimitFactor) {
>   return Math.max(
>       (int)Math.ceil(
>           maxActiveApplications * (userLimit / 100.0f) * userLimitFactor),
>       1);
> }

--
This message was sent by Atlassian JIRA
(v6.2#6252)
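The clamping behavior under discussion can be exercised standalone. This is a sketch that copies the quoted formula into a small driver (the class name `MaxActiveAppsDemo` and the parameter values are illustrative, not from the scheduler source):

```java
// Sketch of the quoted CSQueueUtils formula, showing how tiny queues
// clamp to exactly 1 active application per user — the condition under
// which a launcher job holding the only slot can deadlock its children.
public class MaxActiveAppsDemo {
    public static int computeMaxActiveApplicationsPerUser(
            int maxActiveApplications, int userLimit, float userLimitFactor) {
        return Math.max(
            (int) Math.ceil(
                maxActiveApplications * (userLimit / 100.0f) * userLimitFactor),
            1);
    }

    public static void main(String[] args) {
        // Tiny queue: 1 active app overall, 25% user limit, factor 1.0.
        // The raw product is 0.25; ceil() and the lower bound both push it
        // to 1, so a single launcher job can occupy the only slot.
        System.out.println(computeMaxActiveApplicationsPerUser(1, 25, 1.0f));

        // Larger queue: 40 active apps, 25% user limit, factor 1.0 -> 10,
        // leaving room for a launcher job plus the jobs it spawns.
        System.out.println(computeMaxActiveApplicationsPerUser(40, 25, 1.0f));
    }
}
```

Note the lower bound only guarantees one active application per user; it makes no promise that a job which waits on further jobs in the same queue will ever see them activate.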