[jira] [Updated] (YARN-7051) FifoIntraQueuePreemptionPlugin can get concurrent modification exception
[ https://issues.apache.org/jira/browse/YARN-7051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Payne updated YARN-7051:
-----------------------------
    Attachment: YARN-7051.002.patch

bq. so this won't be changing while createTempAppForResCalculation is looping over the list.

However, I did find a race condition that throws an NPE within {{createTempAppForResCalculation}}.
{noformat}
java.lang.NullPointerException
        at org.apache.hadoop.yarn.util.resource.Resources.clone(Resources.java:155)
        at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.FifoIntraQueuePreemptionPlugin.createTempAppForResCalculation(FifoIntraQueuePreemptionPlugin.java:403)
        at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.FifoIntraQueuePreemptionPlugin.computeAppsIdealAllocation(FifoIntraQueuePreemptionPlugin.java:140)
        at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.IntraQueueCandidatesSelector.computeIntraQueuePreemptionDemand(IntraQueueCandidatesSelector.java:283)
{noformat}
The reason for this is that {{perUserAMUsed}} was populated with running apps prior to calling {{createTempAppForResCalculation}}, but then {{createTempAppForResCalculation}} loops through both running and pending apps.

Attaching new patch that addresses this.
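The failure mode described in the comment above (a per-user map filled from running apps only, then consulted for pending apps as well) can be reproduced in isolation. The following is a minimal sketch, not the contents of YARN-7051.002.patch: the class {{PerUserAmUsedSketch}}, the {{App}} type, and the use of a plain {{Long}} in place of a YARN {{Resource}} are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PerUserAmUsedSketch {

  /** Hypothetical stand-in for a scheduler app: owning user plus AM memory in MB. */
  static final class App {
    final String user;
    final long amMemoryMb;

    App(String user, long amMemoryMb) {
      this.user = user;
      this.amMemoryMb = amMemoryMb;
    }
  }

  /** Sums AM memory per user, but only for the apps it is given. */
  static Map<String, Long> amUsedPerUser(List<App> apps) {
    Map<String, Long> perUser = new HashMap<>();
    for (App app : apps) {
      perUser.merge(app.user, app.amMemoryMb, Long::sum);
    }
    return perUser;
  }

  /** Unsafe: a user with no entry yields null, and unboxing null throws an NPE. */
  static long unsafeLookup(Map<String, Long> perUser, App app) {
    return perUser.get(app.user);
  }

  /** Safe: a user with no entry is treated as having used nothing. */
  static long safeLookup(Map<String, Long> perUser, App app) {
    return perUser.getOrDefault(app.user, 0L);
  }

  public static void main(String[] args) {
    List<App> running = Arrays.asList(new App("alice", 1024));
    App pending = new App("bob", 512); // this user has no running app

    // The map only knows about users who have running apps.
    Map<String, Long> perUser = amUsedPerUser(running);

    System.out.println("safe lookup: " + safeLookup(perUser, pending));
    try {
      unsafeLookup(perUser, pending);
    } catch (NullPointerException e) {
      System.out.println("NPE for user " + pending.user);
    }
  }
}
```

Either building the map from the same set of apps that is later iterated, or tolerating a missing entry at lookup time, avoids the null in this sketch. Which of the two the actual patch does is not stated in this thread.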
> FifoIntraQueuePreemptionPlugin can get concurrent modification exception
> ------------------------------------------------------------------------
>
>                 Key: YARN-7051
>                 URL: https://issues.apache.org/jira/browse/YARN-7051
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler, scheduler preemption, yarn
>    Affects Versions: 2.9.0, 2.8.1, 3.0.0-alpha3
>            Reporter: Eric Payne
>            Assignee: Eric Payne
>            Priority: Critical
>         Attachments: YARN-7051.001.patch, YARN-7051.002.patch
>
>
> {{FifoIntraQueuePreemptionPlugin#calculateUsedAMResourcesPerQueue}} has the
> following code:
> {code}
> Collection<FiCaSchedulerApp> runningApps = leafQueue.getApplications();
> Resource amUsed = Resources.createResource(0, 0);
> for (FiCaSchedulerApp app : runningApps) {
> {code}
> {{runningApps}} is unmodifiable but not concurrent. This caused the
> preemption monitor thread to crash in the RM in one of our clusters.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
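The phrase "unmodifiable but not concurrent" in the issue description can be illustrated with plain JDK collections. The sketch below is an assumption-laden illustration, not Hadoop code: {{UnmodifiableViewSketch}} and its methods are hypothetical, and the owner's mutation happens on the same thread in the middle of the loop so that the failure is deterministic rather than timing-dependent.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.ConcurrentModificationException;
import java.util.List;

public class UnmodifiableViewSketch {
  // Backing list, owned and mutated by this object only.
  private final List<String> apps = new ArrayList<>();

  synchronized void addApp(String id) {
    apps.add(id);
  }

  /** Read-only view: callers cannot change it, but it still reflects later owner changes. */
  Collection<String> getApplications() {
    return Collections.unmodifiableCollection(apps);
  }

  /** Copy taken under the owner's lock: safe to iterate without holding the lock. */
  synchronized List<String> snapshotApplications() {
    return new ArrayList<>(apps);
  }

  /** Returns true if iterating the live view fails once the owner mutates the list. */
  static boolean iterationFailsOnView() {
    UnmodifiableViewSketch queue = new UnmodifiableViewSketch();
    queue.addApp("app_1");
    queue.addApp("app_2");
    try {
      for (String id : queue.getApplications()) {
        queue.addApp(id + "_new"); // owner changes the backing list mid-iteration
      }
      return false;
    } catch (ConcurrentModificationException e) {
      return true;
    }
  }

  /** Returns how many elements were visited when iterating a snapshot instead. */
  static int iterationOverSnapshot() {
    UnmodifiableViewSketch queue = new UnmodifiableViewSketch();
    queue.addApp("app_1");
    queue.addApp("app_2");
    int visited = 0;
    for (String id : queue.snapshotApplications()) {
      queue.addApp(id + "_new"); // does not affect the copy being iterated
      visited++;
    }
    return visited;
  }

  public static void main(String[] args) {
    System.out.println("view iteration failed: " + iterationFailsOnView());
    System.out.println("snapshot elements visited: " + iterationOverSnapshot());
  }
}
```

The unmodifiable wrapper only blocks writes made through the wrapper itself. Its iterator is the backing list's fail-fast iterator, so any structural change made by the owner during the loop is detected on the next step.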
[jira] [Updated] (YARN-7051) FifoIntraQueuePreemptionPlugin can get concurrent modification exception
[ https://issues.apache.org/jira/browse/YARN-7051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Payne updated YARN-7051:
-----------------------------
    Attachment: YARN-7051.001.patch

The YARN-7051.001.patch adds a leafqueue synchronization around the vulnerable code. I am still doing manual testing.
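As a rough picture of what "synchronization around the vulnerable code" could look like, the sketch below holds the queue's monitor for the whole loop while a second thread keeps adding apps. It is not the 001 patch: {{QueueLockSketch}}, its nested {{Queue}} class, and the use of plain integers for AM memory are assumptions, and the real {{LeafQueue}} may use a different locking mechanism.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

public class QueueLockSketch {

  /** Hypothetical stand-in for a leaf queue that guards its app list with its own monitor. */
  static final class Queue {
    private final List<Integer> amMemoryMb = new ArrayList<>();

    synchronized void addApp(int mb) {
      amMemoryMb.add(mb);
    }

    /** Unmodifiable, but still a live view of the backing list. */
    Collection<Integer> getApplications() {
      return Collections.unmodifiableCollection(amMemoryMb);
    }
  }

  /** Sums AM memory while holding the same lock the writer uses. */
  static long amUsed(Queue queue) {
    long total = 0;
    synchronized (queue) {
      for (int mb : queue.getApplications()) {
        total += mb;
      }
    }
    return total;
  }

  /** Runs a writer thread and a reader loop side by side, then returns the final sum. */
  static long sumWhileWriterRuns(int appsToAdd) {
    Queue queue = new Queue();
    Thread writer = new Thread(() -> {
      for (int i = 0; i < appsToAdd; i++) {
        queue.addApp(1);
      }
    });
    writer.start();
    try {
      while (writer.isAlive()) {
        amUsed(queue); // would risk a ConcurrentModificationException without the lock
      }
      writer.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return amUsed(queue);
  }

  public static void main(String[] args) {
    System.out.println("final AM used: " + sumWhileWriterRuns(100_000));
  }
}
```

Locking for the full loop keeps the reader simple but blocks writers for as long as the iteration takes. Copying the collection under the lock and iterating the copy is the usual alternative when the loop body is slow.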
[jira] [Updated] (YARN-7051) FifoIntraQueuePreemptionPlugin can get concurrent modification exception
[ https://issues.apache.org/jira/browse/YARN-7051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated YARN-7051:
-----------------------------
    Target Version/s: 2.8.2
             Summary: FifoIntraQueuePreemptionPlugin can get concurrent modification exception  (was: FifoIntraQueuePreemptionPlugin can get concurrent modification exception/)
[jira] [Updated] (YARN-7051) FifoIntraQueuePreemptionPlugin can get concurrent modification exception/
[ https://issues.apache.org/jira/browse/YARN-7051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Payne updated YARN-7051:
-----------------------------
    Component/s: yarn
                 scheduler preemption
                 capacity scheduler