[jira] [Updated] (YARN-7051) FifoIntraQueuePreemptionPlugin can get concurrent modification exception

2017-08-24 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-7051:
-----------------------------
Attachment: YARN-7051.002.patch

bq. so this won't be changing while createTempAppForResCalculation is looping over the list.

However, I did find a race condition that throws a NullPointerException (NPE)
within {{createTempAppForResCalculation}}.

{noformat}
java.lang.NullPointerException
    at org.apache.hadoop.yarn.util.resource.Resources.clone(Resources.java:155)
    at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.FifoIntraQueuePreemptionPlugin.createTempAppForResCalculation(FifoIntraQueuePreemptionPlugin.java:403)
    at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.FifoIntraQueuePreemptionPlugin.computeAppsIdealAllocation(FifoIntraQueuePreemptionPlugin.java:140)
    at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.IntraQueueCandidatesSelector.computeIntraQueuePreemptionDemand(IntraQueueCandidatesSelector.java:283)
{noformat}

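The reason for this is that {{perUserAMUsed}} was populated only from running
apps before {{createTempAppForResCalculation}} was called, but
{{createTempAppForResCalculation}} loops through both running and pending apps.
A user who has only pending apps may therefore have no entry in
{{perUserAMUsed}}, which would explain the null reaching {{Resources.clone}}.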

Attaching new patch that addresses this.
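
The mismatch described above can be illustrated with a small, self-contained sketch. It does not reproduce the patch or any Hadoop code: the class {{PerUserAmUsedSketch}}, its methods, and the plain map of memory values standing in for {{Resource}} objects are all hypothetical, and the idea that the null comes from a missing map entry is an inference from the stack trace.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the per-user AM bookkeeping; not the Hadoop classes.
class PerUserAmUsedSketch {

  // Unsafe lookup: unboxing a missing (null) entry throws NullPointerException,
  // loosely analogous to cloning a null Resource.
  static long amUsedUnsafe(Map<String, Long> perUserAmUsed, String user) {
    return perUserAmUsed.get(user);
  }

  // Null-safe lookup: users absent from the map are treated as using zero.
  static long amUsedFor(Map<String, Long> perUserAmUsed, String user) {
    return perUserAmUsed.getOrDefault(user, 0L);
  }

  public static void main(String[] args) {
    // The map is filled from running apps only (assumed: alice has one).
    Map<String, Long> perUserAmUsed = new HashMap<>();
    perUserAmUsed.put("alice", 1024L);

    // The later loop covers running and pending apps (assumed: bob is pending only).
    List<String> usersOfAllApps = Arrays.asList("alice", "bob");
    for (String user : usersOfAllApps) {
      System.out.println(user + " AM used: " + amUsedFor(perUserAmUsed, user));
    }

    try {
      amUsedUnsafe(perUserAmUsed, "bob");
    } catch (NullPointerException e) {
      System.out.println("unsafe lookup for bob threw NullPointerException");
    }
  }
}
```

Defaulting to zero is only one way to handle the missing entry; populating the map from the same set of apps that the loop visits would be another.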

> FifoIntraQueuePreemptionPlugin can get concurrent modification exception
> 
>
> Key: YARN-7051
> URL: https://issues.apache.org/jira/browse/YARN-7051
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, scheduler preemption, yarn
>Affects Versions: 2.9.0, 2.8.1, 3.0.0-alpha3
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Critical
> Attachments: YARN-7051.001.patch, YARN-7051.002.patch
>
>
> {{FifoIntraQueuePreemptionPlugin#calculateUsedAMResourcesPerQueue}} has the 
> following code:
> {code}
> Collection<FiCaSchedulerApp> runningApps = leafQueue.getApplications();
> Resource amUsed = Resources.createResource(0, 0);
> for (FiCaSchedulerApp app : runningApps) {
> {code}
> {{runningApps}} is unmodifiable but not concurrent. This caused the 
> preemption monitor thread to crash in the RM in one of our clusters.
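
To illustrate why "unmodifiable but not concurrent" is a problem, here is a minimal sketch. It assumes the collection behaves like a read-only view created with {{Collections.unmodifiableCollection}} over a plain {{ArrayList}}; the class {{UnmodifiableViewSketch}} and the app names are hypothetical, and the change to the backing list is made from inside the loop only to make the failure deterministic in a single thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.ConcurrentModificationException;
import java.util.List;

// Hypothetical demo class; it does not use any Hadoop types.
class UnmodifiableViewSketch {

  // Returns true if iterating the read-only view fails once the backing list changes.
  static boolean iterationFailsWhenBackingListChanges() {
    List<String> backing = new ArrayList<>(Arrays.asList("app_1", "app_2", "app_3"));
    Collection<String> view = Collections.unmodifiableCollection(backing);
    try {
      for (String app : view) {
        // Stands in for another thread adding an application mid-iteration.
        backing.add(app + "_new");
      }
      return false;
    } catch (ConcurrentModificationException e) {
      return true;
    }
  }

  // Iterates over a private copy, so later changes to the backing list are not seen.
  static int countWithSnapshot() {
    List<String> backing = new ArrayList<>(Arrays.asList("app_1", "app_2", "app_3"));
    // In concurrent code the copy itself must be taken under the writers' lock.
    List<String> snapshot = new ArrayList<>(backing);
    int seen = 0;
    for (String app : snapshot) {
      backing.add(app + "_new");
      seen++;
    }
    return seen;
  }

  public static void main(String[] args) {
    System.out.println("view iteration failed: " + iterationFailsWhenBackingListChanges());
    System.out.println("apps seen via snapshot: " + countWithSnapshot());
  }
}
```

The read-only wrapper only blocks writes made through the view; its iterator still belongs to the backing list, so structural changes made elsewhere invalidate it.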



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7051) FifoIntraQueuePreemptionPlugin can get concurrent modification exception

2017-08-19 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-7051:
-----------------------------
Attachment: YARN-7051.001.patch

YARN-7051.001.patch adds synchronization on the leaf queue around the
vulnerable code. I am still doing manual testing.
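
As a rough illustration of synchronizing around the vulnerable iteration, the sketch below has a reader and a writer share one monitor. It is not the patch: {{LeafQueueLockSketch}} and its methods are hypothetical, and the real {{LeafQueue}} may use a different kind of lock.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a queue whose application list is guarded by one lock.
class LeafQueueLockSketch {
  private final List<String> apps = new ArrayList<>();

  // Writers take the queue's monitor before changing the list.
  synchronized void addApp(String id) {
    apps.add(id);
  }

  // The reader holds the same monitor for the whole traversal,
  // so no writer can modify the list while it is being iterated.
  synchronized int countApps() {
    int n = 0;
    for (String ignored : apps) {
      n++;
    }
    return n;
  }

  // Runs one writer and one reader concurrently and returns the final app count.
  static int runDemo() {
    LeafQueueLockSketch queue = new LeafQueueLockSketch();
    Thread writer = new Thread(() -> {
      for (int i = 0; i < 10000; i++) {
        queue.addApp("app_" + i);
      }
    });
    Thread reader = new Thread(() -> {
      for (int i = 0; i < 1000; i++) {
        queue.countApps();
      }
    });
    writer.start();
    reader.start();
    try {
      writer.join();
      reader.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return queue.countApps();
  }

  public static void main(String[] args) {
    System.out.println("final app count: " + runDemo());
  }
}
```

Holding the lock for the full loop trades some concurrency for safety; copying the list under the lock and iterating the copy afterwards would shorten the time the lock is held.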







[jira] [Updated] (YARN-7051) FifoIntraQueuePreemptionPlugin can get concurrent modification exception

2017-08-18 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-7051:
-----------------------------
Target Version/s: 2.8.2
 Summary: FifoIntraQueuePreemptionPlugin can get concurrent 
modification exception  (was: FifoIntraQueuePreemptionPlugin can get concurrent 
modification exception/)







[jira] [Updated] (YARN-7051) FifoIntraQueuePreemptionPlugin can get concurrent modification exception/

2017-08-18 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-7051:
-----------------------------
Component/s: yarn
 scheduler preemption
 capacity scheduler



