[jira] [Commented] (YARN-2594) Potential deadlock in RM when querying ApplicationResourceUsageReport

2014-10-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154690#comment-14154690
 ] 

Hudson commented on YARN-2594:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #697 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/697/])
YARN-2594. Potential deadlock in RM when querying 
ApplicationResourceUsageReport. (Wangda Tan via kasha) (kasha: rev 
14d60dadc25b044a2887bf912ba5872367f2dffb)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java


 Potential deadlock in RM when querying ApplicationResourceUsageReport
 -

 Key: YARN-2594
 URL: https://issues.apache.org/jira/browse/YARN-2594
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Karam Singh
Assignee: Wangda Tan
Priority: Blocker
 Fix For: 2.6.0

 Attachments: YARN-2594.patch, YARN-2594.patch, YARN-2594.patch


 ResourceManager sometimes becomes unresponsive:
 There was no exception in the ResourceManager log, which contains only the 
 following type of messages:
 {code}
 2014-09-19 19:13:45,241 INFO  event.AsyncDispatcher (AsyncDispatcher.java:handle(232)) - Size of event-queue is 53000
 2014-09-19 19:30:26,312 INFO  event.AsyncDispatcher (AsyncDispatcher.java:handle(232)) - Size of event-queue is 54000
 2014-09-19 19:47:07,351 INFO  event.AsyncDispatcher (AsyncDispatcher.java:handle(232)) - Size of event-queue is 55000
 2014-09-19 20:03:48,460 INFO  event.AsyncDispatcher (AsyncDispatcher.java:handle(232)) - Size of event-queue is 56000
 2014-09-19 20:20:29,542 INFO  event.AsyncDispatcher (AsyncDispatcher.java:handle(232)) - Size of event-queue is 57000
 2014-09-19 20:37:10,635 INFO  event.AsyncDispatcher (AsyncDispatcher.java:handle(232)) - Size of event-queue is 58000
 2014-09-19 20:53:51,722 INFO  event.AsyncDispatcher (AsyncDispatcher.java:handle(232)) - Size of event-queue is 59000
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2594) Potential deadlock in RM when querying ApplicationResourceUsageReport

2014-10-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154913#comment-14154913
 ] 

Hudson commented on YARN-2594:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1913 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1913/])
YARN-2594. Potential deadlock in RM when querying 
ApplicationResourceUsageReport. (Wangda Tan via kasha) (kasha: rev 
14d60dadc25b044a2887bf912ba5872367f2dffb)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* hadoop-yarn-project/CHANGES.txt


 Potential deadlock in RM when querying ApplicationResourceUsageReport
 -

 Key: YARN-2594
 URL: https://issues.apache.org/jira/browse/YARN-2594
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Karam Singh
Assignee: Wangda Tan
Priority: Blocker
 Fix For: 2.6.0

 Attachments: YARN-2594.patch, YARN-2594.patch, YARN-2594.patch




[jira] [Commented] (YARN-2594) Potential deadlock in RM when querying ApplicationResourceUsageReport

2014-09-30 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153536#comment-14153536
 ] 

Karthik Kambatla commented on YARN-2594:


We need to handle {{getFinalApplicationStatus}}, and maybe 
{{createAndGetApplicationReport}} as well. In the latter, we can replace direct 
access to {{diagnostics}} with {{getDiagnostics}} to avoid races on diagnostics.
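
A minimal sketch of the getter-instead-of-field idea, assuming a simplified class: {{DiagnosticsAccessExample}}, {{appendDiagnostics}} and {{createReportLine}} are hypothetical names, and returning a String copy is a simplification, not the RMAppImpl signature. The report builder goes through the getter, so the read happens under the same lock that writers use.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical, simplified stand-in for an application object; not the RMAppImpl source. */
class DiagnosticsAccessExample {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  // Guarded by lock: writers append under the write lock.
  private final StringBuilder diagnostics = new StringBuilder();

  void appendDiagnostics(String message) {
    lock.writeLock().lock();
    try {
      diagnostics.append(message);
    } finally {
      lock.writeLock().unlock();
    }
  }

  /** Returns a copy of the diagnostics taken under the read lock. */
  String getDiagnostics() {
    lock.readLock().lock();
    try {
      return diagnostics.toString();
    } finally {
      lock.readLock().unlock();
    }
  }

  /** Builds a report line through the getter instead of touching the field directly. */
  String createReportLine() {
    return "diagnostics=" + getDiagnostics();
  }

  public static void main(String[] args) {
    DiagnosticsAccessExample app = new DiagnosticsAccessExample();
    app.appendDiagnostics("attempt 1 failed; ");
    app.appendDiagnostics("retrying");
    System.out.println(app.createReportLine());
  }
}
```

Reading the field directly in the report builder would bypass the lock and could observe a StringBuilder that another thread is appending to.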

 Potential deadlock in RM when querying ApplicationResourceUsageReport
 -

 Key: YARN-2594
 URL: https://issues.apache.org/jira/browse/YARN-2594
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Karam Singh
Assignee: Wangda Tan
Priority: Blocker
 Attachments: YARN-2594.patch, YARN-2594.patch




[jira] [Commented] (YARN-2594) Potential deadlock in RM when querying ApplicationResourceUsageReport

2014-09-30 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153541#comment-14153541
 ] 

Karthik Kambatla commented on YARN-2594:


Also, it would be nice to add a comment next to the declaration of 
currentAttempt to say it is not protected by the readLock. 

 Potential deadlock in RM when querying ApplicationResourceUsageReport
 -

 Key: YARN-2594
 URL: https://issues.apache.org/jira/browse/YARN-2594
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Karam Singh
Assignee: Wangda Tan
Priority: Blocker
 Attachments: YARN-2594.patch, YARN-2594.patch




[jira] [Commented] (YARN-2594) Potential deadlock in RM when querying ApplicationResourceUsageReport

2014-09-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153557#comment-14153557
 ] 

Hadoop QA commented on YARN-2594:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672075/YARN-2594.patch
  against trunk revision ea32a66.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5182//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5182//console

This message is automatically generated.

 Potential deadlock in RM when querying ApplicationResourceUsageReport
 -

 Key: YARN-2594
 URL: https://issues.apache.org/jira/browse/YARN-2594
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Karam Singh
Assignee: Wangda Tan
Priority: Blocker
 Attachments: YARN-2594.patch, YARN-2594.patch




[jira] [Commented] (YARN-2594) Potential deadlock in RM when querying ApplicationResourceUsageReport

2014-09-30 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153608#comment-14153608
 ] 

zhihai xu commented on YARN-2594:
-

Hi [~leftnoteasy],
It would be good to save currentAttempt in a local variable to avoid any 
potential null pointer exception in the future.

RMAppAttempt attempt = this.currentAttempt;
if (attempt != null) {
  return attempt.getTrackingUrl();
}

Without the lock, this.currentAttempt can change between the null check and 
the call to getTrackingUrl.
Saving currentAttempt in a local variable avoids this race condition.
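
For reference, a self-contained sketch of the same pattern. The class names {{SnapshotFieldExample}} and {{Attempt}} are hypothetical stand-ins, and marking the field volatile is an assumption of this sketch rather than something stated in the patch discussion. The getter reads the field exactly once, so the null check and the call operate on the same object even if another thread reassigns the field in between.

```java
/** Hypothetical, simplified stand-in for an application object; not the RMAppImpl source. */
class SnapshotFieldExample {

  /** Hypothetical stand-in for an application attempt. */
  static final class Attempt {
    private final String trackingUrl;

    Attempt(String trackingUrl) {
      this.trackingUrl = trackingUrl;
    }

    String getTrackingUrl() {
      return trackingUrl;
    }
  }

  // Not guarded by a lock. volatile (an assumption here) makes writes visible to readers.
  private volatile Attempt currentAttempt;

  void setCurrentAttempt(Attempt attempt) {
    this.currentAttempt = attempt;
  }

  String getTrackingUrl() {
    // Read the shared field once; later changes to this.currentAttempt
    // cannot turn the checked reference into null.
    Attempt attempt = this.currentAttempt;
    if (attempt != null) {
      return attempt.getTrackingUrl();
    }
    return null;
  }

  public static void main(String[] args) {
    SnapshotFieldExample app = new SnapshotFieldExample();
    System.out.println(app.getTrackingUrl());
    app.setCurrentAttempt(new Attempt("http://example.invalid/app"));
    System.out.println(app.getTrackingUrl());
  }
}
```

Writing {{if (this.currentAttempt != null) return this.currentAttempt.getTrackingUrl();}} instead reads the field twice, which is the window the comment above describes.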

 Potential deadlock in RM when querying ApplicationResourceUsageReport
 -

 Key: YARN-2594
 URL: https://issues.apache.org/jira/browse/YARN-2594
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Karam Singh
Assignee: Wangda Tan
Priority: Blocker
 Attachments: YARN-2594.patch, YARN-2594.patch




[jira] [Commented] (YARN-2594) Potential deadlock in RM when querying ApplicationResourceUsageReport

2014-09-30 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153771#comment-14153771
 ] 

Karthik Kambatla commented on YARN-2594:


Fair enough. We could improve the locking in RMAppImpl further, but I guess the 
follow-up JIRA to fix SchedulerApplicationAttempt would take care of things in 
a better way.

+1, pending Jenkins. 

 Potential deadlock in RM when querying ApplicationResourceUsageReport
 -

 Key: YARN-2594
 URL: https://issues.apache.org/jira/browse/YARN-2594
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Karam Singh
Assignee: Wangda Tan
Priority: Blocker
 Attachments: YARN-2594.patch, YARN-2594.patch, YARN-2594.patch




[jira] [Commented] (YARN-2594) Potential deadlock in RM when querying ApplicationResourceUsageReport

2014-09-30 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153779#comment-14153779
 ] 

zhihai xu commented on YARN-2594:
-

The new patch looks good to me.

 Potential deadlock in RM when querying ApplicationResourceUsageReport
 -

 Key: YARN-2594
 URL: https://issues.apache.org/jira/browse/YARN-2594
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Karam Singh
Assignee: Wangda Tan
Priority: Blocker
 Attachments: YARN-2594.patch, YARN-2594.patch, YARN-2594.patch




[jira] [Commented] (YARN-2594) Potential deadlock in RM when querying ApplicationResourceUsageReport

2014-09-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153968#comment-14153968
 ] 

Hadoop QA commented on YARN-2594:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672114/YARN-2594.patch
  against trunk revision 9582a50.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5188//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5188//console

This message is automatically generated.

 Potential deadlock in RM when querying ApplicationResourceUsageReport
 -

 Key: YARN-2594
 URL: https://issues.apache.org/jira/browse/YARN-2594
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Karam Singh
Assignee: Wangda Tan
Priority: Blocker
 Attachments: YARN-2594.patch, YARN-2594.patch, YARN-2594.patch




[jira] [Commented] (YARN-2594) Potential deadlock in RM when querying ApplicationResourceUsageReport

2014-09-30 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153981#comment-14153981
 ] 

Karthik Kambatla commented on YARN-2594:


Committing this. 

 Potential deadlock in RM when querying ApplicationResourceUsageReport
 -

 Key: YARN-2594
 URL: https://issues.apache.org/jira/browse/YARN-2594
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Karam Singh
Assignee: Wangda Tan
Priority: Blocker
 Attachments: YARN-2594.patch, YARN-2594.patch, YARN-2594.patch




[jira] [Commented] (YARN-2594) Potential deadlock in RM when querying ApplicationResourceUsageReport

2014-09-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154018#comment-14154018
 ] 

Hudson commented on YARN-2594:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6157 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6157/])
YARN-2594. Potential deadlock in RM when querying 
ApplicationResourceUsageReport. (Wangda Tan via kasha) (kasha: rev 
14d60dadc25b044a2887bf912ba5872367f2dffb)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* hadoop-yarn-project/CHANGES.txt


 Potential deadlock in RM when querying ApplicationResourceUsageReport
 -

 Key: YARN-2594
 URL: https://issues.apache.org/jira/browse/YARN-2594
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Karam Singh
Assignee: Wangda Tan
Priority: Blocker
 Fix For: 2.6.0

 Attachments: YARN-2594.patch, YARN-2594.patch, YARN-2594.patch




[jira] [Commented] (YARN-2594) Potential deadlock in RM when querying ApplicationResourceUsageReport

2014-09-30 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154069#comment-14154069
 ] 

Wangda Tan commented on YARN-2594:
--

Thanks [~kasha], [~jianhe] and [~zxu] for review and commit!  

 Potential deadlock in RM when querying ApplicationResourceUsageReport
 -

 Key: YARN-2594
 URL: https://issues.apache.org/jira/browse/YARN-2594
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Karam Singh
Assignee: Wangda Tan
Priority: Blocker
 Fix For: 2.6.0

 Attachments: YARN-2594.patch, YARN-2594.patch, YARN-2594.patch




[jira] [Commented] (YARN-2594) Potential deadlock in RM when querying ApplicationResourceUsageReport

2014-09-26 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14150162#comment-14150162
 ] 

Jian He commented on YARN-2594:
---

The current patch looks good to me. Thanks, all, for the discussion!

 Potential deadlock in RM when querying ApplicationResourceUsageReport
 -

 Key: YARN-2594
 URL: https://issues.apache.org/jira/browse/YARN-2594
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Karam Singh
Assignee: Wangda Tan
Priority: Blocker
 Attachments: YARN-2594.patch




[jira] [Commented] (YARN-2594) Potential deadlock in RM when querying ApplicationResourceUsageReport

2014-09-26 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14150177#comment-14150177
 ] 

Karthik Kambatla commented on YARN-2594:


As I commented earlier, the current approach is fine with me. My review 
comments still apply: we should avoid using readLock in other get methods that 
access RMAppImpl#currentAttempt. RMAppAttemptImpl should handle the 
thread-safety of its fields.

Can we also file follow-up JIRAs to clean up synchronization in 
SchedulerApplicationAttempt? 

 Potential deadlock in RM when querying ApplicationResourceUsageReport
 -

 Key: YARN-2594
 URL: https://issues.apache.org/jira/browse/YARN-2594
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Karam Singh
Assignee: Wangda Tan
Priority: Blocker
 Attachments: YARN-2594.patch




[jira] [Commented] (YARN-2594) Potential deadlock in RM when querying ApplicationResourceUsageReport

2014-09-26 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14150301#comment-14150301
 ] 

Wangda Tan commented on YARN-2594:
--

Thanks [~jianhe] and [~kasha] for the review. I created YARN-2614 to track 
SchedulerApplicationAttempt synchronization cleanups.

 Potential deadlock in RM when querying ApplicationResourceUsageReport
 -

 Key: YARN-2594
 URL: https://issues.apache.org/jira/browse/YARN-2594
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Karam Singh
Assignee: Wangda Tan
Priority: Blocker
 Attachments: YARN-2594.patch




[jira] [Commented] (YARN-2594) Potential deadlock in RM when querying ApplicationResourceUsageReport

2014-09-25 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14148646#comment-14148646
 ] 

Wangda Tan commented on YARN-2594:
--

[~kasha],
Thanks for your comments. We should definitely reduce the use of synchronized 
locks, but this problem does not seem to be caused by that.
After a discussion with Jian He, we found 4 suspicious threads.

Thread#2 and thread#4 try to acquire the read lock and fail, yet at the same 
time *no write lock is held by anyone* (thread#3 is waiting for the write 
lock). This looks more like a Java bug to me.
The following links describe that bug, and some other people claim it is not 
yet fixed.
1) Java bug description: 
http://webcache.googleusercontent.com/search?q=cache:fjM5oxWzmCsJ:bugs.java.com/view_bug.do%3Fbug_id%3D6822370+cd=1hl=enct=clnkgl=hk
2) People report the bug still occurs:
http://cs.oswego.edu/pipermail/concurrency-interest/2010-September/007413.html

Thoughts? The dumps for thread#1 through thread#4 follow.

*Thread#1*
{code}
IPC Server handler 45 on 8032 daemon prio=10 tid=0x7f032909b000 nid=0x7bd7 waiting for monitor entry [0x7f0307aa9000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.getResourceUsageReport(SchedulerApplicationAttempt.java:541)
    - waiting to lock 0xe0e7ea70 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.getAppResourceUsageReport(AbstractYarnScheduler.java:196)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.getApplicationResourceUsageReport(RMAppAttemptImpl.java:703)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.createAndGetApplicationReport(RMAppImpl.java:569)
    at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:294)
    at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145)
    at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:605)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
{code}

*Thread#2*
{code}
ResourceManager Event Processor prio=10 tid=0x7f0328db9800 nid=0x7aeb waiting on condition [0x7f0311a48000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  0xe0e72bc0 (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
    at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.getCurrentAppAttempt(RMAppImpl.java:476)
    at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$FinishedTransition.updateAttemptMetrics(RMContainerImpl.java:509)
    at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$FinishedTransition.transition(RMContainerImpl.java:495)
    at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$FinishedTransition.transition(RMContainerImpl.java:484)
    at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
- locked 0xe0e85318 (a 

[jira] [Commented] (YARN-2594) Potential deadlock in RM when querying ApplicationResourceUsageReport

2014-09-25 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14148665#comment-14148665
 ] 

zhihai xu commented on YARN-2594:
-

The [ReentrantReadWriteLock | 
http://tutorials.jenkov.com/java-util-concurrent/readwritelock.html] 
implementation grants its locks under the following conditions:
{code}
Read Lock   If no threads have locked the ReadWriteLock for writing, 
and no thread have requested a write lock (but not yet obtained it). 
Thus, multiple threads can lock the lock for reading.
Write Lock  If no threads are reading or writing. 
Thus, only one thread at a time can lock the lock for writing
{code}
Based on the above, the first three threads can cause a deadlock.
The readLock is first acquired by thread#1, then thread#3 blocks waiting for 
the writeLock, and finally, when thread#2 tries to acquire the readLock, it is 
also blocked because thread#3 requested the writeLock before thread#2. 
Thread#1 cannot release the readLock either, because it is itself waiting for 
the synchronized lock that thread#2 holds, which closes the cycle.
So this is not a bug in Java.
The following is the source code in ReentrantReadWriteLock.java:
{code}
static final class NonfairSync extends Sync {
    private static final long serialVersionUID = -8159625535654395037L;
    final boolean writerShouldBlock() {
        return false; // writers can always barge
    }
    final boolean readerShouldBlock() {
        /* As a heuristic to avoid indefinite writer starvation,
         * block if the thread that momentarily appears to be head
         * of queue, if one exists, is a waiting writer.  This is
         * only a probabilistic effect since a new reader will not
         * block if there is a waiting writer behind other enabled
         * readers that have not yet drained from the queue.
         */
        return apparentlyFirstQueuedIsExclusive();
    }
}
{code}
readerShouldBlock returns true when the thread at the head of the wait queue 
appears to be a waiting writer, so a reader that arrives after that writer has 
to queue behind it.
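
The queuing behaviour described above can be checked with a small, terminating program. This is only a sketch: the class name QueuedWriterBlocksReader and the method secondReaderAcquired are made up for illustration and are not part of YARN. It uses a timed tryLock for the second reader so that the program gives up instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class, not taken from the YARN code base.
public class QueuedWriterBlocksReader {

  /** Returns whether a second reader got the read lock while a writer was queued. */
  static boolean secondReaderAcquired(long waitMillis) throws InterruptedException {
    ReentrantReadWriteLock lock = new ReentrantReadWriteLock(); // nonfair by default
    CountDownLatch firstReaderHolds = new CountDownLatch(1);
    CountDownLatch release = new CountDownLatch(1);

    // Reader 1 takes the read lock and keeps it until told to release.
    Thread reader1 = new Thread(() -> {
      lock.readLock().lock();
      try {
        firstReaderHolds.countDown();
        release.await();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        lock.readLock().unlock();
      }
    });

    // The writer queues behind reader 1.
    Thread writer = new Thread(() -> {
      lock.writeLock().lock();
      lock.writeLock().unlock();
    });

    reader1.setDaemon(true);
    writer.setDaemon(true);
    reader1.start();
    firstReaderHolds.await();
    writer.start();

    // Wait until the writer is queued and parked on the lock.
    while (!lock.hasQueuedThread(writer)
        || writer.getState() != Thread.State.WAITING) {
      Thread.sleep(1);
    }

    // The calling thread acts as reader 2: only a read lock is held by anyone,
    // yet this request has to wait behind the queued writer.
    boolean acquired = lock.readLock().tryLock(waitMillis, TimeUnit.MILLISECONDS);
    if (acquired) {
      lock.readLock().unlock();
    }

    release.countDown(); // let reader 1 go so the writer can finish
    reader1.join();
    writer.join();
    return acquired;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("second reader acquired: " + secondReaderAcquired(200));
  }
}
```

Note that the untimed ReadLock.tryLock() is documented to barge past waiting threads, so it would not show this effect; lock() and the timed tryLock do queue.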


[jira] [Commented] (YARN-2594) Potential deadlock in RM when querying ApplicationResourceUsageReport

2014-09-25 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14148676#comment-14148676
 ] 

Wangda Tan commented on YARN-2594:
--

[~zxu],
Thanks for the explanation, it's very helpful. Now I understand that a pending 
write lock request can block a read lock request.

I've created a test program:
{code}
package sandbox;

import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock.ReadLock;
import java.util.concurrent.locks.ReentrantReadWriteLock.WriteLock;

public class Tester {
  // Requests the read lock and never releases it.
  private static class ReadThread implements Runnable {
    private String name;
    private ReadLock readLock;

    ReadThread(String name, ReadLock readLock) {
      this.name = name;
      this.readLock = readLock;
    }

    @Override
    public void run() {
      System.out.println("try lock read - " + name);
      readLock.lock();
      System.out.println("lock read - " + name);
    }
  }

  // Requests the write lock; it waits while any reader holds the lock.
  private static class WriteThread implements Runnable {
    private String name;
    private WriteLock writeLock;

    WriteThread(String name, WriteLock writeLock) {
      this.name = name;
      this.writeLock = writeLock;
    }

    @Override
    public void run() {
      System.out.println("try lock write - " + name);
      writeLock.lock();
      System.out.println("lock write - " + name);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    ReadLock readLock = lock.readLock();
    WriteLock writeLock = lock.writeLock();

    Thread r1 = new Thread(new ReadThread("1", readLock));
    Thread r2 = new Thread(new ReadThread("2", readLock));
    Thread w = new Thread(new WriteThread("3", writeLock));

    // Start order: reader 1 takes the read lock, the writer queues behind it,
    // then reader 2 queues behind the waiting writer. Nothing is ever unlocked,
    // so the writer and reader 2 stay blocked and the program does not exit.
    r1.start();
    Thread.sleep(100);
    w.start();
    Thread.sleep(100);
    r2.start();
  }
}
{code}

Exactly as you described, a waiting write lock request blocks later read lock 
requests to avoid writer starvation.


[jira] [Commented] (YARN-2594) Potential deadlock in RM when querying ApplicationResourceUsageReport

2014-09-25 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14148696#comment-14148696
 ] 

Wangda Tan commented on YARN-2594:
--

I think the previously uploaded patch can still solve the problem. Eliminating 
the read lock in thread#2 means thread#2 is no longer blocked by the pending 
writeLock, so it will release the synchronized lock that thread#1 is waiting 
for, and thread#1 can continue too. After that, thread#3 can finally acquire 
the writeLock.
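
The reasoning above can be sketched as a small program. Everything in it is an assumption made for illustration: the class LockFreeGetterSketch, the nested App class, the schedulerMonitor object and the field names are stand-ins and are not the actual patch or the real RMAppImpl. It recreates the three-thread interleaving, except that the getter called by thread#2 reads a volatile field instead of taking the read lock.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch, not the YARN-2594 patch itself.
public class LockFreeGetterSketch {

  // Stand-in for the application object guarded by a read/write lock.
  static class App {
    final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // volatile makes the latest write visible without taking the read lock.
    volatile String currentAttempt = "attempt_1";

    String getCurrentAttempt() {
      return currentAttempt; // no readLock, so no queuing behind a writer
    }
  }

  private static void await(CountDownLatch latch) {
    try {
      latch.await();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  /** Returns true if all three threads finish within the timeout. */
  static boolean allThreadsFinish(long timeoutMillis) throws InterruptedException {
    App app = new App();
    Object schedulerMonitor = new Object(); // stand-in for the synchronized scheduler app
    CountDownLatch readHeld = new CountDownLatch(1);
    CountDownLatch monitorHeld = new CountDownLatch(1);
    CountDownLatch done = new CountDownLatch(3);

    // thread#1: holds the read lock, then needs the monitor.
    Thread t1 = new Thread(() -> {
      app.lock.readLock().lock();
      try {
        readHeld.countDown();
        await(monitorHeld);
        synchronized (schedulerMonitor) {
          app.getCurrentAttempt();
        }
      } finally {
        app.lock.readLock().unlock();
      }
      done.countDown();
    });

    // thread#2: holds the monitor, then asks for the current attempt.
    Thread t2 = new Thread(() -> {
      await(readHeld);
      synchronized (schedulerMonitor) {
        monitorHeld.countDown();
        // Wait until thread#3 is queued for the write lock.
        while (!app.lock.hasQueuedThreads()) {
          Thread.yield();
        }
        app.getCurrentAttempt(); // would block here if it took the read lock
      }
      done.countDown();
    });

    // thread#3: requests the write lock while thread#1 still holds the read lock.
    Thread t3 = new Thread(() -> {
      await(monitorHeld);
      app.lock.writeLock().lock();
      try {
        app.currentAttempt = "attempt_2";
      } finally {
        app.lock.writeLock().unlock();
      }
      done.countDown();
    });

    for (Thread t : new Thread[] {t1, t2, t3}) {
      t.setDaemon(true);
      t.start();
    }
    return done.await(timeoutMillis, TimeUnit.MILLISECONDS);
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("all threads finished: " + allThreadsFinish(5000));
  }
}
```

If getCurrentAttempt() took the read lock instead, thread#2 would park behind thread#3 while still holding the monitor, and none of the three threads could make progress.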
