[ 
https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13835202#comment-13835202
 ] 

qingwu.fu commented on YARN-1458:
---------------------------------

Hi all,
     Here are some other symptoms:
     1.       If someone submits a job, the ResourceManager accepts it, but the 
job doesn't run. In the meantime, the ResourceManager prints a lot of logs like 
“2013-11-27 14:27:02,258 ERROR 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
Request for appInfo of unknown attemptappattempt_1384743376038_1121_000001”, 
and the FairScheduler doesn't print heartbeat logs to 
${HADOOP_HOME}/logs/fairscheduler/hadoop-{user}-fairscheduler.log
     2.       The FairScheduler UI can't be opened and responds with a 500 error.

Here is the ResourceManager log from when this error appears.
Normal logs:
2013-11-27 14:25:36,515 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Application with 
id 1120 submitted by user root
2013-11-27 14:25:36,515 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Storing 
application with id application_1384743376038_1120
2013-11-27 14:25:36,515 INFO 
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root     
IP=192.168.24.101       OPERATION=Submit Application Request    
TARGET=ClientRMService  RESULT=SUCCESS  APPID=application_1384743376038_1120
2013-11-27 14:25:36,515 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: 
application_1384743376038_1120 State change from NEW to NEW_SAVING
2013-11-27 14:25:36,515 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Storing 
info for app: application_1384743376038_1120
2013-11-27 14:25:36,516 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: 
application_1384743376038_1120 State change from NEW_SAVING to SUBMITTED
2013-11-27 14:25:36,516 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
Registering app attempt : appattempt_1384743376038_1120_000001
2013-11-27 14:25:36,516 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
appattempt_1384743376038_1120_000001 State change from NEW to SUBMITTED
2013-11-27 14:25:36,516 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
Application Submission: appattempt_1384743376038_1120_000001, user: root, 
currently active: 2
2013-11-27 14:25:36,516 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
appattempt_1384743376038_1120_000001 State change from SUBMITTED to SCHEDULED
2013-11-27 14:25:36,516 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: 
application_1384743376038_1120 State change from SUBMITTED to ACCEPTED
2013-11-27 14:25:36,816 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable: 
Node offered to app: application_1384743376038_1120 reserved: false
 
 
Abnormal logs: these logs don't contain an entry like:
 2013-11-27 14:25:36,516 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
Application Submission: appattempt_1384743376038_1120_000001, user: root, 
currently active: 2
 
Here are the abnormal logs:
2013-11-27 14:27:01,391 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Allocated new 
applicationId: 1122
2013-11-27 14:27:01,391 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Allocated new 
applicationId: 1121
2013-11-27 14:27:02,252 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Application with 
id 1121 submitted by user yangping.wu
2013-11-27 14:27:02,252 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Application with 
id 1122 submitted by user yangping.wu
2013-11-27 14:27:02,252 INFO 
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=yangping.wu   
   IP=192.168.24.101 OPERATION=Submit Application Request    
TARGET=ClientRMService  RESULT=SUCCESS  APPID=application_1384743376038_1121
2013-11-27 14:27:02,252 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Storing 
application with id application_1384743376038_1122
2013-11-27 14:27:02,252 INFO 
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=yangping.wu   
   IP=192.168.24.101 OPERATION=Submit Application Request    
TARGET=ClientRMService  RESULT=SUCCESS  APPID=application_1384743376038_1122
2013-11-27 14:27:02,252 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: 
application_1384743376038_1122 State change from NEW to NEW_SAVING
2013-11-27 14:27:02,252 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Storing 
application with id application_1384743376038_1121
2013-11-27 14:27:02,252 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Storing 
info for app: application_1384743376038_1122
2013-11-27 14:27:02,252 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: 
application_1384743376038_1121 State change from NEW to NEW_SAVING
2013-11-27 14:27:02,252 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Storing 
info for app: application_1384743376038_1121
2013-11-27 14:27:02,252 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: 
application_1384743376038_1122 State change from NEW_SAVING to SUBMITTED
2013-11-27 14:27:02,253 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: 
application_1384743376038_1121 State change from NEW_SAVING to SUBMITTED
2013-11-27 14:27:02,253 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
Registering app attempt : appattempt_1384743376038_1122_000001
2013-11-27 14:27:02,253 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
appattempt_1384743376038_1122_000001 State change from NEW to SUBMITTED
2013-11-27 14:27:02,253 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
Registering app attempt : appattempt_1384743376038_1121_000001
2013-11-27 14:27:02,253 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
appattempt_1384743376038_1121_000001 State change from NEW to SUBMITTED
2013-11-27 14:27:02,258 ERROR 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
Request for appInfo of unknown attemptappattempt_1384743376038_1122_000001
2013-11-27 14:27:02,258 ERROR 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
Request for appInfo of unknown attemptappattempt_1384743376038_1121_000001
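
The difference between the normal and abnormal logs can be checked mechanically. Below is a minimal Python sketch, not part of Hadoop, that lists app attempts which were registered but never got an "Application Submission" entry from the FairScheduler. The function names are hypothetical. It assumes every log entry starts with a timestamp like the ones above, and it joins wrapped continuation lines (as they appear in this mail) back into one entry first.

```python
import re

# A log entry starts with a timestamp such as "2013-11-27 14:27:02,258".
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}\b")
# Attempt ids look like "appattempt_1384743376038_1121_000001".
ATTEMPT = re.compile(r"appattempt_\d+_\d+_\d+")


def join_entries(lines):
    """Merge wrapped continuation lines into one string per log entry."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if TIMESTAMP.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line
    return entries


def unscheduled_attempts(lines):
    """Return attempts that were registered but never appear in an
    'Application Submission' entry, in order of registration."""
    registered = []
    submitted = set()
    for entry in join_entries(lines):
        found = ATTEMPT.search(entry)
        if not found:
            continue
        if "Registering app attempt" in entry:
            registered.append(found.group())
        elif "Application Submission" in entry:
            submitted.add(found.group())
    return [attempt for attempt in registered if attempt not in submitted]
```

To use it, pass the lines of the ResourceManager log, for example `unscheduled_attempts(open(path))` where `path` is wherever your log file lives. An attempt is also reported if it was registered just before the log was cut off, so treat the result as a hint rather than proof.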



> hadoop2.2.0 fairscheduler ResourceManager Event Processor thread blocked
> ------------------------------------------------------------------------
>
>                 Key: YARN-1458
>                 URL: https://issues.apache.org/jira/browse/YARN-1458
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: scheduler
>    Affects Versions: 2.2.0
>         Environment: Centos 2.6.18-238.19.1.el5 X86_64
> hadoop2.2.0
>            Reporter: qingwu.fu
>              Labels: patch
>   Original Estimate: 408h
>  Remaining Estimate: 408h
>
> The ResourceManager$SchedulerEventDispatcher$EventProcessor thread blocks when 
> clients submit lots of jobs. It is not easy to reproduce; we ran the test cluster 
> for days to reproduce it. The output of the jstack command on the ResourceManager pid:
>  "ResourceManager Event Processor" prio=10 tid=0x00002aaab0c5f000 nid=0x5dd3 
> waiting for monitor entry [0x0000000043aa9000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671)
>         - waiting to lock <0x000000070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
>         at java.lang.Thread.run(Thread.java:744)
> ……
> "FairSchedulerUpdateThread" daemon prio=10 tid=0x00002aaab0a2c800 nid=0x5dc8 
> runnable [0x00000000433a2000]
>    java.lang.Thread.State: RUNNABLE
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545)
>         - locked <0x000000070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:282)
>         - locked <0x000000070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255)
>         at java.lang.Thread.run(Thread.java:744)
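
The dump above shows the Event Processor waiting on the same monitor (0x000000070026b6e0) that FairSchedulerUpdateThread holds. For larger dumps this matching can be done with a small script. Below is a minimal Python sketch, not an official tool: `lock_contention` is a hypothetical helper that assumes the usual jstack text layout (a quoted thread name on the header line, then "- waiting to lock <addr>" and "- locked <addr>" lines). It also strips a leading "> " quote prefix so it works on text pasted from this mail.

```python
import re
from collections import defaultdict

THREAD_HEADER = re.compile(r'^"([^"]+)"')
WAITING = re.compile(r"waiting to lock <(0x[0-9a-fA-F]+)>")
LOCKED = re.compile(r"locked <(0x[0-9a-fA-F]+)>")


def lock_contention(dump_lines):
    """Map each contended monitor address to (holder, [waiting threads]).

    The holder is None when no thread in the dump reports the monitor
    as locked.
    """
    holders = {}
    waiters = defaultdict(list)
    current = None
    for raw in dump_lines:
        # Drop an e-mail quote prefix ("> ") and surrounding whitespace.
        line = raw.lstrip("> \t").rstrip()
        header = THREAD_HEADER.match(line)
        if header:
            current = header.group(1)
            continue
        if current is None:
            continue
        waiting = WAITING.search(line)
        if waiting:
            waiters[waiting.group(1)].append(current)
            continue
        locked = LOCKED.search(line)
        if locked:
            holders[locked.group(1)] = current
    return {addr: (holders.get(addr), names) for addr, names in waiters.items()}
```

Monitors that are held but have no waiters are left out on purpose, so only the contended locks show up in the result.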



--
This message was sent by Atlassian JIRA
(v6.1#6144)
