[jira] [Commented] (YARN-5969) FairShareComparator getResourceUsage poor performance

2016-12-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15779902#comment-15779902
 ] 

Hadoop QA commented on YARN-5969:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
10s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 38m 32s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
16s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 58m 59s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.TestRMRestart |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | YARN-5969 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12844728/YARN-5969.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux f3225c8a2434 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 
21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / ea54752 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/14476/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/14476/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/14476/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> FairShareComparator getResourceUsage poor performance
> 

[jira] [Commented] (YARN-4465) SchedulerUtils#validateRequest for Label check should happen only when nodelabel enabled

2016-12-27 Thread Ying Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15779921#comment-15779921
 ] 

Ying Zhang commented on YARN-4465:
--

Hi [~bibinchundatt], [~leftnoteasy], I was trying this fix on my build and found 
that after disabling Node Labels, the RM goes down immediately after restart 
completes, due to the exception below, which is thrown by the code added in this 
fix. Have you encountered this? My build is based on 2.7.3. It looks to me like 
we only need to do this check when we are not in recovery. Or have I missed some 
other fixes?
{code:title=SchedulerUtils.java|borderStyle=solid}
  public static void normalizeAndValidateRequest(ResourceRequest resReq,
  ... ...{
Configuration conf = rmContext.getYarnConfiguration();
// If Node label is not enabled throw exception
if (null != conf && !YarnConfiguration.areNodeLabelsEnabled(conf)) {
  String labelExp = resReq.getNodeLabelExpression();
  if (!(RMNodeLabelsManager.NO_LABEL.equals(labelExp)
  || null == labelExp)) {
throw new InvalidLabelResourceRequestException(
"Invalid resource request, node label not enabled "
+ "but request contains label expression");
  }
}
{code}

2016-12-26 23:37:25,844 FATAL resourcemanager.ResourceManager (ResourceManager.java:main(1189)) - Error starting ResourceManager
org.apache.hadoop.service.ServiceStateException: org.apache.hadoop.yarn.exceptions.InvalidLabelResourceRequestException: Invalid resource request, node label not enabled but request contains label expression
        at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:204)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:972)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1013)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1009)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1697)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1009)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1049)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1186)
Caused by: org.apache.hadoop.yarn.exceptions.InvalidLabelResourceRequestException: Invalid resource request, node label not enabled but request contains label expression
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:225)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:248)
        at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:394)
        at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:339)
        at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:319)
        at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:436)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1165)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        ... 10 more
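
For illustration only, here is a minimal sketch of the "skip the check during 
recovery" idea suggested above. The isRecovery flag and the exact parameter list 
are assumptions for this sketch, not the actual SchedulerUtils signature; only 
the guard around the existing check is the point.
{code}
  // Hypothetical sketch: thread an isRecovery flag down from RMAppManager and
  // enforce the node-label check only for new submissions, not for recovered apps.
  public static void normalizeAndValidateRequest(ResourceRequest resReq,
      /* ... other parameters as in the excerpt above ... */
      RMContext rmContext, boolean isRecovery)
      throws InvalidResourceRequestException {
    Configuration conf = rmContext.getYarnConfiguration();
    // If node labels are not enabled, reject label expressions -- but skip the
    // check while recovering apps that were accepted under the old configuration.
    if (!isRecovery && null != conf
        && !YarnConfiguration.areNodeLabelsEnabled(conf)) {
      String labelExp = resReq.getNodeLabelExpression();
      if (!(RMNodeLabelsManager.NO_LABEL.equals(labelExp) || null == labelExp)) {
        throw new InvalidLabelResourceRequestException(
            "Invalid resource request, node label not enabled "
                + "but request contains label expression");
      }
    }
    // ... remaining normalization/validation unchanged ...
  }
{code}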

> SchedulerUtils#validateRequest for Label check should happen only when 
> nodelabel enabled
> 
>
> Key: YARN-4465
> URL: https://issues.apache.org/jira/browse/YARN-4465
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: 0001-YARN-4465.patch, 0002-YARN-4465.patch, 
> 0003-YARN-4465.patch, 0004-YARN-4465.patch, 0006-YARN-4465.patch, 
> 0007-YARN-4465.patch
>
>
> Disable labels on the RM side: yarn.nodelabel.enable=false.
> The Capacity Scheduler label configuration for the queue is as below:
> the default label for queue b1 is 3, and the accessible labels are 1,3.
> Submit an application to queue A.

[jira] [Commented] (YARN-5906) Update AppSchedulingInfo to use SchedulingPlacementSet

2016-12-27 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15779939#comment-15779939
 ] 

Sunil G commented on YARN-5906:
---

Generally the patch looks fine to me. I will commit it tomorrow if there are no 
objections.

> Update AppSchedulingInfo to use SchedulingPlacementSet
> --
>
> Key: YARN-5906
> URL: https://issues.apache.org/jira/browse/YARN-5906
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-5906.1.patch, YARN-5906.2.patch, YARN-5906.3.patch, 
> YARN-5906.4.patch
>
>
> Currently AppSchedulingInfo simply stores resource requests, and the scheduler 
> makes decisions according to the stored resource requests. For example, CS/FS 
> use slightly different approaches to get pending resource requests and to make 
> delay scheduling decisions.
> There are several benefits to moving the pending resource request data 
> structure to SchedulingPlacementSet:
> 1) Delay scheduling logic should be agnostic to the scheduler; for example, CS 
> supports count-based delay while FS supports both count-based and time-based 
> delay. Ideally the scheduler should be able to choose which delay scheduling 
> policy to use.
> 2) In addition to 1., YARN-4902 proposes supporting pluggable delay scheduling 
> behavior beyond locality-based (host->rack->offswitch), which requires more 
> flexibility.
> 3) To make YARN-4902 real, instead of directly adding the new resource request 
> API to the client, we can have the scheduler use it internally to make sure it 
> is well defined. AppSchedulingInfo/SchedulingPlacementSet will be the perfect 
> place to isolate which ResourceRequest implementation to use.
> 4) Different scheduling requirements need different behavior when checking the 
> ResourceRequest table.
> This JIRA is the first patch of several refactorings; it moves all 
> ResourceRequest data structures and logic to SchedulingPlacementSet. We need 
> follow-up changes to make it better structured:
> - Make delay scheduling a plugin of SchedulingPlacementSet
> - After YARN-4902 gets committed, change SchedulingPlacementSet to use 
> YARN-4902 internally.






[jira] [Commented] (YARN-4465) SchedulerUtils#validateRequest for Label check should happen only when nodelabel enabled

2016-12-27 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15779956#comment-15779956
 ] 

Sunil G commented on YARN-4465:
---

After RM restart, it seems apps are still submitting resource requests that 
contain a node label expression.

> SchedulerUtils#validateRequest for Label check should happen only when 
> nodelabel enabled
> 
>
> Key: YARN-4465
> URL: https://issues.apache.org/jira/browse/YARN-4465
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: 0001-YARN-4465.patch, 0002-YARN-4465.patch, 
> 0003-YARN-4465.patch, 0004-YARN-4465.patch, 0006-YARN-4465.patch, 
> 0007-YARN-4465.patch
>
>
> Disable labels on the RM side: yarn.nodelabel.enable=false.
> The Capacity Scheduler label configuration for the queue is as below:
> the default label for queue b1 is 3, and the accessible labels are 1,3.
> Submit an application to queue A.
> {noformat}
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException):
>  Invalid resource request, queue=b1 doesn't have permission to access all 
> labels in resource request. labelExpression of resource request=3. Queue 
> labels=1,3
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:304)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:234)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:216)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:401)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:340)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:283)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:602)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:247)
> {noformat}
> # Ignore default label expression when label is disabled *or*
> # NormalizeResourceRequest we can set label expression to  
> when node label is not enabled *or*
> # Improve message






[jira] [Commented] (YARN-4465) SchedulerUtils#validateRequest for Label check should happen only when nodelabel enabled

2016-12-27 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15779965#comment-15779965
 ] 

Sunil G commented on YARN-4465:
---

I guess I missed one more call flow. After recovering an app following an RM 
restart, the AM container's resource request will be validated. If that resource 
request is invalid due to the presence of a node label expression, then this 
scenario can occur. But that exception is treated as FATAL here. I think we need 
to reconsider whether we can skip that app or not. I feel we can mark that app 
as failed after recovery. cc [~rohithsharma] as well.

> SchedulerUtils#validateRequest for Label check should happen only when 
> nodelabel enabled
> 
>
> Key: YARN-4465
> URL: https://issues.apache.org/jira/browse/YARN-4465
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: 0001-YARN-4465.patch, 0002-YARN-4465.patch, 
> 0003-YARN-4465.patch, 0004-YARN-4465.patch, 0006-YARN-4465.patch, 
> 0007-YARN-4465.patch
>
>
> Disable labels on the RM side: yarn.nodelabel.enable=false.
> The Capacity Scheduler label configuration for the queue is as below:
> the default label for queue b1 is 3, and the accessible labels are 1,3.
> Submit an application to queue A.
> {noformat}
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException):
>  Invalid resource request, queue=b1 doesn't have permission to access all 
> labels in resource request. labelExpression of resource request=3. Queue 
> labels=1,3
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:304)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:234)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:216)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:401)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:340)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:283)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:602)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:247)
> {noformat}
> # Ignore default label expression when label is disabled *or*
> # NormalizeResourceRequest we can set label expression to  
> when node label is not enabled *or*
> # Improve message






[jira] [Commented] (YARN-6024) Capacity Scheduler continuous reservation looking doesn't work when queue's used+reserved = max

2016-12-27 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15779989#comment-15779989
 ] 

Sunil G commented on YARN-6024:
---

+1 for the branch-2.7 patch. If others do not have a differing opinion, I will 
commit the core patch to 2.7 and the test case to trunk/branch-2.

> Capacity Scheduler continuous reservation looking doesn't work when queue's 
> used+reserved = max
> ---
>
> Key: YARN-6024
> URL: https://issues.apache.org/jira/browse/YARN-6024
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-6024-branch-2.7.001.patch, 
> YARN-6024-branch-2.7.001.patch, YARN-6024.001.patch
>
>
> Found one corner case when continuous reservation looking doesn't work:
> When queue's used=max, the queue's capacity check fails.






[jira] [Commented] (YARN-4465) SchedulerUtils#validateRequest for Label check should happen only when nodelabel enabled

2016-12-27 Thread Ying Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15780019#comment-15780019
 ] 

Ying Zhang commented on YARN-4465:
--

Hi [~sunilg], can we just skip the check if it is in recovery, i.e., by adding 
an "if (!isRecovery)" guard?

> SchedulerUtils#validateRequest for Label check should happen only when 
> nodelabel enabled
> 
>
> Key: YARN-4465
> URL: https://issues.apache.org/jira/browse/YARN-4465
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: 0001-YARN-4465.patch, 0002-YARN-4465.patch, 
> 0003-YARN-4465.patch, 0004-YARN-4465.patch, 0006-YARN-4465.patch, 
> 0007-YARN-4465.patch
>
>
> Disable labels on the RM side: yarn.nodelabel.enable=false.
> The Capacity Scheduler label configuration for the queue is as below:
> the default label for queue b1 is 3, and the accessible labels are 1,3.
> Submit an application to queue A.
> {noformat}
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException):
>  Invalid resource request, queue=b1 doesn't have permission to access all 
> labels in resource request. labelExpression of resource request=3. Queue 
> labels=1,3
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:304)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:234)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:216)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:401)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:340)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:283)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:602)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:247)
> {noformat}
> # Ignore default label expression when label is disabled *or*
> # NormalizeResourceRequest we can set label expression to  
> when node label is not enabled *or*
> # Improve message






[jira] [Created] (YARN-6028) Add document for container metrics

2016-12-27 Thread Weiwei Yang (JIRA)
Weiwei Yang created YARN-6028:
-

 Summary: Add document for container metrics
 Key: YARN-6028
 URL: https://issues.apache.org/jira/browse/YARN-6028
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation, nodemanager
Affects Versions: 2.7.3
Reporter: Weiwei Yang
Assignee: Weiwei Yang


YARN-3022 exposed container metrics from the node manager, but the documentation 
is missing in Metrics.md.






[jira] [Updated] (YARN-5931) Document timeout interfaces CLI and REST APIs

2016-12-27 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-5931:

Attachment: YARN-5931.2.patch

Updated the patch to fix the review comments.

> Document timeout interfaces CLI and REST APIs
> -
>
> Key: YARN-5931
> URL: https://issues.apache.org/jira/browse/YARN-5931
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: ResourceManagerRest.html, YARN-5931.0.patch, 
> YARN-5931.1.patch, YARN-5931.2.patch, YarnCommands.html
>
>







[jira] [Commented] (YARN-4465) SchedulerUtils#validateRequest for Label check should happen only when nodelabel enabled

2016-12-27 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15780135#comment-15780135
 ] 

Sunil G commented on YARN-4465:
---

Recovery of the app has to fail as well, but other apps' recovery should go 
through and the RM has to come up. If we keep these two points as the expected 
outcome, I think a new JIRA could be raised and we can continue the discussion 
there.

One more question, though. I know it is a negative scenario, but node labels 
were disabled manually before the RM restart, correct?
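
To make the outcome described above concrete, here is a hypothetical sketch of 
how recovery could treat the invalid label expression as a per-application 
failure instead of a fatal RM error. The method and helper names 
(recoverApplication, failRecoveredApplication) are illustrative assumptions, not 
the actual RMAppManager code.
{code}
// Hypothetical illustration only -- not the actual RMAppManager implementation.
private void recoverApplication(ApplicationStateData appState) {
  try {
    // validation happens inside this call and may reject the label expression
    createAndPopulateNewRMApp(appState, true /* isRecovery */);
  } catch (InvalidLabelResourceRequestException e) {
    // Fail only this application's recovery; other apps continue to recover
    // and the RM still comes up.
    LOG.warn("Failing recovery of application due to invalid label expression: "
        + e.getMessage());
    failRecoveredApplication(appState, e.getMessage()); // hypothetical helper
  }
}
{code}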

> SchedulerUtils#validateRequest for Label check should happen only when 
> nodelabel enabled
> 
>
> Key: YARN-4465
> URL: https://issues.apache.org/jira/browse/YARN-4465
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: 0001-YARN-4465.patch, 0002-YARN-4465.patch, 
> 0003-YARN-4465.patch, 0004-YARN-4465.patch, 0006-YARN-4465.patch, 
> 0007-YARN-4465.patch
>
>
> Disable labels on the RM side: yarn.nodelabel.enable=false.
> The Capacity Scheduler label configuration for the queue is as below:
> the default label for queue b1 is 3, and the accessible labels are 1,3.
> Submit an application to queue A.
> {noformat}
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException):
>  Invalid resource request, queue=b1 doesn't have permission to access all 
> labels in resource request. labelExpression of resource request=3. Queue 
> labels=1,3
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:304)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:234)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:216)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:401)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:340)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:283)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:602)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:247)
> {noformat}
> # Ignore default label expression when label is disabled *or*
> # NormalizeResourceRequest we can set label expression to  
> when node label is not enabled *or*
> # Improve message






[jira] [Commented] (YARN-2663) Race condintion in shared cache CleanerTask during deletion of resource

2016-12-27 Thread Zhaofei Meng (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15780152#comment-15780152
 ] 

Zhaofei Meng commented on YARN-2663:


The cleaner task removes the HDFS resource after removing the SCM cache entry, 
while the uploader task adds the SCM cache entry after uploading the HDFS 
resource. Adding a lock shared by the cleaner task and the NM uploader path 
would control the ordering of the SCM cache and HDFS resource operations.
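
A minimal sketch of that idea, assuming the lock lives on the SCM side (where 
both the cleaner and the handling of uploader notifications run). The names here 
are illustrative and borrowed from the excerpts below; this is not the actual 
SharedCacheManager code.
{code}
// Illustrative only: a per-key lock inside the SCM serializes the cleaner's
// "remove store entry, then delete from HDFS" against the handling of an
// uploader notification that would re-add the entry.
private final ConcurrentMap<String, Object> keyLocks = new ConcurrentHashMap<>();

private Object lockFor(String key) {
  return keyLocks.computeIfAbsent(key, k -> new Object());
}

// Called from the CleanerTask.
void removeIfEvictable(String key, Path path) throws IOException {
  synchronized (lockFor(key)) {
    if (store.removeResource(key)) {
      removeResourceFromCacheFileSystem(path);
    }
  }
}

// Called when an uploader notification for this key arrives.
void onUploaderNotify(String key, String fileName) {
  synchronized (lockFor(key)) {
    store.addResource(key, fileName); // re-adds the entry only under the same lock
  }
}
{code}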

> Race condintion in shared cache CleanerTask during deletion of resource
> ---
>
> Key: YARN-2663
> URL: https://issues.apache.org/jira/browse/YARN-2663
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Priority: Blocker
>
> In CleanerTask, store.removeResource(key) and 
> removeResourceFromCacheFileSystem(path) do not happen together in atomic 
> fashion.
> Since resources could be uploaded with different file names, the SCM could 
> receive a notification to add a resource to the SCM between the two 
> operations. Thus, we have a scenario where the cleaner service deletes the 
> entry from the scm, receives a notification from the uploader (adding the 
> entry back into the scm) and then deletes the file from HDFS.
> Cleaner code that deletes resource:
> {code}
>   if (store.isResourceEvictable(key, resource)) {
> try {
>   /*
>* TODO: There is a race condition between store.removeResource(key)
>* and removeResourceFromCacheFileSystem(path) operations because 
> they
>* do not happen atomically and resources can be uploaded with
>* different file names by the node managers.
>*/
>   // remove the resource from scm (checks for appIds as well)
>   if (store.removeResource(key)) {
> // remove the resource from the file system
> boolean deleted = removeResourceFromCacheFileSystem(path);
> if (deleted) {
>   resourceStatus = ResourceStatus.DELETED;
> } else {
>   LOG.error("Failed to remove path from the file system."
>   + " Skipping this resource: " + path);
>   resourceStatus = ResourceStatus.ERROR;
> }
>   } else {
> // we did not delete the resource because it contained application
> // ids
> resourceStatus = ResourceStatus.PROCESSED;
>   }
> } catch (IOException e) {
>   LOG.error(
>   "Failed to remove path from the file system. Skipping this 
> resource: "
>   + path, e);
>   resourceStatus = ResourceStatus.ERROR;
> }
>   } else {
> resourceStatus = ResourceStatus.PROCESSED;
>   }
> {code}
> Uploader code that uploads resource:
> {code}
>   // create the temporary file
>   tempPath = new Path(directoryPath, getTemporaryFileName(actualPath));
>   if (!uploadFile(actualPath, tempPath)) {
> LOG.warn("Could not copy the file to the shared cache at " + 
> tempPath);
> return false;
>   }
>   // set the permission so that it is readable but not writable
>   // TODO should I create the file with the right permission so I save the
>   // permission call?
>   fs.setPermission(tempPath, FILE_PERMISSION);
>   // rename it to the final filename
>   Path finalPath = new Path(directoryPath, actualPath.getName());
>   if (!fs.rename(tempPath, finalPath)) {
> LOG.warn("The file already exists under " + finalPath +
> ". Ignoring this attempt.");
> deleteTempFile(tempPath);
> return false;
>   }
>   // notify the SCM
>   if (!notifySharedCacheManager(checksumVal, actualPath.getName())) {
> // the shared cache manager rejected the upload (as it is likely
> // uploaded under a different name
> // clean up this file and exit
> fs.delete(finalPath, false);
> return false;
>   }
> {code}
> One solution is to have the UploaderService always rename the resource file 
> to the checksum of the resource plus the extension. With this fix we will 
> never receive a notify for the resource before the delete from the FS has 
> happened because the rename on the node manager will fail. If the node 
> manager uploads the file after it is deleted from the FS, we are ok and the 
> resource will simply get added back to the scm once a notification is 
> received.
> The classpath at the MapReduce layer is still usable because we leverage 
> links to preserve the original client file name.
> The downside is that now the shared cache files in HDFS are less readable. 
> This could be mitigated with an added admin command to the SCM that gives a 
> list of filenames associated with a checksum or vice versa.




[jira] [Created] (YARN-6029) CapacityScheduler deadlock when ParentQueue#getQueueUserAclInfo is called by Thread_A at the moment that Thread_B calls LeafQueue#assignContainers to release a reserved co

2016-12-27 Thread Tao Yang (JIRA)
Tao Yang created YARN-6029:
--

 Summary: CapacityScheduler deadlock when 
ParentQueue#getQueueUserAclInfo is called by Thread_A at the moment that 
Thread_B calls LeafQueue#assignContainers to release a reserved container
 Key: YARN-6029
 URL: https://issues.apache.org/jira/browse/YARN-6029
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.8.0
Reporter: Tao Yang
Assignee: Tao Yang


If ParentQueue#getQueueUserAclInfo is called (e.g. a client calls 
YarnClient#getQueueAclsInfo) at the moment that LeafQueue#assignContainers is 
running, before it notifies the parent queue to release a resource (it should 
release a reserved container), then the ResourceManager can deadlock. I found 
this problem in our test environment on Hadoop 2.8.

Steps to reproduce the deadlock, in chronological order:
* 1. Thread A (ResourceManager Event Processor) calls synchronized 
LeafQueue#assignContainers (acquiring the LeafQueue instance lock of queue 
root.a).
* 2. Thread B (IPC Server handler) calls synchronized 
ParentQueue#getQueueUserAclInfo (acquiring the ParentQueue instance lock of 
queue root), iterates over the children's queue ACLs and blocks when calling 
synchronized LeafQueue#getQueueUserAclInfo (the LeafQueue instance lock of queue 
root.a is held by Thread A).
* 3. Thread A wants to inform the parent queue that a container is being 
completed and blocks when invoking the synchronized 
ParentQueue#internalReleaseResource method (the ParentQueue instance lock of 
queue root is held by Thread B).

I think the synchronized modifier of LeafQueue#getQueueUserAclInfo can be 
removed to solve this problem, since this method does not appear to modify 
fields of the LeafQueue instance.

Attaching a patch with a UT for review.
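
The lock ordering can be reduced to a generic two-monitor example; the sketch 
below is illustrative Java, not the CapacityScheduler code itself.
{code}
// Generic illustration of the reported lock ordering, not the actual queue classes.
class Parent {
  final Child child = new Child(this);
  synchronized void getQueueUserAclInfo() {    // Thread B: takes the Parent lock...
    child.getQueueUserAclInfo();               // ...then needs the Child lock
  }
  synchronized void internalReleaseResource() { }
}

class Child {
  final Parent parent;
  Child(Parent p) { this.parent = p; }
  synchronized void assignContainers() {       // Thread A: takes the Child lock...
    parent.internalReleaseResource();          // ...then needs the Parent lock -> deadlock
  }
  synchronized void getQueueUserAclInfo() { }  // dropping 'synchronized' here is the proposed fix
}
{code}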






[jira] [Updated] (YARN-6029) CapacityScheduler deadlock when ParentQueue#getQueueUserAclInfo is called by Thread_A at the moment that Thread_B calls LeafQueue#assignContainers to release a reserved co

2016-12-27 Thread Tao Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-6029:
---
Attachment: YARN-6029.001.patch
deadlock.jstack

> CapacityScheduler deadlock when ParentQueue#getQueueUserAclInfo is called by 
> Thread_A at the moment that Thread_B calls LeafQueue#assignContainers to 
> release a reserved container
> --
>
> Key: YARN-6029
> URL: https://issues.apache.org/jira/browse/YARN-6029
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.8.0
>Reporter: Tao Yang
>Assignee: Tao Yang
> Attachments: YARN-6029.001.patch, deadlock.jstack
>
>
> When ParentQueue#getQueueUserAclInfo is called (e.g. a client calls 
> YarnClient#getQueueAclsInfo) just at the moment that 
> LeafQueue#assignContainers is called and before notifying parent queue to 
> release resource (should release a reserved container), then ResourceManager 
> can deadlock. I found this problem on our testing environment for hadoop2.8.
> Reproduce the deadlock in chronological order
> * 1. Thread A (ResourceManager Event Processor) calls synchronized 
> LeafQueue#assignContainers (got LeafQueue instance lock of queue root.a)
> * 2. Thread B (IPC Server handler) calls synchronized 
> ParentQueue#getQueueUserAclInfo (got ParentQueue instance lock of queue 
> root), iterates over children queue acls and is blocked when calling 
> synchronized LeafQueue#getQueueUserAclInfo (the LeafQueue instance lock of 
> queue root.a is hold by Thread A)
> * 3. Thread A wants to inform the parent queue that a container is being 
> completed and is blocked when invoking synchronized 
> ParentQueue#internalReleaseResource method (the ParentQueue instance lock of 
> queue root is hold by Thread B)
> I think the synchronized modifier of LeafQueue#getQueueUserAclInfo can be 
> removed to solve this problem, since this method appears to not affect fields 
> of LeafQueue instance.
> Attach patch with UT for review.






[jira] [Updated] (YARN-6029) CapacityScheduler deadlock when ParentQueue#getQueueUserAclInfo is called by Thread_A at the moment that Thread_B calls LeafQueue#assignContainers to release a reserved co

2016-12-27 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-6029:

Priority: Blocker  (was: Major)

> CapacityScheduler deadlock when ParentQueue#getQueueUserAclInfo is called by 
> Thread_A at the moment that Thread_B calls LeafQueue#assignContainers to 
> release a reserved container
> --
>
> Key: YARN-6029
> URL: https://issues.apache.org/jira/browse/YARN-6029
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.8.0
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Blocker
> Attachments: YARN-6029.001.patch, deadlock.jstack
>
>
> When ParentQueue#getQueueUserAclInfo is called (e.g. a client calls 
> YarnClient#getQueueAclsInfo) just at the moment that 
> LeafQueue#assignContainers is called and before notifying parent queue to 
> release resource (should release a reserved container), then ResourceManager 
> can deadlock. I found this problem on our testing environment for hadoop2.8.
> Reproduce the deadlock in chronological order
> * 1. Thread A (ResourceManager Event Processor) calls synchronized 
> LeafQueue#assignContainers (got LeafQueue instance lock of queue root.a)
> * 2. Thread B (IPC Server handler) calls synchronized 
> ParentQueue#getQueueUserAclInfo (got ParentQueue instance lock of queue 
> root), iterates over children queue acls and is blocked when calling 
> synchronized LeafQueue#getQueueUserAclInfo (the LeafQueue instance lock of 
> queue root.a is hold by Thread A)
> * 3. Thread A wants to inform the parent queue that a container is being 
> completed and is blocked when invoking synchronized 
> ParentQueue#internalReleaseResource method (the ParentQueue instance lock of 
> queue root is hold by Thread B)
> I think the synchronized modifier of LeafQueue#getQueueUserAclInfo can be 
> removed to solve this problem, since this method appears to not affect fields 
> of LeafQueue instance.
> Attach patch with UT for review.






[jira] [Commented] (YARN-5931) Document timeout interfaces CLI and REST APIs

2016-12-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15780289#comment-15780289
 ] 

Hadoop QA commented on YARN-5931:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
13s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
54s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  5m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  1m 
16s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
49s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
 2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  5m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  5m 
53s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 54s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 1 new + 206 unchanged - 0 fixed = 207 total (was 206) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  1m 
12s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 3 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch 12 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
34s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
26s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 38m 43s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
14s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
29s{color} | {color:green} The patch does not generate ASF License warnings. 
{c

[jira] [Commented] (YARN-5931) Document timeout interfaces CLI and REST APIs

2016-12-27 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15780534#comment-15780534
 ] 

Daniel Templeton commented on YARN-5931:


A few more comments:

* "The possible combination of values" should just be "The possible values"
* "Timeout is configured and Application is RUNNING." should be "Timeout is 
configured and application is RUNNING."
* "When you run a GET operation on this resource, you can obtain a collection 
of Application Objects." should be "When you run a GET operation on this 
resource, a collection of Application Objects is returned."  Yeah, it's passive 
voice, but this isn't a 9th grade book report. :)
* "When you run a GET operation on this resource, you can obtain a collection 
of Application Timeout Objects." should be "When you run a GET operation on 
this resource, a collection of Application Timeout Objects is returned."
* "Each timeout object represents timeout type" should be "Each timeout object 
is composed of a timeout type"
* "Time at which application will get expired" should be "Time at which the 
application will expire"
* In general, I disagree with the earlier comment that the "the"s should be 
dropped in the docs.  It's fine in the javadocs, but the user docs should use 
"the application", i.e. they shouldn't be in developer speak.
* With "Valid values are the members of the ApplicationTimeoutType enum: 
LIFETIME", I'd be more explicit instead of using developer speak.  Something 
like, "Valid values are the members of the ApplicationTimeoutType enum. 
LIFETIME is currently the only valid value."
* "Update timeout of an application from the time of request in seconds." is 
confusing.  How about "Update application timeout (from the time of request) in 
seconds."

On a more meta level, 3s is fine with me.  1s would even be fine.

> Document timeout interfaces CLI and REST APIs
> -
>
> Key: YARN-5931
> URL: https://issues.apache.org/jira/browse/YARN-5931
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: ResourceManagerRest.html, YARN-5931.0.patch, 
> YARN-5931.1.patch, YARN-5931.2.patch, YarnCommands.html
>
>







[jira] [Commented] (YARN-4423) Cleanup lint warnings in resource mananger

2016-12-27 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15780600#comment-15780600
 ] 

Daniel Templeton commented on YARN-4423:


That's what happens when it sits for a year without a review. :)  I'll rebase 
this week.

> Cleanup lint warnings in resource mananger
> --
>
> Key: YARN-4423
> URL: https://issues.apache.org/jira/browse/YARN-4423
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Minor
>  Labels: oct16-easy
> Attachments: YARN-4423.001.patch, YARN-4423.002.patch, 
> YARN-4423.003.patch
>
>
> There are multiple lint warnings about unchecked usage.  This JIRA is to 
> clean them up, and maybe a few other quibbles as well.






[jira] [Commented] (YARN-6020) Resource.add exceed Int boundary,when compute queue demand in FairScheduler

2016-12-27 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15780686#comment-15780686
 ] 

Daniel Templeton commented on YARN-6020:


It means that the git apply of your patch didn't succeed.  Did you perhaps 
create the patch against a branch instead of trunk?

> Resource.add exceed Int boundary,when compute queue demand in FairScheduler
> ---
>
> Key: YARN-6020
> URL: https://issues.apache.org/jira/browse/YARN-6020
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.4.0, 3.0.0-alpha1
>Reporter: Feng Yuan
>Assignee: Feng Yuan
>Priority: Critical
> Attachments: YARN-6020.001.patch
>
>
> The Resource object uses an int to store the memory field. When the sum 
> exceeds Integer.MAX_VALUE, you get a negative demand in 
> FSQueue#updateDemand(), and your queue will never get an assignment.
> I still see this scenario in the latest 3.0.x code.
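
As an aside, here is a self-contained sketch of the failure mode (simplified; it 
does not use the real Resource or FSQueue classes) and of a saturating sum that 
avoids the negative demand:
{code}
// Self-contained illustration of the overflow described above.
public class DemandOverflowDemo {
  public static void main(String[] args) {
    int demandMb = 0;
    // e.g. ~3000 apps each demanding 1,000,000 MB overflows a 32-bit counter
    for (int i = 0; i < 3000; i++) {
      demandMb += 1_000_000;
    }
    System.out.println("plain int sum: " + demandMb); // negative after overflow

    long saturating = 0;
    for (int i = 0; i < 3000; i++) {
      saturating = Math.min((long) Integer.MAX_VALUE, saturating + 1_000_000L);
    }
    System.out.println("capped sum: " + saturating);  // stays at Integer.MAX_VALUE
  }
}
{code}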






[jira] [Assigned] (YARN-5991) Yarn Distributed Shell does not print throwable t to App Master When failed to start container

2016-12-27 Thread Daniel Templeton (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton reassigned YARN-5991:
--

Assignee: Daniel Templeton

> Yarn Distributed Shell does not print throwable t to App Master When failed 
> to start container
> --
>
> Key: YARN-5991
> URL: https://issues.apache.org/jira/browse/YARN-5991
> Project: Hadoop YARN
>  Issue Type: Improvement
> Environment: apache hadoop 2.7.1, centos 6.5
>Reporter: dashwang
>Assignee: Daniel Templeton
>Priority: Minor
>  Labels: newbie
>
> 16/12/12 16:27:20 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
> START_CONTAINER for Container container_1481517162158_0027_01_03
> 16/12/12 16:27:20 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
> START_CONTAINER for Container container_1481517162158_0027_01_04
> 16/12/12 16:27:20 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
> START_CONTAINER for Container container_1481517162158_0027_01_02
> 16/12/12 16:27:20 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
> slave02:22710
> 16/12/12 16:27:20 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
> slave01:34140
> 16/12/12 16:27:20 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
> master:52037
> 16/12/12 16:27:20 ERROR launcher.ApplicationMaster: Failed to start Container 
> container_1481517162158_0027_01_02






[jira] [Comment Edited] (YARN-5991) Yarn Distributed Shell does not print throwable t to App Master When failed to start container

2016-12-27 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15741409#comment-15741409
 ] 

Daniel Templeton edited comment on YARN-5991 at 12/27/16 5:00 PM:
--

The source code of ApplicationMaster.java is:

{code}
@Override
public void onStartContainerError(ContainerId containerId, Throwable t) {
  LOG.error("Failed to start Container " + containerId);
  containers.remove(containerId);
  applicationMaster.numCompletedContainers.incrementAndGet();
  applicationMaster.numFailedContainers.incrementAndGet();
}
{code}

This does not tell us the real reason the container failed to start.
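
For illustration, a sketch of the kind of change being asked for: pass the 
throwable to the logger so the cause is visible in the AM log (assuming the 
commons-logging LOG used in the surrounding code).
{code}
@Override
public void onStartContainerError(ContainerId containerId, Throwable t) {
  // Pass the throwable to the logger so the actual cause of the failure shows
  // up in the AM log instead of just the container id.
  LOG.error("Failed to start Container " + containerId, t);
  containers.remove(containerId);
  applicationMaster.numCompletedContainers.incrementAndGet();
  applicationMaster.numFailedContainers.incrementAndGet();
}
{code}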


was (Author: dashwang):
The source code of ApplicationMaster.java is:

@Override
public void onStartContainerError(ContainerId containerId, Throwable t) {
  LOG.error("Failed to start Container " + containerId);
  containers.remove(containerId);
  applicationMaster.numCompletedContainers.incrementAndGet();
  applicationMaster.numFailedContainers.incrementAndGet();
}


that did not tell us the real reason for start container error!

> Yarn Distributed Shell does not print throwable t to App Master When failed 
> to start container
> --
>
> Key: YARN-5991
> URL: https://issues.apache.org/jira/browse/YARN-5991
> Project: Hadoop YARN
>  Issue Type: Improvement
> Environment: apache hadoop 2.7.1, centos 6.5
>Reporter: dashwang
>Priority: Minor
>  Labels: newbie
>
> 16/12/12 16:27:20 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
> START_CONTAINER for Container container_1481517162158_0027_01_03
> 16/12/12 16:27:20 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
> START_CONTAINER for Container container_1481517162158_0027_01_04
> 16/12/12 16:27:20 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
> START_CONTAINER for Container container_1481517162158_0027_01_02
> 16/12/12 16:27:20 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
> slave02:22710
> 16/12/12 16:27:20 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
> slave01:34140
> 16/12/12 16:27:20 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
> master:52037
> 16/12/12 16:27:20 ERROR launcher.ApplicationMaster: Failed to start Container 
> container_1481517162158_0027_01_02






[jira] [Commented] (YARN-6021) When your allocated minShare of all queue`s added up exceed cluster capacity you can get some queue for 0 fairshare

2016-12-27 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15780867#comment-15780867
 ] 

Karthik Kambatla commented on YARN-6021:


Excuse the long-winded response.

I believe minshare was originally introduced to handle a queue's *urgent* 
requirement on a *saturated* cluster:
# When preemption is enabled, a minshare's worth of resources is preempted from 
other queues. This was necessary because fairshare preemption was very rigid. 
Since then, we have augmented fairshare preemption with a threshold and timeout, 
giving more control to the admins. I would encourage trying these new controls 
out instead of using minshare preemption. 
# When preemption is not enabled, setting minshare for a queue forcibly sets 
the fairshare of the queue to at least that value. Using minshare makes sense 
only when used for special cases. In a cluster where most queues have a 
minshare set, there is no more *fairness*. 

Also, minshare is an absolute value and needs to be updated as the cluster 
grows/shrinks. For these reasons, I would discourage the use of minshare. At 
Cloudera, we discourage our customers too. There are exceptions: a 
high-priority, latency-sensitive workload that needs at least {{x}} resources 
to start. 

In your example, I think either minshares are being abused or the cluster is 
too small. If all the queues require at least that many resources to be 
functional, clearly the cluster cannot accommodate all of them coming together. 

PS: If backward compatibility were not important, I would have advocated for 
removing minshare altogether. 

> When your allocated minShare of all queue`s added up exceed cluster capacity 
> you can get some queue for 0 fairshare
> ---
>
> Key: YARN-6021
> URL: https://issues.apache.org/jira/browse/YARN-6021
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.4.0
>Reporter: Feng Yuan
>Assignee: Feng Yuan
>Priority: Critical
>
> In fair-scheduler.xml, if you configure minshares that add up to more than the 
> parent queue's fair share (for root's children, the fair share is the cluster 
> capacity), you will find the R value looks like the following while computing 
> the children's fair shares:
> 1.0 
> 0.5 
> 0.25 
> 0.125 
> 0.0625 
> 0.03125 
> 0.015625 
> 0.0078125 
> 0.00390625
> I find this is due to:
> double rMax = 1.0;
> while (resourceUsedWithWeightToResourceRatio(rMax, schedulables, type)
>     < totalResource) {
>   rMax *= 2.0;
> }
> because resourceUsedWithWeightToResourceRatio adds minShare into the total.
> Do we really need to bring minShare into the fair share computation?
> My advice is that considering only the weight is enough; minshare's guarantee 
> will be fulfilled when assigning containers.
> Hoping for suggestions!
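
To see why the halving sequence above drives some fair shares to zero, here is a 
simplified, self-contained sketch of the computation (the real logic in 
FairScheduler's ComputeFairShares differs in details such as Resource types and 
maxShare capping):
{code}
// Simplified sketch of the weight-to-resource-ratio search referred to above.
public class FairShareSketch {
  static double usedAtRatio(double ratio, double[] weights, double[] minShares) {
    double total = 0;
    for (int i = 0; i < weights.length; i++) {
      // Each schedulable takes at least its minShare, otherwise weight * ratio.
      total += Math.max(weights[i] * ratio, minShares[i]);
    }
    return total;
  }

  public static void main(String[] args) {
    double cluster = 100.0;
    double[] weights   = {1, 1, 1};
    double[] minShares = {60, 60, 0};   // minShares alone already exceed the cluster

    double rMax = 1.0;
    while (usedAtRatio(rMax, weights, minShares) < cluster) {
      rMax *= 2.0;                      // never runs: usedAtRatio(anything) >= 120
    }
    double left = 0.0, right = rMax;
    for (int i = 0; i < 25; i++) {      // the binary search halves the ratio toward 0
      double mid = (left + right) / 2.0;
      if (usedAtRatio(mid, weights, minShares) < cluster) left = mid; else right = mid;
    }
    // Queue 3 has no minShare, so weight * ratio ~= 0 -> fair share ~= 0.
    System.out.printf("ratio=%.6f, queue3 share=%.6f%n", right, weights[2] * right);
  }
}
{code}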






[jira] [Commented] (YARN-5899) A small fix for printing debug info inside function canAssignToThisQueue()

2016-12-27 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15780909#comment-15780909
 ] 

Sunil G commented on YARN-5899:
---

Thanks for the patch. I'll take a look today.

> A small fix for printing debug info inside function canAssignToThisQueue()
> --
>
> Key: YARN-5899
> URL: https://issues.apache.org/jira/browse/YARN-5899
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.0.0-alpha1
>Reporter: Ying Zhang
>Assignee: Ying Zhang
>Priority: Trivial
> Attachments: YARN-5899.001.patch
>
>
> A small fix inside function canAssignToThisQueue() for printing DEBUG info. 
> Please see patch attached.






[jira] [Commented] (YARN-6012) Remove node label (removeFromClusterNodeLabels) document is missing

2016-12-27 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15780911#comment-15780911
 ] 

Sunil G commented on YARN-6012:
---

Could you please upload a patch? I can help review it.

> Remove node label (removeFromClusterNodeLabels) document is missing
> ---
>
> Key: YARN-6012
> URL: https://issues.apache.org/jira/browse/YARN-6012
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.7.2
>Reporter: Weiwei Yang
>Assignee: Ying Zhang
>  Labels: doc, nodelabel
>
> Add corresponding documentation for
> {code}
> yarn rmadmin -removeFromClusterNodeLabels "x,y"
> {code}
> in yarn node labels doc page.






[jira] [Commented] (YARN-3955) Support for priority ACLs in CapacityScheduler

2016-12-27 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15780979#comment-15780979
 ] 

Sunil G commented on YARN-3955:
---

The javadoc error is not related; it's a known failure from the hadoop-azure 
project in trunk.
The ASF warnings are shown in hadoop-tools, and I am not so sure why they show 
up in this jenkins run.

The test case failure is related, due to missing config items. I will upload a 
patch for that.

> Support for priority ACLs in CapacityScheduler
> --
>
> Key: YARN-3955
> URL: https://issues.apache.org/jira/browse/YARN-3955
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: ApplicationPriority-ACL.pdf, 
> ApplicationPriority-ACLs-v2.pdf, YARN-3955.0001.patch, YARN-3955.0002.patch, 
> YARN-3955.0003.patch, YARN-3955.0004.patch, YARN-3955.0005.patch, 
> YARN-3955.0006.patch, YARN-3955.v0.patch, YARN-3955.v1.patch, 
> YARN-3955.wip1.patch
>
>
> Support will be added for user-level access permission to use different 
> application priorities. This is to avoid situations where all users try 
> running at max priority in the cluster, thus degrading the value of 
> priorities.
> Access Control Lists can be set per priority level within each queue. Below 
> is an example configuration that can be added in the capacity scheduler 
> configuration file for each queue level.
> yarn.scheduler.capacity.root...acl=user1,user2






[jira] [Commented] (YARN-4465) SchedulerUtils#validateRequest for Label check should happen only when nodelabel enabled

2016-12-27 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15780983#comment-15780983
 ] 

Wangda Tan commented on YARN-4465:
--

Nice catch, thanks [~Ying Zhang]! 

I think to solve the problem, we need to swallow the InvalidResourceRequest 
exception when recovering, and fail the application instead of failing the RM.

Could you file a ticket for this? We can help with reviews.
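
To illustrate the idea, a minimal sketch (names like {{validate}}, {{recover}} and 
{{markFailed}} are placeholders, not the actual RMAppManager methods): the point is 
that a per-application validation failure during recovery should fail that 
application rather than propagate and abort the whole ResourceManager.

{code}
import org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException;

class RecoverySketch {
  interface App { String id(); }

  void recoverAll(Iterable<App> apps) {
    for (App app : apps) {
      try {
        validate(app);            // may throw on a bad label expression
        recover(app);
      } catch (InvalidResourceRequestException e) {
        // Swallow the exception and fail only this application.
        markFailed(app, "Invalid resource request during recovery: "
            + e.getMessage());
      }
    }
  }

  void validate(App app) throws InvalidResourceRequestException { /* ... */ }
  void recover(App app) { /* ... */ }
  void markFailed(App app, String reason) { /* ... */ }
}
{code}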


> SchedulerUtils#validateRequest for Label check should happen only when 
> nodelabel enabled
> 
>
> Key: YARN-4465
> URL: https://issues.apache.org/jira/browse/YARN-4465
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: 0001-YARN-4465.patch, 0002-YARN-4465.patch, 
> 0003-YARN-4465.patch, 0004-YARN-4465.patch, 0006-YARN-4465.patch, 
> 0007-YARN-4465.patch
>
>
> Disable labels on the RM side: yarn.nodelabel.enable=false
> The capacity scheduler label configuration for the queue is as below:
> default label for queue b1 = 3, and accessible labels = 1,3
> Submit an application to queue A.
> {noformat}
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException):
>  Invalid resource request, queue=b1 doesn't have permission to access all 
> labels in resource request. labelExpression of resource request=3. Queue 
> labels=1,3
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:304)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:234)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:216)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:401)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:340)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:283)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:602)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:247)
> {noformat}
> # Ignore the default label expression when labels are disabled, *or*
> # In NormalizeResourceRequest we can set the label expression to  
> when node labels are not enabled, *or*
> # Improve the error message






[jira] [Commented] (YARN-5969) FairShareComparator getResourceUsage poor performance

2016-12-27 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15780987#comment-15780987
 ] 

Yufei Gu commented on YARN-5969:


Thanks [~zsl2007] for the new patch. LGTM. +1 (non-binding). Would any committer 
take a look? cc [~kasha], [~templedf].

> FairShareComparator getResourceUsage poor performance
> -
>
> Key: YARN-5969
> URL: https://issues.apache.org/jira/browse/YARN-5969
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: zhangshilong
>Assignee: zhangshilong
> Attachments: 20161206.patch, 20161222.patch, YARN-5969.patch, 
> apprunning_after.png, apprunning_before.png, 
> containerAllocatedDelta_before.png, containerAllocated_after.png, 
> pending_after.png, pending_before.png
>
>
> In the FairShareComparator class, the performance of the getResourceUsage() 
> function is very poor. It is executed more than 100,000,000 times per second.
> In our scenario, it takes 20 seconds per minute.
> A simple solution is to reduce the call count of the function.
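
For context, the gist of the proposed fix, roughly sketched (a simplification, not 
the committed FairSharePolicy change; {{compareUsingCachedUsage}} is a placeholder 
for the existing comparison logic):

{code}
// Read each schedulable's usage once per compare() call and reuse it, instead
// of calling getResourceUsage() repeatedly inside the comparison.
@Override
public int compare(Schedulable s1, Schedulable s2) {
  Resource usage1 = s1.getResourceUsage();   // fetched once, cached in locals
  Resource usage2 = s2.getResourceUsage();
  // ... the minShare / weight-ratio checks reuse usage1 and usage2 ...
  return compareUsingCachedUsage(s1, usage1, s2, usage2);
}
{code}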






[jira] [Commented] (YARN-6024) Capacity Scheduler continuous reservation looking doesn't work when queue's used+reserved = max

2016-12-27 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781013#comment-15781013
 ] 

Wangda Tan commented on YARN-6024:
--

Thanks for the review, [~sunilg] / [~Ying Zhang]. 

[~Ying Zhang], I may not fully understand what you suggested. If it is just a 
code style change, I would prefer not to do it: branch-2.7 is in a maintenance 
state, and keeping changes to branch-2.7 simple and straightforward is more 
important. Make sense? 

> Capacity Scheduler continuous reservation looking doesn't work when queue's 
> used+reserved = max
> ---
>
> Key: YARN-6024
> URL: https://issues.apache.org/jira/browse/YARN-6024
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-6024-branch-2.7.001.patch, 
> YARN-6024-branch-2.7.001.patch, YARN-6024.001.patch
>
>
> Found one corner case where continuous reservation looking doesn't work: 
> when the queue's used == max, the queue's capacity check fails.
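
A schematic of the corner case (illustrative pseudo-Java, not the actual 
canAssignToThisQueue code; {{canAssign}} and its parameters are made up):

{code}
// If the limit check uses total used -- which includes reserved resources --
// a queue whose used == max is rejected outright, even though continuous
// reservation looking could first release a reserved container to make room.
boolean canAssign(long used, long reserved, long required, long max) {
  if (used + required > max) {
    // With reservation-continue-looking we would also want to try:
    //   used - reserved + required <= max
    // i.e. re-check the limit as if a reserved container were unreserved.
    return false;
  }
  return true;
}
{code}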






[jira] [Updated] (YARN-5709) Cleanup leader election configs and pluggability

2016-12-27 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-5709:
--
Attachment: yarn-5709-branch-2.8.02.patch

> Cleanup leader election configs and pluggability
> 
>
> Key: YARN-5709
> URL: https://issues.apache.org/jira/browse/YARN-5709
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Critical
> Fix For: 2.9.0, 3.0.0-alpha2
>
> Attachments: yarn-5709-branch-2.8.01.patch, 
> yarn-5709-branch-2.8.02.patch, yarn-5709-branch-2.8.patch, 
> yarn-5709-wip.2.patch, yarn-5709.1.patch, yarn-5709.2.patch, 
> yarn-5709.3.patch, yarn-5709.4.patch
>
>
> While reviewing YARN-5677 and YARN-5694, I noticed we could make the 
> curator-based election code cleaner. It is nicer to get this fixed in 2.8 
> before we ship it, but this can be done at a later time as well. 
> # By EmbeddedElector, we meant it was running as part of the RM daemon. Since 
> the Curator-based elector is also running embedded, I feel the code should be 
> checking for {{!curatorBased}} instead of {{isEmbeddedElector}}
> # {{LeaderElectorService}} should probably be named 
> {{CuratorBasedEmbeddedElectorService}} or some such.
> # The code that initializes the elector should be at the same place 
> irrespective of whether it is curator-based or not. 
> # We seem to be caching the CuratorFramework instance in RM. It makes more 
> sense for it to be in RMContext. If others are okay with it, we might even be 
> better off having a {{RMContext#getCurator()}} method to lazily create the 
> curator framework and then cache it. 
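
A minimal sketch of the lazy creation idea in point 4 (illustrative only; the ZK 
connect string, retry policy and field names are assumptions, not the actual 
RMContext API):

{code}
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;

class RMContextSketch {
  private final String zkHostPort = "localhost:2181";   // assumed config value
  private CuratorFramework curator;                      // cached once created

  synchronized CuratorFramework getCurator() {
    if (curator == null) {
      curator = CuratorFrameworkFactory.newClient(
          zkHostPort, new ExponentialBackoffRetry(1000, 3));
      curator.start();
    }
    return curator;
  }
}
{code}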






[jira] [Commented] (YARN-6025) Few issues in synchronization in CapacityScheduler & AbstractYarnScheduler

2016-12-27 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781024#comment-15781024
 ] 

Wangda Tan commented on YARN-6025:
--

bq. But just one more doubt
In my mind, methods in AYS are mainly there for better code reuse; schedulers can 
treat it more like a library. And the locking approach can vary between 
schedulers, so this is the safest/most flexible way to me. 

> Few issues in synchronization in CapacityScheduler & AbstractYarnScheduler
> --
>
> Key: YARN-6025
> URL: https://issues.apache.org/jira/browse/YARN-6025
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, scheduler
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: YARN-6025.01.patch
>
>
> YARN-3139 optimizes the locks by introducing ReentrantReadWriteLock to remove 
> synchronized, but seems to have some issues.
> # CapacityScheduler
> #* nodeUpdate(RMNode) need not be synchronized, as it is the only such method 
> left in the class.
> #* Does setLastNodeUpdateTime in nodeUpdate need to be updated under the 
> readLock? getLastNodeUpdateTime is done without any lock, and moreover the 
> field is volatile.
> #* getUserGroupMappingPlacementRule need not be public, as it is only called 
> from within the class and is not used in tests; further, it is called from 
> initScheduler and reinitialize, which both hold the write lock, so I presume 
> taking read locks there is of no use.
> # AbstractYarnScheduler
> #* recoverContainersOnNode is synchronized and also holds the write lock over 
> the complete method, so I presume we do not require synchronized here.
> #* The nodeUpdate method too is synchronized, but looking at the updates done 
> inside I do not see any place where node updates from two different nodes 
> would have any issues (except for schedulerHealth, which is taken care of 
> internally with a ConcurrentHashMap). And even if required, we could better 
> use the write lock. (This also depends on the decision of the next point.)
> #* The readLock is only used in containerLaunchedOnNode, and I am not 
> completely sure whether a read lock is required there. If it is not, is there 
> any use for read/write locks in AbstractYarnScheduler at all, given that in 
> general a read/write lock has a performance overhead over synchronized blocks 
> on frequently accessed code paths?
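
A tiny illustration of the read/write-lock pattern under discussion (a generic 
sketch, not the actual scheduler code):

{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

class LockSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private long lastNodeUpdateTime;

  // Mutation goes under the write lock.
  void nodeUpdate(long now) {
    lock.writeLock().lock();
    try {
      lastNodeUpdateTime = now;
      // ... apply the node's container/status updates ...
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Readers take the shared read lock (or rely on a volatile field instead).
  long getLastNodeUpdateTime() {
    lock.readLock().lock();
    try {
      return lastNodeUpdateTime;
    } finally {
      lock.readLock().unlock();
    }
  }
}
{code}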






[jira] [Commented] (YARN-6029) CapacityScheduler deadlock when ParentQueue#getQueueUserAclInfo is called by Thread_A at the moment that Thread_B calls LeafQueue#assignContainers to release a reserved

2016-12-27 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781054#comment-15781054
 ] 

Naganarasimha G R commented on YARN-6029:
-

Thanks for working on the patch [~Tao Yang]. What is actually happening is an 
inversion of the lock order: in one flow we are holding the lock of the LeafQueue 
and trying to get the lock of the parent (the completedContainer flow), and in 
the other we are holding the lock of the ParentQueue and then trying to get the 
lock of the LeafQueue (the getQueueUserAclInfo flow). A better solution would be 
to do as 2.9/trunk does, where read locks are introduced such that 
*getQueueUserAclInfo* uses the read lock and *completedContainer* uses the write 
lock, so that we avoid this inversion. That would be a big change, but I am also 
not completely sure about your fix, as you are only removing synchronization from 
the LeafQueue.
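
A stripped-down illustration of the inversion (generic Java objects, not the 
CapacityScheduler classes):

{code}
class DeadlockSketch {
  final Object parentQueue = new Object();
  final Object leafQueue = new Object();

  // Thread A (completedContainer flow): leaf lock, then parent lock.
  void releaseFlow() {
    synchronized (leafQueue) {
      synchronized (parentQueue) {    // blocks if Thread B holds parentQueue
        // internalReleaseResource(...)
      }
    }
  }

  // Thread B (getQueueUserAclInfo flow): parent lock, then leaf lock.
  void aclFlow() {
    synchronized (parentQueue) {
      synchronized (leafQueue) {      // blocks if Thread A holds leafQueue
        // collect child queue ACLs
      }
    }
  }
  // Opposite acquisition order on the same two monitors => possible deadlock.
}
{code}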

> CapacityScheduler deadlock when ParentQueue#getQueueUserAclInfo is called by 
> Thread_A at the moment that Thread_B calls LeafQueue#assignContainers to 
> release a reserved container
> --
>
> Key: YARN-6029
> URL: https://issues.apache.org/jira/browse/YARN-6029
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.8.0
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Blocker
> Attachments: YARN-6029.001.patch, deadlock.jstack
>
>
> When ParentQueue#getQueueUserAclInfo is called (e.g. a client calls 
> YarnClient#getQueueAclsInfo) just at the moment that 
> LeafQueue#assignContainers is called and before notifying parent queue to 
> release resource (should release a reserved container), then ResourceManager 
> can deadlock. I found this problem on our testing environment for hadoop2.8.
> Reproduce the deadlock in chronological order
> * 1. Thread A (ResourceManager Event Processor) calls synchronized 
> LeafQueue#assignContainers (got LeafQueue instance lock of queue root.a)
> * 2. Thread B (IPC Server handler) calls synchronized 
> ParentQueue#getQueueUserAclInfo (got ParentQueue instance lock of queue 
> root), iterates over children queue acls and is blocked when calling 
> synchronized LeafQueue#getQueueUserAclInfo (the LeafQueue instance lock of 
> queue root.a is held by Thread A)
> * 3. Thread A wants to inform the parent queue that a container is being 
> completed and is blocked when invoking synchronized 
> ParentQueue#internalReleaseResource method (the ParentQueue instance lock of 
> queue root is held by Thread B)
> I think the synchronized modifier of LeafQueue#getQueueUserAclInfo can be 
> removed to solve this problem, since this method does not appear to modify 
> fields of the LeafQueue instance.
> Attach patch with UT for review.






[jira] [Commented] (YARN-6025) Few issues in synchronization in CapacityScheduler & AbstractYarnScheduler

2016-12-27 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781057#comment-15781057
 ] 

Naganarasimha G R commented on YARN-6025:
-

OK, in that case I will consider only removing the synchronization for nodeUpdate, 
and keep it specific to the FIFO scheduler as part of this jira.

> Few issues in synchronization in CapacityScheduler & AbstractYarnScheduler
> --
>
> Key: YARN-6025
> URL: https://issues.apache.org/jira/browse/YARN-6025
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, scheduler
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: YARN-6025.01.patch
>
>
> YARN-3139 optimizes the locks by introducing ReentrantReadWriteLock to remove 
> synchronized, but seems to have some issues.
> # CapacityScheduler
> #* nodeUpdate(RMNode) need not be synchronized, as it is the only such method 
> left in the class.
> #* Does setLastNodeUpdateTime in nodeUpdate need to be updated under the 
> readLock? getLastNodeUpdateTime is done without any lock, and moreover the 
> field is volatile.
> #* getUserGroupMappingPlacementRule need not be public, as it is only called 
> from within the class and is not used in tests; further, it is called from 
> initScheduler and reinitialize, which both hold the write lock, so I presume 
> taking read locks there is of no use.
> # AbstractYarnScheduler
> #* recoverContainersOnNode is synchronized and also holds the write lock over 
> the complete method, so I presume we do not require synchronized here.
> #* The nodeUpdate method too is synchronized, but looking at the updates done 
> inside I do not see any place where node updates from two different nodes 
> would have any issues (except for schedulerHealth, which is taken care of 
> internally with a ConcurrentHashMap). And even if required, we could better 
> use the write lock. (This also depends on the decision of the next point.)
> #* The readLock is only used in containerLaunchedOnNode, and I am not 
> completely sure whether a read lock is required there. If it is not, is there 
> any use for read/write locks in AbstractYarnScheduler at all, given that in 
> general a read/write lock has a performance overhead over synchronized blocks 
> on frequently accessed code paths?






[jira] [Commented] (YARN-5969) FairShareComparator getResourceUsage poor performance

2016-12-27 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781174#comment-15781174
 ] 

Karthik Kambatla commented on YARN-5969:


+1. Checking this in. 

> FairShareComparator getResourceUsage poor performance
> -
>
> Key: YARN-5969
> URL: https://issues.apache.org/jira/browse/YARN-5969
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: zhangshilong
>Assignee: zhangshilong
> Attachments: 20161206.patch, 20161222.patch, YARN-5969.patch, 
> apprunning_after.png, apprunning_before.png, 
> containerAllocatedDelta_before.png, containerAllocated_after.png, 
> pending_after.png, pending_before.png
>
>
> In the FairShareComparator class, the performance of the getResourceUsage() 
> function is very poor. It is executed more than 100,000,000 times per second.
> In our scenario, it takes 20 seconds per minute.
> A simple solution is to reduce the call count of the function.






[jira] [Updated] (YARN-5969) FairShareComparator: Cache value of getResourceUsage for better performance

2016-12-27 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-5969:
---
Summary: FairShareComparator: Cache value of getResourceUsage for better 
performance  (was: FairShareComparator getResourceUsage poor performance)

> FairShareComparator: Cache value of getResourceUsage for better performance
> ---
>
> Key: YARN-5969
> URL: https://issues.apache.org/jira/browse/YARN-5969
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: zhangshilong
>Assignee: zhangshilong
> Attachments: 20161206.patch, 20161222.patch, YARN-5969.patch, 
> apprunning_after.png, apprunning_before.png, 
> containerAllocatedDelta_before.png, containerAllocated_after.png, 
> pending_after.png, pending_before.png
>
>
> In the FairShareComparator class, the performance of the getResourceUsage() 
> function is very poor. It is executed more than 100,000,000 times per second.
> In our scenario, it takes 20 seconds per minute.
> A simple solution is to reduce the call count of the function.






[jira] [Updated] (YARN-5529) Create new DiskValidator class with metrics

2016-12-27 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu updated YARN-5529:
---
Attachment: YARN-5529.003.patch

Thanks for the review, [~rkanter]. I've uploaded a new patch addressing all your comments.

> Create new DiskValidator class with metrics
> ---
>
> Key: YARN-5529
> URL: https://issues.apache.org/jira/browse/YARN-5529
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Ray Chiang
>Assignee: Yufei Gu
>  Labels: supportability
> Attachments: YARN-5529.001.patch, YARN-5529.002.patch, 
> YARN-5529.003.patch
>
>
> With really large clusters, the basic DiskValidator isn't sufficient for some 
> of the less common types of disk failures.
> Look at a new DiskValidator that could do one or more of the following:
> - Add new tests to find more problems
> - Add new metrics to at least characterize problems that we haven't predicted






[jira] [Commented] (YARN-5938) Refactoring OpportunisticContainerAllocator use SchedulerRequestKey instead of Priority

2016-12-27 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781216#comment-15781216
 ] 

Arun Suresh commented on YARN-5938:
---

The failed test case passes when run locally, and fixing the checkstyle warning 
would require further refactoring of the 
{{ApplicationMasterService::allocateInternal()}} method, which should probably be 
tackled as a separate patch.

Committing this shortly.

> Refactoring OpportunisticContainerAllocator use SchedulerRequestKey instead 
> of Priority
> ---
>
> Key: YARN-5938
> URL: https://issues.apache.org/jira/browse/YARN-5938
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Fix For: 3.0.0-beta1
>
> Attachments: YARN-5938-YARN-5085.001.patch, 
> YARN-5938-YARN-5085.002.patch, YARN-5938-YARN-5085.003.patch, 
> YARN-5938-YARN-5085.004.patch, YARN-5938.001.patch, YARN-5938.002.patch, 
> YARN-5938.003.patch
>
>
> Minor code re-organization to do the following:
> # The OpportunisticContainerAllocatorAMService currently allocates outside 
> the ApplicationAttempt lock maintained by the ApplicationMasterService. This 
> should happen inside the lock.
> # Refactored out some code to simplify the allocate() method.
> # Removed some unused fields inside the OpportunisticContainerAllocator.
> # Re-organized some of the code in the 
> OpportunisticContainerAllocatorAMService::allocate method to make it a bit 
> more readable.
> # Moved SchedulerRequestKey to a new package, so it can be used by the 
> OpportunisticContainerAllocator/Context.
> # Moved all usages of Priority in the OpportunisticContainerAllocator -> 
> SchedulerRequestKey. 






[jira] [Updated] (YARN-5938) Refactoring OpportunisticContainerAllocator to use SchedulerRequestKey instead of Priority and other misc fixes

2016-12-27 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-5938:
--
Summary: Refactoring OpportunisticContainerAllocator to use 
SchedulerRequestKey instead of Priority and other misc fixes  (was: Refactoring 
OpportunisticContainerAllocator use SchedulerRequestKey instead of Priority)

> Refactoring OpportunisticContainerAllocator to use SchedulerRequestKey 
> instead of Priority and other misc fixes
> ---
>
> Key: YARN-5938
> URL: https://issues.apache.org/jira/browse/YARN-5938
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Fix For: 3.0.0-alpha2
>
> Attachments: YARN-5938-YARN-5085.001.patch, 
> YARN-5938-YARN-5085.002.patch, YARN-5938-YARN-5085.003.patch, 
> YARN-5938-YARN-5085.004.patch, YARN-5938.001.patch, YARN-5938.002.patch, 
> YARN-5938.003.patch
>
>
> Minor code re-organization to do the following:
> # The OpportunisticContainerAllocatorAMService currently allocates outside 
> the ApplicationAttempt lock maintained by the ApplicationMasterService. This 
> should happen inside the lock.
> # Refactored out some code to simplify the allocate() method.
> # Removed some unused fields inside the OpportunisticContainerAllocator.
> # Re-organized some of the code in the 
> OpportunisticContainerAllocatorAMService::allocate method to make it a bit 
> more readable.
> # Moved SchedulerRequestKey to a new package, so it can be used by the 
> OpportunisticContainerAllocator/Context.
> # Moved all usages of Priority in the OpportunisticContainerAllocator -> 
> SchedulerRequestKey. 






[jira] [Commented] (YARN-6022) Revert changes of AbstractResourceRequest

2016-12-27 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781255#comment-15781255
 ] 

Junping Du commented on YARN-6022:
--

Removing 2.8 from the target versions, given that YARN-5774 was not actually in branch-2.8.

> Revert changes of AbstractResourceRequest
> -
>
> Key: YARN-6022
> URL: https://issues.apache.org/jira/browse/YARN-6022
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Priority: Blocker
>
> YARN-5774 added AbstractResourceRequest to make internal scheduler changes 
> easier, but this is not a correct approach: for example, with this change, we 
> need to make AbstractResourceRequest public/stable. And end users can 
> use it like:
> {code}
> AbstractResourceRequest request = ...
> request.setCapability(...)
> {code}
> But AbstractResourceRequest should not be visible to applications at all. 
> We need to revert it from branch-2.8 / branch-2 / trunk. 






[jira] [Updated] (YARN-6022) Revert changes of AbstractResourceRequest

2016-12-27 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-6022:
-
Target Version/s: 2.9.0, 3.0.0-alpha2  (was: 2.8.0, 3.0.0-alpha2)

> Revert changes of AbstractResourceRequest
> -
>
> Key: YARN-6022
> URL: https://issues.apache.org/jira/browse/YARN-6022
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Priority: Blocker
>
> YARN-5774 added AbstractResourceRequest to make internal scheduler changes 
> easier, but this is not a correct approach: for example, with this change, we 
> need to make AbstractResourceRequest public/stable. And end users can 
> use it like:
> {code}
> AbstractResourceRequest request = ...
> request.setCapability(...)
> {code}
> But AbstractResourceRequest should not be visible to applications at all. 
> We need to revert it from branch-2.8 / branch-2 / trunk. 






[jira] [Commented] (YARN-5529) Create new DiskValidator class with metrics

2016-12-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781271#comment-15781271
 ] 

Hadoop QA commented on YARN-5529:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
15s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  9m 
38s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 27s{color} | {color:orange} hadoop-common-project/hadoop-common: The patch 
generated 2 new + 7 unchanged - 0 fixed = 9 total (was 7) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  8m  
2s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
32s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 51m 22s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | YARN-5529 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12844821/YARN-5529.003.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 84b2cf652c8c 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 
15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / c0e0ef2 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/14479/artifact/patchprocess/diff-checkstyle-hadoop-common-project_hadoop-common.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/14479/testReport/ |
| modules | C: hadoop-common-project/hadoop-common U: 
hadoop-common-project/hadoop-common |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/14479/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Create new DiskValidator class with metrics
> ---
>
> Key: YARN-5529
> URL: https://issues.apache.org/jira/browse/YARN-5529
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Ray Chiang
>Assignee: Yufei Gu
> 

[jira] [Commented] (YARN-5938) Refactoring OpportunisticContainerAllocator to use SchedulerRequestKey instead of Priority and other misc fixes

2016-12-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781274#comment-15781274
 ] 

Hudson commented on YARN-5938:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11042 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/11042/])
YARN-5938. Refactoring OpportunisticContainerAllocator to use (arun suresh: rev 
ac1e5d4f77e3b9df8dcacb0b1f72eecc27931eb8)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoAppAttempt.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/scheduler/SchedulerRequestKey.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/UpdateContainerRequest.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestSchedulerApplicationAttempt.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/Application.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestContinuousScheduling.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerReservedEvent.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/placement/SchedulingPlacementSet.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RemoteNode.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/allocator/RegularContainerAllocator.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* (delete) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerRequestKey.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestAppSchedulingInfo.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/scheduler/OpportunisticContainerAllocator.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerNode.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/scheduler/DistributedScheduler.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerNode.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/SchedulerContainer.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yar

[jira] [Commented] (YARN-5969) FairShareComparator: Cache value of getResourceUsage for better performance

2016-12-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781275#comment-15781275
 ] 

Hudson commented on YARN-5969:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11042 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/11042/])
YARN-5969. FairShareComparator: Cache value of getResourceUsage for (kasha: rev 
c3973e7080bf71b57ace4a6adf4bb43f3c5d35b5)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java


> FairShareComparator: Cache value of getResourceUsage for better performance
> ---
>
> Key: YARN-5969
> URL: https://issues.apache.org/jira/browse/YARN-5969
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: zhangshilong
>Assignee: zhangshilong
> Fix For: 2.9.0, 3.0.0-alpha2
>
> Attachments: 20161206.patch, 20161222.patch, YARN-5969.patch, 
> apprunning_after.png, apprunning_before.png, 
> containerAllocatedDelta_before.png, containerAllocated_after.png, 
> pending_after.png, pending_before.png
>
>
> In the FairShareComparator class, the performance of the getResourceUsage() 
> function is very poor. It is executed more than 100,000,000 times per second.
> In our scenario, it takes 20 seconds per minute.
> A simple solution is to reduce the call count of the function.






[jira] [Commented] (YARN-5962) Spelling errors in logging and exceptions for resource manager code

2016-12-27 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781299#comment-15781299
 ] 

Robert Kanter commented on YARN-5962:
-

+1 I'll commit this momentarily.

> Spelling errors in logging and exceptions for resource manager code
> ---
>
> Key: YARN-5962
> URL: https://issues.apache.org/jira/browse/YARN-5962
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.0.0-alpha1
>Reporter: Grant Sohn
>Assignee: Grant Sohn
>Priority: Trivial
> Attachments: YARN-5962.1.patch, YARN-5962.2.patch, YARN-5962.3.patch
>
>
> Found spelling errors in exceptions and logging.
> Examples:
> Invailid -> Invalid
> refinition -> definition
> non-exsisting -> non-existing






[jira] [Updated] (YARN-4882) Change the log level to DEBUG for recovering completed applications

2016-12-27 Thread Daniel Templeton (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated YARN-4882:
---
Attachment: YARN-4882.004.patch

Fixed a typo.

> Change the log level to DEBUG for recovering completed applications
> ---
>
> Key: YARN-4882
> URL: https://issues.apache.org/jira/browse/YARN-4882
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: Daniel Templeton
>  Labels: oct16-easy
> Attachments: YARN-4882.001.patch, YARN-4882.002.patch, 
> YARN-4882.003.patch, YARN-4882.004.patch
>
>
> I think recovering completed applications does not need to be logged at INFO; 
> it can be logged at DEBUG instead. The problem seen on large clusters is that 
> if any issue happens during RM start-up and the RM keeps switching, then the 
> RM logs are filled mostly with recovering applications only. 
> There are 6 lines logged for 1 application, as shown in the logs below; now 
> consider that the RM default value for max-completed applications is 10K. So 
> for each switch 10K*6=60K lines will be added, which is not useful I feel.
> {noformat}
> 2016-03-01 10:20:59,077 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager: Default priority 
> level is set to application:application_1456298208485_21507
> 2016-03-01 10:20:59,094 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Recovering 
> app: application_1456298208485_21507 with 1 attempts and final state = 
> FINISHED
> 2016-03-01 10:20:59,100 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> Recovering attempt: appattempt_1456298208485_21507_01 with final state: 
> FINISHED
> 2016-03-01 10:20:59,107 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> appattempt_1456298208485_21507_01 State change from NEW to FINISHED
> 2016-03-01 10:20:59,111 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: 
> application_1456298208485_21507 State change from NEW to FINISHED
> 2016-03-01 10:20:59,112 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=rohith   
> OPERATION=Application Finished - Succeeded  TARGET=RMAppManager 
> RESULT=SUCCESS  APPID=application_1456298208485_21507
> {noformat}
> The main problem is that important information from before the RM became 
> unstable goes missing from the logs. Even if the log rollback count is 50 or 
> 100, in a short period all these logs get rolled out, and the remaining logs 
> contain only RM switching information, mostly recovering applications. 
> I suggest that at least completed-application recovery should be logged at DEBUG.
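
The change amounts to the usual guarded-debug pattern, roughly (illustrative; 
{{appId}}, {{attemptCount}} and {{finalState}} are placeholders, not the exact RM 
variables):

{code}
// Recovery of an already-completed application: log at DEBUG, guarded so the
// string concatenation is skipped entirely when DEBUG is off.
if (LOG.isDebugEnabled()) {
  LOG.debug("Recovering app: " + appId + " with " + attemptCount
      + " attempts and final state = " + finalState);
}
{code}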






[jira] [Commented] (YARN-5709) Cleanup leader election configs and pluggability

2016-12-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781360#comment-15781360
 ] 

Hadoop QA commented on YARN-5709:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  2m 
31s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
33s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
20s{color} | {color:green} branch-2.8 passed with JDK v1.8.0_111 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
37s{color} | {color:green} branch-2.8 passed with JDK v1.7.0_121 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
45s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
21s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
35s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
45s{color} | {color:green} branch-2.8 passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
26s{color} | {color:red} hadoop-yarn-server-resourcemanager in branch-2.8 
failed with JDK v1.8.0_111. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
49s{color} | {color:green} branch-2.8 passed with JDK v1.7.0_121 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
10s{color} | {color:green} the patch passed with JDK v1.8.0_111 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
40s{color} | {color:green} the patch passed with JDK v1.7.0_121 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 
40s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 41s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 4 new + 321 unchanged - 9 fixed = 325 total (was 330) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
12s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
24s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed 
with JDK v1.8.0_111. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed with JDK v1.7.0_121 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
29s{color} | {color:green} hadoop-yarn-api in the patch passed with JDK 
v1.7.0_121. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 74m 56s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_121. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}193m 49s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_111 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn

[jira] [Commented] (YARN-5709) Cleanup leader election configs and pluggability

2016-12-27 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781367#comment-15781367
 ] 

Jian He commented on YARN-5709:
---

I'm not sure which part of the patch is causing the javadoc failure. 
[~templedf], [~kasha], any clue?

> Cleanup leader election configs and pluggability
> 
>
> Key: YARN-5709
> URL: https://issues.apache.org/jira/browse/YARN-5709
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Critical
> Fix For: 2.9.0, 3.0.0-alpha2
>
> Attachments: yarn-5709-branch-2.8.01.patch, 
> yarn-5709-branch-2.8.02.patch, yarn-5709-branch-2.8.patch, 
> yarn-5709-wip.2.patch, yarn-5709.1.patch, yarn-5709.2.patch, 
> yarn-5709.3.patch, yarn-5709.4.patch
>
>
> While reviewing YARN-5677 and YARN-5694, I noticed we could make the 
> curator-based election code cleaner. It is nicer to get this fixed in 2.8 
> before we ship it, but this can be done at a later time as well. 
> # By EmbeddedElector, we meant it was running as part of the RM daemon. Since 
> the Curator-based elector is also running embedded, I feel the code should be 
> checking for {{!curatorBased}} instead of {{isEmbeddedElector}}
> # {{LeaderElectorService}} should probably be named 
> {{CuratorBasedEmbeddedElectorService}} or some such.
> # The code that initializes the elector should be at the same place 
> irrespective of whether it is curator-based or not. 
> # We seem to be caching the CuratorFramework instance in RM. It makes more 
> sense for it to be in RMContext. If others are okay with it, we might even be 
> better off having a {{RMContext#getCurator()}} method to lazily create the 
> curator framework and then cache it. 






[jira] [Created] (YARN-6030) Eliminate timelineServiceV2 boolean flag in TimelineClientImpl

2016-12-27 Thread Li Lu (JIRA)
Li Lu created YARN-6030:
---

 Summary: Eliminate timelineServiceV2 boolean flag in 
TimelineClientImpl
 Key: YARN-6030
 URL: https://issues.apache.org/jira/browse/YARN-6030
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: YARN-5355
Reporter: Li Lu
Priority: Minor


I just discovered that we're still using a boolean flag {{timelineServiceV2}} 
after we introduced {{timelineServiceVersion}}. This sounds a little bit 
error-prone. After the discussion, I think we should only use and trust 
{{timelineServiceVersion}}. {{timelineServiceV2}} is set upon client creation. 
Instead of creating a v2 client and setting this flag, maybe we'd like to do a 
sanity check and make sure the creation call is consistent with the 
configuration? 
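
A rough sketch of such a sanity check (the config key and the {{v2Requested}} 
flag here are assumptions, not the final API):

{code}
// On client creation: derive the version from configuration instead of
// trusting a separately maintained boolean, and fail fast on a mismatch.
float version = conf.getFloat("yarn.timeline-service.version", 1.0f);
boolean v2Requested = true;   // i.e. the caller asked for a v2 client
if (v2Requested && version < 2.0f) {
  throw new IllegalArgumentException(
      "Timeline v2 client requested but yarn.timeline-service.version="
      + version);
}
{code}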






[jira] [Commented] (YARN-5962) Spelling errors in logging and exceptions for resource manager code

2016-12-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781378#comment-15781378
 ] 

Hudson commented on YARN-5962:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11044 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/11044/])
YARN-5962. Spelling errors in logging and exceptions for resource (rkanter: rev 
1bbd023275db535ab80fcb60e022151e9679d468)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/allocator/RegularContainerAllocator.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/InMemoryPlan.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationInputValidator.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestReservationInputValidator.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/MemoryRMStateStore.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestContainerResourceUsage.java


> Spelling errors in logging and exceptions for resource manager code
> ---
>
> Key: YARN-5962
> URL: https://issues.apache.org/jira/browse/YARN-5962
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.0.0-alpha1
>Reporter: Grant Sohn
>Assignee: Grant Sohn
>Priority: Trivial
> Fix For: 2.9.0, 3.0.0-alpha2
>
> Attachments: YARN-5962.1.patch, YARN-5962.2.patch, YARN-5962.3.patch
>
>
> Found spelling errors in exceptions and logging.
> Examples:
> Invailid -> Invalid
> refinition -> definition
> non-exsisting -> non-existing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5257) Fix unreleased resources and null dereferences

2016-12-27 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated YARN-5257:

Summary: Fix unreleased resources and null dereferences  (was: Fix bad 
practices)

> Fix unreleased resources and null dereferences
> --
>
> Key: YARN-5257
> URL: https://issues.apache.org/jira/browse/YARN-5257
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-5257.001.patch
>
>
> The following code contains potential problems:
> {code}
> Unreleased Resource: Streams  TopCLI.java:738
> Unreleased Resource: Streams  Graph.java:189
> Unreleased Resource: Streams  CgroupsLCEResourcesHandler.java:291
> Unreleased Resource: Streams  UnmanagedAMLauncher.java:195
> Unreleased Resource: Streams  CGroupsHandlerImpl.java:319
> Unreleased Resource: Streams  TrafficController.java:629
> Null Dereference  ApplicationImpl.java:465
> Null Dereference  VisualizeStateMachine.java:52
> Null Dereference  ContainerImpl.java:1089
> Null Dereference  QueueManager.java:219
> Null Dereference  QueueManager.java:232
> Null Dereference  ResourceLocalizationService.java:1016
> Null Dereference  ResourceLocalizationService.java:1023
> Null Dereference  ResourceLocalizationService.java:1040
> Null Dereference  ResourceLocalizationService.java:1052
> Null Dereference  ProcfsBasedProcessTree.java:802
> Null Dereference  TimelineClientImpl.java:639
> Null Dereference  LocalizedResource.java:206
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5257) Fix bad practices

2016-12-27 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781383#comment-15781383
 ] 

Robert Kanter commented on YARN-5257:
-

+1

> Fix bad practices
> -
>
> Key: YARN-5257
> URL: https://issues.apache.org/jira/browse/YARN-5257
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-5257.001.patch
>
>
> The following code contains potential problems:
> {code}
> Unreleased Resource: Streams  TopCLI.java:738
> Unreleased Resource: Streams  Graph.java:189
> Unreleased Resource: Streams  CgroupsLCEResourcesHandler.java:291
> Unreleased Resource: Streams  UnmanagedAMLauncher.java:195
> Unreleased Resource: Streams  CGroupsHandlerImpl.java:319
> Unreleased Resource: Streams  TrafficController.java:629
> Null Dereference  ApplicationImpl.java:465
> Null Dereference  VisualizeStateMachine.java:52
> Null Dereference  ContainerImpl.java:1089
> Null Dereference  QueueManager.java:219
> Null Dereference  QueueManager.java:232
> Null Dereference  ResourceLocalizationService.java:1016
> Null Dereference  ResourceLocalizationService.java:1023
> Null Dereference  ResourceLocalizationService.java:1040
> Null Dereference  ResourceLocalizationService.java:1052
> Null Dereference  ProcfsBasedProcessTree.java:802
> Null Dereference  TimelineClientImpl.java:639
> Null Dereference  LocalizedResource.java:206
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5257) Fix unreleased resources and null dereferences

2016-12-27 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated YARN-5257:

 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0-alpha2
   2.9.0

Thanks [~yufeigu].  Committed to branch-2 and trunk!

> Fix unreleased resources and null dereferences
> --
>
> Key: YARN-5257
> URL: https://issues.apache.org/jira/browse/YARN-5257
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Fix For: 2.9.0, 3.0.0-alpha2
>
> Attachments: YARN-5257.001.patch
>
>
> The following code contains potential problems:
> {code}
> Unreleased Resource: Streams  TopCLI.java:738
> Unreleased Resource: Streams  Graph.java:189
> Unreleased Resource: Streams  CgroupsLCEResourcesHandler.java:291
> Unreleased Resource: Streams  UnmanagedAMLauncher.java:195
> Unreleased Resource: Streams  CGroupsHandlerImpl.java:319
> Unreleased Resource: Streams  TrafficController.java:629
> Null Dereference  ApplicationImpl.java:465
> Null Dereference  VisualizeStateMachine.java:52
> Null Dereference  ContainerImpl.java:1089
> Null Dereference  QueueManager.java:219
> Null Dereference  QueueManager.java:232
> Null Dereference  ResourceLocalizationService.java:1016
> Null Dereference  ResourceLocalizationService.java:1023
> Null Dereference  ResourceLocalizationService.java:1040
> Null Dereference  ResourceLocalizationService.java:1052
> Null Dereference  ProcfsBasedProcessTree.java:802
> Null Dereference  TimelineClientImpl.java:639
> Null Dereference  LocalizedResource.java:206
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6029) CapacityScheduler deadlock when ParentQueue#getQueueUserAclInfo is called by Thread_A at the moment that Thread_B calls LeafQueue#assignContainers to release a reserved

2016-12-27 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781407#comment-15781407
 ] 

Junping Du commented on YARN-6029:
--

Thanks [~Tao Yang] for reporting the issue. I think this issue is valid given 
the existing code flow and the jstack you attached. For your current patch, I 
am a little concerned that completely removing synchronized from 
getQueueUserAclInfo could cause other concurrency issues. 
However, [~Naganarasimha], I don't quite understand your proposed solution 
here - if we make exactly the same change as trunk/branch-2.9, thread A 
(completedContainer flow) can hold the write lock on queue Root.A while waiting 
for the write lock on queue Root, and thread B (getQueueUserAclInfo flow) may 
hold the read lock on queue Root while waiting for the read lock on queue 
Root.A. Nothing becomes better. Am I missing anything here?
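For illustration, a standalone sketch of that lock ordering (plain 
{{ReentrantReadWriteLock}} objects with made-up {{root}}/{{rootA}} names, not 
CapacityScheduler code) shows why swapping synchronized for read/write locks 
alone does not remove the deadlock:
{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockOrderDeadlockDemo {
  // Stand-ins for the Root and Root.A queue locks (hypothetical names).
  static final ReentrantReadWriteLock root = new ReentrantReadWriteLock();
  static final ReentrantReadWriteLock rootA = new ReentrantReadWriteLock();

  public static void main(String[] args) {
    // Thread A: completedContainer flow, locks the child first, then the parent.
    Thread a = new Thread(() -> {
      rootA.writeLock().lock();
      sleep(100);                  // let the other thread grab root's read lock
      root.writeLock().lock();     // blocks: thread B holds root's read lock
      root.writeLock().unlock();
      rootA.writeLock().unlock();
    });
    // Thread B: getQueueUserAclInfo flow, locks the parent first, then the child.
    Thread b = new Thread(() -> {
      root.readLock().lock();
      sleep(100);
      rootA.readLock().lock();     // blocks: thread A holds rootA's write lock
      rootA.readLock().unlock();
      root.readLock().unlock();
    });
    a.start();
    b.start();                     // the two threads now deadlock
  }

  private static void sleep(long ms) {
    try { Thread.sleep(ms); } catch (InterruptedException ignored) { }
  }
}
{code}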

> CapacityScheduler deadlock when ParentQueue#getQueueUserAclInfo is called by 
> Thread_A at the moment that Thread_B calls LeafQueue#assignContainers to 
> release a reserved container
> --
>
> Key: YARN-6029
> URL: https://issues.apache.org/jira/browse/YARN-6029
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.8.0
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Blocker
> Attachments: YARN-6029.001.patch, deadlock.jstack
>
>
> When ParentQueue#getQueueUserAclInfo is called (e.g. a client calls 
> YarnClient#getQueueAclsInfo) just at the moment that 
> LeafQueue#assignContainers is called and before notifying parent queue to 
> release resource (should release a reserved container), then ResourceManager 
> can deadlock. I found this problem on our testing environment for hadoop2.8.
> Reproduce the deadlock in chronological order
> * 1. Thread A (ResourceManager Event Processor) calls synchronized 
> LeafQueue#assignContainers (got LeafQueue instance lock of queue root.a)
> * 2. Thread B (IPC Server handler) calls synchronized 
> ParentQueue#getQueueUserAclInfo (got ParentQueue instance lock of queue 
> root), iterates over children queue acls and is blocked when calling 
> synchronized LeafQueue#getQueueUserAclInfo (the LeafQueue instance lock of 
> queue root.a is held by Thread A)
> * 3. Thread A wants to inform the parent queue that a container is being 
> completed and is blocked when invoking synchronized 
> ParentQueue#internalReleaseResource method (the ParentQueue instance lock of 
> queue root is held by Thread B)
> I think the synchronized modifier of LeafQueue#getQueueUserAclInfo can be 
> removed to solve this problem, since this method appears to not affect fields 
> of LeafQueue instance.
> Attach patch with UT for review.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance

2016-12-27 Thread Daniel Templeton (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated YARN-5995:
---
Assignee: zhangyubiao

> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance
> ---
>
> Key: YARN-5995
> URL: https://issues.apache.org/jira/browse/YARN-5995
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: metrics, resourcemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.2 Hadoop-2.7.1 
>Reporter: zhangyubiao
>Assignee: zhangyubiao
>  Labels: patch
> Attachments: YARN-5995.0001.patch, YARN-5995.0002.patch, 
> YARN-5995.patch
>
>
> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-5992) revert the visibility of interface AllocationFileLoaderService.Listener to public for outside usage

2016-12-27 Thread Daniel Templeton (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton resolved YARN-5992.

Resolution: Duplicate

YARN-6000 already resolves the issue, even though this patch was posted first.  
Sorry about that.

> revert the visibility of interface AllocationFileLoaderService.Listener to 
> public for outside usage
> ---
>
> Key: YARN-5992
> URL: https://issues.apache.org/jira/browse/YARN-5992
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 3.0.0-alpha2
>Reporter: Pan Yuxuan
> Attachments: YARN-5992.patch
>
>
> The visibility of the interface {{AllocationFileLoaderService.Listener}} was 
> changed from public to package-private by YARN-4997.
> This may cause compile errors in downstream projects such as Hive:
> {noformat}
> Hive/src/shims/scheduler/src/main/java/org/apache/hadoop/hive/schshim/FairSchedulerShim.java:[45,67]
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.Listener
>  is not public in 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService;
>  cannot be accessed from outside package
> {noformat}
> org/apache/hadoop/hive/schshim/FairSchedulerShim.java uses the interface like 
> below:
> {noformat}
> allocsLoader.setReloadListener(new AllocationFileLoaderService.Listener() {
>   @Override
>   public void onReload(AllocationConfiguration allocs) {
> allocConf.set(allocs);
>   }
> });
> {noformat}
> So can we revert the visibility of {{AllocationFileLoaderService.Listener}} 
> to public so that downstream projects can keep using the interface as before? 
> Otherwise, they will have to make changes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5257) Fix unreleased resources and null dereferences

2016-12-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781454#comment-15781454
 ] 

Hudson commented on YARN-5257:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11045 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/11045/])
YARN-5257. Fix unreleased resources and null dereferences (yufeigu via 
(rkanter: rev 9262797e86453fc04b7ca3710b73b21fcdf9e6b4)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineClientImpl.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/Graph.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/application/ApplicationImpl.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ProcfsBasedProcessTree.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/TrafficController.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/TopCLI.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalizedResource.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/src/main/java/org/apache/hadoop/yarn/applications/unmanagedamlauncher/UnmanagedAMLauncher.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/VisualizeStateMachine.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java


> Fix unreleased resources and null dereferences
> --
>
> Key: YARN-5257
> URL: https://issues.apache.org/jira/browse/YARN-5257
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Fix For: 2.9.0, 3.0.0-alpha2
>
> Attachments: YARN-5257.001.patch
>
>
> The following code contains potential problems:
> {code}
> Unreleased Resource: Streams  TopCLI.java:738
> Unreleased Resource: Streams  Graph.java:189
> Unreleased Resource: Streams  CgroupsLCEResourcesHandler.java:291
> Unreleased Resource: Streams  UnmanagedAMLauncher.java:195
> Unreleased Resource: Streams  CGroupsHandlerImpl.java:319
> Unreleased Resource: Streams  TrafficController.java:629
> Null Dereference  ApplicationImpl.java:465
> Null Dereference  VisualizeStateMachine.java:52
> Null Dereference  ContainerImpl.java:1089
> Null Dereference  QueueManager.java:219
> Null Dereference  QueueManager.java:232
> Null Dereference  ResourceLocalizationService.java:1016
> Null Dereference  ResourceLocalizationService.java:1023
> Null Dereference  ResourceLocalizationService.java:1040
> Null Dereference  ResourceLocalizationService.java:1052
> Null Dereference  ProcfsBasedProcessTree.java:802
> Null Dereference  TimelineClientImpl.java:639
> Null Dereference  LocalizedResource.java:206
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5709) Cleanup leader election configs and pluggability

2016-12-27 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781465#comment-15781465
 ] 

Daniel Templeton commented on YARN-5709:


Hmmm...  Looks like {{ActiveStandbyElectorBasedElectorService.serviceStop()}} 
has a javadoc comment inside the method, which is wrong.  Building the javadoc 
with and without the patch, though, I see no difference in the warnings and 
errors.

Incidentally, the {{@SuppressWarnings(value = "unchecked")}} shouldn't be 
needed on {{ActiveStandbyElectorBasedElectorService.notifyFatalError()}} now 
that YARN-4457 is in.


> Cleanup leader election configs and pluggability
> 
>
> Key: YARN-5709
> URL: https://issues.apache.org/jira/browse/YARN-5709
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Critical
> Fix For: 2.9.0, 3.0.0-alpha2
>
> Attachments: yarn-5709-branch-2.8.01.patch, 
> yarn-5709-branch-2.8.02.patch, yarn-5709-branch-2.8.patch, 
> yarn-5709-wip.2.patch, yarn-5709.1.patch, yarn-5709.2.patch, 
> yarn-5709.3.patch, yarn-5709.4.patch
>
>
> While reviewing YARN-5677 and YARN-5694, I noticed we could make the 
> curator-based election code cleaner. It is nicer to get this fixed in 2.8 
> before we ship it, but this can be done at a later time as well. 
> # By EmbeddedElector, we meant it was running as part of the RM daemon. Since 
> the Curator-based elector is also running embedded, I feel the code should be 
> checking for {{!curatorBased}} instead of {{isEmbeddedElector}}
> # {{LeaderElectorService}} should probably be named 
> {{CuratorBasedEmbeddedElectorService}} or some such.
> # The code that initializes the elector should be at the same place 
> irrespective of whether it is curator-based or not. 
> # We seem to be caching the CuratorFramework instance in RM. It makes more 
> sense for it to be in RMContext. If others are okay with it, we might even be 
> better off having a {{RMContext#getCurator()}} method to lazily create the 
> curator framework and then cache it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4882) Change the log level to DEBUG for recovering completed applications

2016-12-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781489#comment-15781489
 ] 

Hadoop QA commented on YARN-4882:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
13s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
20s{color} | {color:green} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 0 new + 252 unchanged - 2 fixed = 252 total (was 254) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 39m 35s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
16s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 61m 58s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.TestRMRestart |
|   | hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | YARN-4882 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12844824/YARN-4882.004.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 2047ba30c913 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 
15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 1bbd023 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/14480/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/14480/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Buil

[jira] [Commented] (YARN-5709) Cleanup leader election configs and pluggability

2016-12-27 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781487#comment-15781487
 ] 

Jian He commented on YARN-5709:
---

Could you re-submit the patch with your change and retry?

> Cleanup leader election configs and pluggability
> 
>
> Key: YARN-5709
> URL: https://issues.apache.org/jira/browse/YARN-5709
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Critical
> Fix For: 2.9.0, 3.0.0-alpha2
>
> Attachments: yarn-5709-branch-2.8.01.patch, 
> yarn-5709-branch-2.8.02.patch, yarn-5709-branch-2.8.patch, 
> yarn-5709-wip.2.patch, yarn-5709.1.patch, yarn-5709.2.patch, 
> yarn-5709.3.patch, yarn-5709.4.patch
>
>
> While reviewing YARN-5677 and YARN-5694, I noticed we could make the 
> curator-based election code cleaner. It is nicer to get this fixed in 2.8 
> before we ship it, but this can be done at a later time as well. 
> # By EmbeddedElector, we meant it was running as part of the RM daemon. Since 
> the Curator-based elector is also running embedded, I feel the code should be 
> checking for {{!curatorBased}} instead of {{isEmbeddedElector}}
> # {{LeaderElectorService}} should probably be named 
> {{CuratorBasedEmbeddedElectorService}} or some such.
> # The code that initializes the elector should be at the same place 
> irrespective of whether it is curator-based or not. 
> # We seem to be caching the CuratorFramework instance in RM. It makes more 
> sense for it to be in RMContext. If others are okay with it, we might even be 
> better off having a {{RMContext#getCurator()}} method to lazily create the 
> curator framework and then cache it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5658) YARN should have a hook to delete a path from HDFS when an application ends

2016-12-27 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781518#comment-15781518
 ] 

Jian He commented on YARN-5658:
---

[~templedf], not just HDFS - deleting a path from ZK is also a required 
use-case for the yarn-service-registry, so the implementation should be 
somewhat generic.
I think an option to clean up a path is useful.  One approach in my mind is to 
leverage the getApplicationsToCleanup signal sent in the node heartbeat when 
the application finally completes, after which the NM where the AM container 
ran could do the post-cleanup.  The difference from YARN-2261 is that instead 
of running in a separate container, the cleanup could run inside the 
NodeManager, and this approach does not require significant code changes in 
the application. YARN-2261 could still be used for more advanced use-cases 
that the AM requires.  The problem with this approach is that if the NM 
crashes, the files may not get cleaned up, though YARN-2261 has the same 
problem. For simplicity, maybe we can allow this to happen, warn the user in 
the UI that the cleanup did not complete successfully, and ask the user to do 
it manually.  Thoughts?
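For illustration, a minimal sketch of a generic, URI-based cleanup hook (the 
{{CleanupOnCompletion}} class is hypothetical, and how the paths reach the NM 
is left open; only the standard {{FileSystem}} API is assumed):
{code}
import java.io.IOException;
import java.net.URI;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper: deletes the paths an application registered for cleanup.
// Going through FileSystem.get(URI, Configuration) keeps it scheme-agnostic,
// so hdfs://, viewfs:// or any other registered FileSystem works the same way.
public class CleanupOnCompletion {
  private final Configuration conf;

  public CleanupOnCompletion(Configuration conf) {
    this.conf = conf;
  }

  public void cleanup(List<URI> pathsToDelete) {
    for (URI uri : pathsToDelete) {
      try {
        FileSystem fs = FileSystem.get(uri, conf);
        if (!fs.delete(new Path(uri), true)) {   // recursive delete
          // Surface the failure so the UI can ask the user to clean up manually.
          System.err.println("Cleanup did not delete " + uri);
        }
      } catch (IOException e) {
        System.err.println("Cleanup failed for " + uri + ": " + e);
      }
    }
  }
}
{code}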

> YARN should have a hook to delete a path from HDFS when an application ends
> ---
>
> Key: YARN-5658
> URL: https://issues.apache.org/jira/browse/YARN-5658
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>
> There are many cases when a client uploads data to HDFS and then needs to 
> subsequently clean it up, such as with the distributed cache.  It would be 
> helpful if YARN would do that cleanup automatically on job completion.
> The hook could be generic to an URI supported by {{FileSystem}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4882) Change the log level to DEBUG for recovering completed applications

2016-12-27 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781529#comment-15781529
 ] 

Daniel Templeton commented on YARN-4882:


Test failures are unrelated, and the lack of tests is because it's only 
changing log messages.

> Change the log level to DEBUG for recovering completed applications
> ---
>
> Key: YARN-4882
> URL: https://issues.apache.org/jira/browse/YARN-4882
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: Daniel Templeton
>  Labels: oct16-easy
> Attachments: YARN-4882.001.patch, YARN-4882.002.patch, 
> YARN-4882.003.patch, YARN-4882.004.patch
>
>
> I think recovering completed applications does not need to be logged at 
> INFO; it can be logged at DEBUG instead.  The problem seen on a large cluster 
> is that if any issue happens during RM start-up and the RM keeps switching, 
> then the RM logs are filled almost entirely with recovering applications. 
> Six lines are logged per application, as shown in the logs below; now 
> consider that the RM default for max-completed applications is 10K, so each 
> switch adds 10K*6=60K lines, which I feel is not useful.
> {noformat}
> 2016-03-01 10:20:59,077 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager: Default priority 
> level is set to application:application_1456298208485_21507
> 2016-03-01 10:20:59,094 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Recovering 
> app: application_1456298208485_21507 with 1 attempts and final state = 
> FINISHED
> 2016-03-01 10:20:59,100 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> Recovering attempt: appattempt_1456298208485_21507_01 with final state: 
> FINISHED
> 2016-03-01 10:20:59,107 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> appattempt_1456298208485_21507_01 State change from NEW to FINISHED
> 2016-03-01 10:20:59,111 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: 
> application_1456298208485_21507 State change from NEW to FINISHED
> 2016-03-01 10:20:59,112 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=rohith   
> OPERATION=Application Finished - Succeeded  TARGET=RMAppManager 
> RESULT=SUCCESS  APPID=application_1456298208485_21507
> {noformat}
> The main problem is that important information from the logs before the RM 
> became unstable gets lost. Even if log rollover keeps 50 or 100 files, in a 
> short period all of these logs are rolled out and what remains contains only 
> RM switching information, mostly recovering applications. 
> I suggest that at least completed-application recovery be logged at DEBUG.
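For illustration, a sketch of the kind of change this implies (the class and 
message below are illustrative only, not the exact lines touched by the patch):
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class RecoveryLogLevelDemo {
  private static final Log LOG = LogFactory.getLog(RecoveryLogLevelDemo.class);

  // Recovery messages are guarded and emitted at DEBUG, so recovering the
  // default 10K completed applications no longer floods the INFO log.
  static void logRecovery(String appId, int attempts, String finalState) {
    if (LOG.isDebugEnabled()) {
      LOG.debug("Recovering app: " + appId + " with " + attempts
          + " attempts and final state = " + finalState);
    }
  }

  public static void main(String[] args) {
    logRecovery("application_1456298208485_21507", 1, "FINISHED");
  }
}
{code}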



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5831) Propagate allowPreemptionFrom flag all the way down to the app

2016-12-27 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781573#comment-15781573
 ] 

Yufei Gu commented on YARN-5831:


There is an assumption in the original code: if the parent queue is 
non-preemptable, the children must be non-preemptable as well.  My patch 
follows this assumption.
{code}
for (FSQueue q = getQueue();
!q.getQueueName().equals("root");
q = q.getParent()) {
  if (!q.isPreemptable()) {
return false;
  }
}
{code}
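For illustration, a minimal sketch of the propagation idea under the same 
assumption (the {{cachedPreemptable}} field and {{propagatePreemptable}} helper 
are hypothetical names, not the actual patch):
{code}
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: compute each queue's effective preemptability once,
// top-down, so the per-application check no longer walks up to the root.
class Queue {
  final String name;
  final boolean preemptableInConfig;   // allowPreemptionFrom from the alloc file
  final List<Queue> children = new ArrayList<Queue>();
  boolean cachedPreemptable;           // effective value after propagation

  Queue(String name, boolean preemptableInConfig) {
    this.name = name;
    this.preemptableInConfig = preemptableInConfig;
  }

  // Called from the root whenever the allocation configuration is reloaded.
  void propagatePreemptable(boolean parentPreemptable) {
    // A queue is preemptable only if it and every ancestor allow preemption.
    cachedPreemptable = parentPreemptable && preemptableInConfig;
    for (Queue child : children) {
      child.propagatePreemptable(cachedPreemptable);
    }
  }
}

class App {
  final Queue queue;
  App(Queue queue) { this.queue = queue; }

  // O(1) per check instead of recursing to the root every time.
  boolean isPreemptable() {
    return queue.cachedPreemptable;
  }
}
{code}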

> Propagate allowPreemptionFrom flag all the way down to the app
> --
>
> Key: YARN-5831
> URL: https://issues.apache.org/jira/browse/YARN-5831
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Karthik Kambatla
>Assignee: Yufei Gu
>
> FairScheduler allows disallowing preemption from a queue. When checking if 
> preemption for an application is allowed, the new preemption code recurses 
> all the way to the root queue to check this flag. 
> Propagating this information all the way to the app will be more efficient. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5831) Propagate allowPreemptionFrom flag all the way down to the app

2016-12-27 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781573#comment-15781573
 ] 

Yufei Gu edited comment on YARN-5831 at 12/27/16 11:59 PM:
---

There is an assumption in the original code: if the parent queue is 
non-preemptable, the children must be non-preemptable as well.  My patch 
follows this assumption.
{code}
for (FSQueue q = getQueue();
!q.getQueueName().equals("root");
q = q.getParent()) {
  if (!q.isPreemptable()) {
return false;
  }
}
{code}


was (Author: yufeigu):
There is an assumption from the original code: if the parent queue is 
non-preemptable, the children must be non-preemptable.  My patch follow this 
assumption.
{code}
for (FSQueue q = getQueue();
!q.getQueueName().equals("root");
q = q.getParent()) {
  if (!q.isPreemptable()) {
return false;
  }
}
{code}

> Propagate allowPreemptionFrom flag all the way down to the app
> --
>
> Key: YARN-5831
> URL: https://issues.apache.org/jira/browse/YARN-5831
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Karthik Kambatla
>Assignee: Yufei Gu
>
> FairScheduler allows disallowing preemption from a queue. When checking if 
> preemption for an application is allowed, the new preemption code recurses 
> all the way to the root queue to check this flag. 
> Propagating this information all the way to the app will be more efficient. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4882) Change the log level to DEBUG for recovering completed applications

2016-12-27 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781579#comment-15781579
 ] 

Robert Kanter commented on YARN-4882:
-

+1

> Change the log level to DEBUG for recovering completed applications
> ---
>
> Key: YARN-4882
> URL: https://issues.apache.org/jira/browse/YARN-4882
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: Daniel Templeton
>  Labels: oct16-easy
> Attachments: YARN-4882.001.patch, YARN-4882.002.patch, 
> YARN-4882.003.patch, YARN-4882.004.patch
>
>
> I think recovering completed applications does not need to be logged at 
> INFO; it can be logged at DEBUG instead.  The problem seen on a large cluster 
> is that if any issue happens during RM start-up and the RM keeps switching, 
> then the RM logs are filled almost entirely with recovering applications. 
> Six lines are logged per application, as shown in the logs below; now 
> consider that the RM default for max-completed applications is 10K, so each 
> switch adds 10K*6=60K lines, which I feel is not useful.
> {noformat}
> 2016-03-01 10:20:59,077 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager: Default priority 
> level is set to application:application_1456298208485_21507
> 2016-03-01 10:20:59,094 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Recovering 
> app: application_1456298208485_21507 with 1 attempts and final state = 
> FINISHED
> 2016-03-01 10:20:59,100 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> Recovering attempt: appattempt_1456298208485_21507_01 with final state: 
> FINISHED
> 2016-03-01 10:20:59,107 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> appattempt_1456298208485_21507_01 State change from NEW to FINISHED
> 2016-03-01 10:20:59,111 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: 
> application_1456298208485_21507 State change from NEW to FINISHED
> 2016-03-01 10:20:59,112 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=rohith   
> OPERATION=Application Finished - Succeeded  TARGET=RMAppManager 
> RESULT=SUCCESS  APPID=application_1456298208485_21507
> {noformat}
> The main problem is that important information from the logs before the RM 
> became unstable gets lost. Even if log rollover keeps 50 or 100 files, in a 
> short period all of these logs are rolled out and what remains contains only 
> RM switching information, mostly recovering applications. 
> I suggest that at least completed-application recovery be logged at DEBUG.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Issue Comment Deleted] (YARN-4882) Change the log level to DEBUG for recovering completed applications

2016-12-27 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated YARN-4882:

Comment: was deleted

(was: +1)

> Change the log level to DEBUG for recovering completed applications
> ---
>
> Key: YARN-4882
> URL: https://issues.apache.org/jira/browse/YARN-4882
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: Daniel Templeton
>  Labels: oct16-easy
> Attachments: YARN-4882.001.patch, YARN-4882.002.patch, 
> YARN-4882.003.patch, YARN-4882.004.patch
>
>
> I think recovering completed applications does not need to be logged at 
> INFO; it can be logged at DEBUG instead.  The problem seen on a large cluster 
> is that if any issue happens during RM start-up and the RM keeps switching, 
> then the RM logs are filled almost entirely with recovering applications. 
> Six lines are logged per application, as shown in the logs below; now 
> consider that the RM default for max-completed applications is 10K, so each 
> switch adds 10K*6=60K lines, which I feel is not useful.
> {noformat}
> 2016-03-01 10:20:59,077 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager: Default priority 
> level is set to application:application_1456298208485_21507
> 2016-03-01 10:20:59,094 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Recovering 
> app: application_1456298208485_21507 with 1 attempts and final state = 
> FINISHED
> 2016-03-01 10:20:59,100 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> Recovering attempt: appattempt_1456298208485_21507_01 with final state: 
> FINISHED
> 2016-03-01 10:20:59,107 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> appattempt_1456298208485_21507_01 State change from NEW to FINISHED
> 2016-03-01 10:20:59,111 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: 
> application_1456298208485_21507 State change from NEW to FINISHED
> 2016-03-01 10:20:59,112 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=rohith   
> OPERATION=Application Finished - Succeeded  TARGET=RMAppManager 
> RESULT=SUCCESS  APPID=application_1456298208485_21507
> {noformat}
> The main problem is that important information from the logs before the RM 
> became unstable gets lost. Even if log rollover keeps 50 or 100 files, in a 
> short period all of these logs are rolled out and what remains contains only 
> RM switching information, mostly recovering applications. 
> I suggest that at least completed-application recovery be logged at DEBUG.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6029) CapacityScheduler deadlock when ParentQueue#getQueueUserAclInfo is called by Thread_A at the moment that Thread_B calls LeafQueue#assignContainers to release a reserved

2016-12-27 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781593#comment-15781593
 ] 

Wangda Tan commented on YARN-6029:
--

Thanks [~Tao Yang] for reporting this issue.

[~Naganarasimha], branch-2/trunk solves the problem after YARN-5706. 

However, fixing this issue by backporting YARN-5706 would take a huge effort; I 
don't think that is even an option. 

We can make some changes to LeafQueue:

1. Remove synchronized lock of assignContainers
2. Make changes:

{code}
# BEGINNING of LeafQueue#assignContainers
synchronized {
   // do stuffs
}

call-complete-containers (which locks parent) 

synchronized {
   // do rest stuffs
}
# END of LeafQueue#assignContainers
{code}

Removing synchronized entirely would cause data inconsistency when state is 
read, and there are some other methods with the same pattern (grabbing the 
LeafQueue lock while holding the ParentQueue lock, without holding the 
CapacityScheduler lock) that need the same change. 
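For illustration, a self-contained sketch of that pattern (hypothetical 
{{Leaf}}/{{Parent}} classes, not the actual LeafQueue code); the leaf lock is 
released before calling up into the parent, so the only lock order ever taken 
is parent-then-leaf:
{code}
// Illustrative only: split the leaf's critical section so the upcall into the
// parent happens without holding the leaf lock, avoiding the leaf->parent
// order that deadlocks against getQueueUserAclInfo's parent->leaf order.
class Parent {
  synchronized void completedContainer(String containerId) {
    // update parent-level accounting
  }
}

class Leaf {
  private final Parent parent;
  Leaf(Parent parent) { this.parent = parent; }

  void assignContainers() {
    String releasedContainer;
    synchronized (this) {
      // first part: pick an allocation and decide whether a reservation
      // has to be released (leaf-only state changes)
      releasedContainer = "container_42";   // placeholder result
    }

    // upcall to the parent WITHOUT holding the leaf lock
    if (releasedContainer != null) {
      parent.completedContainer(releasedContainer);
    }

    synchronized (this) {
      // second part: finish the bookkeeping for this allocation
    }
  }
}
{code}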

> CapacityScheduler deadlock when ParentQueue#getQueueUserAclInfo is called by 
> Thread_A at the moment that Thread_B calls LeafQueue#assignContainers to 
> release a reserved container
> --
>
> Key: YARN-6029
> URL: https://issues.apache.org/jira/browse/YARN-6029
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.8.0
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Blocker
> Attachments: YARN-6029.001.patch, deadlock.jstack
>
>
> When ParentQueue#getQueueUserAclInfo is called (e.g. a client calls 
> YarnClient#getQueueAclsInfo) just at the moment that 
> LeafQueue#assignContainers is called and before notifying parent queue to 
> release resource (should release a reserved container), then ResourceManager 
> can deadlock. I found this problem on our testing environment for hadoop2.8.
> Reproduce the deadlock in chronological order
> * 1. Thread A (ResourceManager Event Processor) calls synchronized 
> LeafQueue#assignContainers (got LeafQueue instance lock of queue root.a)
> * 2. Thread B (IPC Server handler) calls synchronized 
> ParentQueue#getQueueUserAclInfo (got ParentQueue instance lock of queue 
> root), iterates over children queue acls and is blocked when calling 
> synchronized LeafQueue#getQueueUserAclInfo (the LeafQueue instance lock of 
> queue root.a is held by Thread A)
> * 3. Thread A wants to inform the parent queue that a container is being 
> completed and is blocked when invoking synchronized 
> ParentQueue#internalReleaseResource method (the ParentQueue instance lock of 
> queue root is held by Thread B)
> I think the synchronized modifier of LeafQueue#getQueueUserAclInfo can be 
> removed to solve this problem, since this method appears to not affect fields 
> of LeafQueue instance.
> Attach patch with UT for review.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5831) Propagate allowPreemptionFrom flag all the way down to the app

2016-12-27 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu updated YARN-5831:
---
Attachment: YARN-5831.001.patch

The logic is covered by 
{{TestFairSchedulerPreemption.testNoPreemptionFromDisallowedQueue}}. No new 
unit test is needed.

> Propagate allowPreemptionFrom flag all the way down to the app
> --
>
> Key: YARN-5831
> URL: https://issues.apache.org/jira/browse/YARN-5831
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Karthik Kambatla
>Assignee: Yufei Gu
> Attachments: YARN-5831.001.patch
>
>
> FairScheduler allows disallowing preemption from a queue. When checking if 
> preemption for an application is allowed, the new preemption code recurses 
> all the way to the root queue to check this flag. 
> Propagating this information all the way to the app will be more efficient. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5831) Propagate allowPreemptionFrom flag all the way down to the app

2016-12-27 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu updated YARN-5831:
---
Attachment: YARN-5831.002.patch

Found another unnecessary recursion in {{updatePreemptionVariables}}. Uploaded 
patch 002 to fix it as well. 

> Propagate allowPreemptionFrom flag all the way down to the app
> --
>
> Key: YARN-5831
> URL: https://issues.apache.org/jira/browse/YARN-5831
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Karthik Kambatla
>Assignee: Yufei Gu
> Attachments: YARN-5831.001.patch, YARN-5831.002.patch
>
>
> FairScheduler allows disallowing preemption from a queue. When checking if 
> preemption for an application is allowed, the new preemption code recurses 
> all the way to the root queue to check this flag. 
> Propagating this information all the way to the app will be more efficient. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6029) CapacityScheduler deadlock when ParentQueue#getQueueUserAclInfo is called by Thread_A at the moment that Thread_B calls LeafQueue#assignContainers to release a reserved

2016-12-27 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781644#comment-15781644
 ] 

Naganarasimha G R commented on YARN-6029:
-

Thanks [~djp] & [~wangda] for correcting me; I missed realizing earlier that a 
write lock needs to wait until all read locks are released.
[~wangda], I agree your solution solves the problem, but the current flow is 
{{CapacityScheduler.allocateContainersToNode \-> LeafQueue.assignContainers 
(holds the lock on the leaf) \-> LeafQueue.handleExcessReservedContainer \-> 
LeafQueue.completedContainer \-> ParentQueue.completedContainer (tries to get 
the lock here)}}
I agree that we need a fix in this flow, but a simpler temporary correction in 
*ParentQueue* (assuming that 2.9/trunk avoids the issue) could be:
{code}
@Override
public List<QueueUserACLInfo> getQueueUserAclInfo(
    UserGroupInformation user) {
  List<QueueUserACLInfo> userAcls = new ArrayList<QueueUserACLInfo>();
  synchronized (this) {
    // Add the parent queue's own acls while holding only this queue's lock
    userAcls.add(getUserAclInfo(user));
  }
  // Add children queue acls without holding the parent queue's lock
  for (CSQueue child : childQueues) {
    userAcls.addAll(child.getQueueUserAclInfo(user));
  }
  return userAcls;
}
{code}

Thoughts ?

> CapacityScheduler deadlock when ParentQueue#getQueueUserAclInfo is called by 
> Thread_A at the moment that Thread_B calls LeafQueue#assignContainers to 
> release a reserved container
> --
>
> Key: YARN-6029
> URL: https://issues.apache.org/jira/browse/YARN-6029
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.8.0
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Blocker
> Attachments: YARN-6029.001.patch, deadlock.jstack
>
>
> When ParentQueue#getQueueUserAclInfo is called (e.g. a client calls 
> YarnClient#getQueueAclsInfo) just at the moment that 
> LeafQueue#assignContainers is called and before notifying parent queue to 
> release resource (should release a reserved container), then ResourceManager 
> can deadlock. I found this problem on our testing environment for hadoop2.8.
> Reproduce the deadlock in chronological order
> * 1. Thread A (ResourceManager Event Processor) calls synchronized 
> LeafQueue#assignContainers (got LeafQueue instance lock of queue root.a)
> * 2. Thread B (IPC Server handler) calls synchronized 
> ParentQueue#getQueueUserAclInfo (got ParentQueue instance lock of queue 
> root), iterates over children queue acls and is blocked when calling 
> synchronized LeafQueue#getQueueUserAclInfo (the LeafQueue instance lock of 
> queue root.a is held by Thread A)
> * 3. Thread A wants to inform the parent queue that a container is being 
> completed and is blocked when invoking synchronized 
> ParentQueue#internalReleaseResource method (the ParentQueue instance lock of 
> queue root is held by Thread B)
> I think the synchronized modifier of LeafQueue#getQueueUserAclInfo can be 
> removed to solve this problem, since this method appears to not affect fields 
> of LeafQueue instance.
> Attach patch with UT for review.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5831) Propagate allowPreemptionFrom flag all the way down to the app

2016-12-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781718#comment-15781718
 ] 

Hadoop QA commented on YARN-5831:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
12s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 38m 
37s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
15s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 59m 16s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | YARN-5831 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12844840/YARN-5831.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 2c198f25f906 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 
21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 9262797 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/14481/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/14481/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Propagate allowPreemptionFrom flag all the way down to the app
> --
>
> Key: YARN-5831
> URL: https://issues.apache.org/jira/browse/YARN-5831
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Karthik Kamba

[jira] [Commented] (YARN-5831) Propagate allowPreemptionFrom flag all the way down to the app

2016-12-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781770#comment-15781770
 ] 

Hadoop QA commented on YARN-5831:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
12s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 42m  6s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 67m 10s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestQueueManagerRealScheduler 
|
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | YARN-5831 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12844841/YARN-5831.002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 5e15f3a4932d 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 
17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 9262797 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/14482/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/14482/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/14482/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.

[jira] [Updated] (YARN-5830) Avoid preempting AM containers

2016-12-27 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-5830:
---
Description: While considering containers for preemption, avoid AM 
containers unless absolutely necessary.   (was: While considering containers 
for preemption, avoid AM containers unless they are the only container for the 
app. )

> Avoid preempting AM containers
> --
>
> Key: YARN-5830
> URL: https://issues.apache.org/jira/browse/YARN-5830
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Karthik Kambatla
>Assignee: Yufei Gu
> Attachments: YARN-5830.001.patch
>
>
> While considering containers for preemption, avoid AM containers unless 
> absolutely necessary. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4465) SchedulerUtils#validateRequest for Label check should happen only when nodelabel enabled

2016-12-27 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781914#comment-15781914
 ] 

Bibin A Chundatt commented on YARN-4465:


Thank you [~Ying Zhang].
Apologies for missing the recovery case in the enable-to-disable case.
As [~sunilg]/[~leftnoteasy] mentioned, please do file a JIRA for the same.

> SchedulerUtils#validateRequest for Label check should happen only when 
> nodelabel enabled
> 
>
> Key: YARN-4465
> URL: https://issues.apache.org/jira/browse/YARN-4465
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: 0001-YARN-4465.patch, 0002-YARN-4465.patch, 
> 0003-YARN-4465.patch, 0004-YARN-4465.patch, 0006-YARN-4465.patch, 
> 0007-YARN-4465.patch
>
>
> Disable label from rm side yarn.nodelabel.enable=false
> Capacity scheduler label configuration for queue is available as below
> default label for queue = b1 as 3 and accessible labels as 1,3
> Submit application to queue A .
> {noformat}
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException):
>  Invalid resource request, queue=b1 doesn't have permission to access all 
> labels in resource request. labelExpression of resource request=3. Queue 
> labels=1,3
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:304)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:234)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:216)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:401)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:340)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:283)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:602)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:247)
> {noformat}
> # Ignore default label expression when label is disabled *or*
> # NormalizeResourceRequest we can set label expression to  
> when node label is not enabled *or*
> # Improve message



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-4465) SchedulerUtils#validateRequest for Label check should happen only when nodelabel enabled

2016-12-27 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781914#comment-15781914
 ] 

Bibin A Chundatt edited comment on YARN-4465 at 12/28/16 3:20 AM:
--

Thank you [~Ying Zhang].
Apologies for missing the recovery case in the enable-to-disable scenario.
As [~sunilg]/[~leftnoteasy] mentioned, please do file a JIRA for the same.


was (Author: bibinchundatt):
Thank you [~Ying Zhang] 
Apologies for missing  out the recover case with enable to disable case.. 
As [~sunilg]/[~leftnoteasy] mentioned please do file a jira for the same 

> SchedulerUtils#validateRequest for Label check should happen only when 
> nodelabel enabled
> 
>
> Key: YARN-4465
> URL: https://issues.apache.org/jira/browse/YARN-4465
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: 0001-YARN-4465.patch, 0002-YARN-4465.patch, 
> 0003-YARN-4465.patch, 0004-YARN-4465.patch, 0006-YARN-4465.patch, 
> 0007-YARN-4465.patch
>
>
> Disable label from rm side yarn.nodelabel.enable=false
> Capacity scheduler label configuration for queue is available as below
> default label for queue = b1 as 3 and accessible labels as 1,3
> Submit application to queue A .
> {noformat}
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException):
>  Invalid resource request, queue=b1 doesn't have permission to access all 
> labels in resource request. labelExpression of resource request=3. Queue 
> labels=1,3
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:304)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:234)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:216)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:401)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:340)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:283)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:602)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:247)
> {noformat}
> # Ignore default label expression when label is disabled *or*
> # NormalizeResourceRequest we can set label expression to  
> when node label is not enabled *or*
> # Improve message



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6029) CapacityScheduler deadlock when ParentQueue#getQueueUserAclInfo is called by Thread_A at the moment that Thread_B calls LeafQueue#assignContainers to release a reserved

2016-12-27 Thread Tao Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781969#comment-15781969
 ] 

Tao Yang commented on YARN-6029:


Thanks [~Naganarasimha] [~djp] [~leftnoteasy] for your suggestions.
[~Naganarasimha] I think there may be a problem if childQueues is iterated while ParentQueue#setChildQueues is called at the same time.
[~leftnoteasy] I agree that your solution solves the problem, but I still think the synchronized modifier on LeafQueue#getQueueUserAclInfo is not required. In my opinion, this method does not affect the data structures of the LeafQueue instance (it checks the permissions of the given user, creates a new QueueUserACLInfo instance and returns), and it is only called by ParentQueue#getQueueUserAclInfo. As a reference, FSLeafQueue#getQueueUserAclInfo in the FairScheduler is not synchronized.
Maybe I have missed a potential problem; please correct me if I am wrong.

> CapacityScheduler deadlock when ParentQueue#getQueueUserAclInfo is called by 
> Thread_A at the moment that Thread_B calls LeafQueue#assignContainers to 
> release a reserved container
> --
>
> Key: YARN-6029
> URL: https://issues.apache.org/jira/browse/YARN-6029
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.8.0
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Blocker
> Attachments: YARN-6029.001.patch, deadlock.jstack
>
>
> When ParentQueue#getQueueUserAclInfo is called (e.g. a client calls 
> YarnClient#getQueueAclsInfo) just at the moment that 
> LeafQueue#assignContainers is called and before notifying parent queue to 
> release resource (should release a reserved container), then ResourceManager 
> can deadlock. I found this problem on our testing environment for hadoop2.8.
> Reproduce the deadlock in chronological order
> * 1. Thread A (ResourceManager Event Processor) calls synchronized 
> LeafQueue#assignContainers (got LeafQueue instance lock of queue root.a)
> * 2. Thread B (IPC Server handler) calls synchronized 
> ParentQueue#getQueueUserAclInfo (got ParentQueue instance lock of queue 
> root), iterates over children queue acls and is blocked when calling 
> synchronized LeafQueue#getQueueUserAclInfo (the LeafQueue instance lock of 
> queue root.a is held by Thread A)
> * 3. Thread A wants to inform the parent queue that a container is being 
> completed and is blocked when invoking synchronized 
> ParentQueue#internalReleaseResource method (the ParentQueue instance lock of 
> queue root is held by Thread B)
> I think the synchronized modifier of LeafQueue#getQueueUserAclInfo can be 
> removed to solve this problem, since this method appears to not affect fields 
> of LeafQueue instance.
> Attach patch with UT for review.
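
For illustration, here is a minimal, self-contained Java sketch of the lock-ordering cycle in the steps above. Parent/Leaf are stand-ins for ParentQueue/LeafQueue, not the actual CapacityScheduler classes; running it is expected to hang in exactly the state described (thread A holds the Leaf lock and wants the Parent lock, thread B holds the Parent lock and wants the Leaf lock).
{code}
public class QueueDeadlockSketch {

  static class Parent {
    private final Leaf child = new Leaf(this);

    // Thread B path: takes the Parent lock first, then the Leaf lock.
    synchronized void getQueueUserAclInfo() {
      child.getQueueUserAclInfo();
    }

    // Second step of Thread A's path: needs the Parent lock.
    synchronized void internalReleaseResource() { }

    Leaf child() { return child; }
  }

  static class Leaf {
    private final Parent parent;

    Leaf(Parent parent) { this.parent = parent; }

    // Thread A path: takes the Leaf lock first, then the Parent lock.
    synchronized void assignContainers() {
      pause(100);                         // widen the race window
      parent.internalReleaseResource();   // blocks once B holds the Parent lock
    }

    synchronized void getQueueUserAclInfo() { }  // blocks once A holds the Leaf lock
  }

  static void pause(long ms) {
    try { Thread.sleep(ms); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
  }

  public static void main(String[] args) {
    Parent root = new Parent();
    new Thread(() -> root.child().assignContainers(), "EventProcessor").start();
    new Thread(() -> { pause(50); root.getQueueUserAclInfo(); }, "IPCHandler").start();
  }
}
{code}
Both directions discussed in this thread (dropping the Leaf lock on the read path, or restructuring the scheduling path) amount to breaking one edge of this cycle.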



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6027) Support fromId for flows/flowrun apps

2016-12-27 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781976#comment-15781976
 ] 

Varun Saxena commented on YARN-6027:


[~sunilg]
bq. However, if the same flow is run multiple times in the same day, they are also listed separately.
They should not be returned separately if the flow has been run by the same user. There will be multiple runs, but there will be only one activity entry per day. The row key for the flow activity table is cluster!dayTs!user!flow. Can you check what ID was returned from the backend? Are the server and client time zones different?
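
As a purely illustrative sketch of that row-key shape (this is not the actual FlowActivityRowKey code; the day truncation and separator handling here are assumptions):
{code}
public class FlowActivityRowKeySketch {
  private static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;

  // cluster!dayTs!user!flow: two runs of the same flow by the same user on the
  // same (UTC) day truncate to the same dayTs, so they land on one activity row.
  static String rowKey(String cluster, long eventTimeMs, String user, String flow) {
    long dayTs = eventTimeMs - (eventTimeMs % MILLIS_PER_DAY);  // top of the day
    return String.join("!", cluster, Long.toString(dayTs), user, flow);
  }

  public static void main(String[] args) {
    long run1 = 1482883200000L + 3_600_000L;   // two runs at different times
    long run2 = 1482883200000L + 7_200_000L;   // on the same UTC day
    System.out.println(rowKey("yarn-cluster", run1, "alice", "wordcount-flow"));
    System.out.println(rowKey("yarn-cluster", run2, "alice", "wordcount-flow"));  // same key
  }
}
{code}
The day boundary comes from dayTs, which is why a server/client time-zone mismatch can make runs appear under an unexpected day.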

bq. Assume a user is running the same applications with different flow names (yes, it may not be a direct use case); then we are going to have multiple entries of flows.
Yes. We identify a flow by the YARN tags given by the user, so if different tags are given for the same flow, it will be treated as a different flow.

bq. User wants to see flow activities run on a specific day (a same-day range, e.g. from 10:00am to 11:30am) in the past.
Not supported, and it cannot be supported with the current schema. The minimum time unit for which records can be retrieved is a day. Let me know if there is a use case for this.

bq. User wants to search for flow activity with some given criteria (name/duration etc.).
This is what I suggested in an earlier comment: we can probably support searching based on flow name as well. It is currently not supported but would be feasible, even though not straightforward.

> Support fromId for flows/flowrun apps
> -
>
> Key: YARN-6027
> URL: https://issues.apache.org/jira/browse/YARN-6027
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>  Labels: yarn-5355-merge-blocker
>
> In YARN-5585 , fromId is supported for retrieving entities. We need similar 
> filter for flows/flowRun apps and flow run and flow as well. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5830) Avoid preempting AM containers

2016-12-27 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781991#comment-15781991
 ] 

Karthik Kambatla commented on YARN-5830:


[~yufeigu], thanks for working on this. 

The patch seems to introduce potentialContainers to track a potential set of containers that includes an AM container as we consider multiple nodes. But I am a little confused by how we handle encountering an AM container. Can you explain your high-level approach?

I haven't looked closely, but a few other minor comments:
# In the same method, I seem to have forgotten a TODO (KK) item. Mind creating a follow-up JIRA and annotating the TODO with that JIRA instead of my initials? Thanks.
# Instead of introducing a new list (containersAMLast), can we just use containersToCheck? Maybe remove the AM containers first and then add them all at the end (see the sketch after this list)?
# I think I saw a couple of typos too.
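
A minimal sketch of the reordering suggested in item 2, under the assumption that it boils down to a stable partition of the existing list; Container here is a hypothetical stand-in, not the real RMContainer API:
{code}
import java.util.ArrayList;
import java.util.List;

public class AmLastOrderingSketch {
  static class Container {
    final String id;
    final boolean isAm;
    Container(String id, boolean isAm) { this.id = id; this.isAm = isAm; }
  }

  static List<Container> amLast(List<Container> containersToCheck) {
    List<Container> ordered = new ArrayList<>();
    List<Container> ams = new ArrayList<>();
    for (Container c : containersToCheck) {
      (c.isAm ? ams : ordered).add(c);   // stable partition: keep relative order
    }
    ordered.addAll(ams);                 // AM containers move to the end
    return ordered;
  }

  public static void main(String[] args) {
    List<Container> toCheck = List.of(
        new Container("c1", false), new Container("c2", true), new Container("c3", false));
    for (Container c : amLast(toCheck)) {
      System.out.println(c.id + (c.isAm ? " (AM)" : ""));
    }
  }
}
{code}
Keeping a single list avoids the extra bookkeeping of containersAMLast while preserving the original order among non-AM containers.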

> Avoid preempting AM containers
> --
>
> Key: YARN-5830
> URL: https://issues.apache.org/jira/browse/YARN-5830
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Karthik Kambatla
>Assignee: Yufei Gu
> Attachments: YARN-5830.001.patch
>
>
> While considering containers for preemption, avoid AM containers unless 
> absolutely necessary. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6029) CapacityScheduler deadlock when ParentQueue#getQueueUserAclInfo is called by Thread_A at the moment that Thread_B calls LeafQueue#assignContainers to release a reserved

2016-12-27 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15782007#comment-15782007
 ] 

Li Lu commented on YARN-6029:
-

I'm not a scheduler expert, but "not affecting any data structure" sounds like the wrong reason not to synchronize. [~wangda], will there be any potential data races according to the Java memory model [1]? If not, we can safely remove those synchronized keywords. Otherwise we have to stick with them, no matter how appealing removing them appears to be.

[1]:  http://www.cs.umd.edu/~pugh/java/memoryModel/
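
To make the memory-model concern concrete, a small illustrative example (not YARN code): a reader that holds no lock has no happens-before edge with a writer that updates related fields under a lock, so it may observe stale or mismatched values even though it never modifies anything itself.
{code}
import java.util.concurrent.TimeUnit;

public class UnsynchronizedReadSketch {
  private String aclUser = "nobody";
  private String aclGroup = "nobody";

  // Writer: updates two related fields under the object lock.
  public synchronized void setAcl(String user, String group) {
    aclUser = user;
    aclGroup = group;
  }

  // Reader: no lock, so it can observe a (new user, old group) pair or stale values.
  public String readAclUnsafely() {
    return aclUser + ":" + aclGroup;
  }

  public static void main(String[] args) throws InterruptedException {
    UnsynchronizedReadSketch q = new UnsynchronizedReadSketch();
    Thread writer = new Thread(() -> {
      for (int i = 0; i < 1_000_000; i++) {
        q.setAcl("user" + i, "group" + i);
      }
    });
    writer.start();
    for (int i = 0; i < 10; i++) {
      System.out.println(q.readAclUnsafely());  // may print mismatched user/group indices
      TimeUnit.MILLISECONDS.sleep(1);
    }
    writer.join();
  }
}
{code}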

> CapacityScheduler deadlock when ParentQueue#getQueueUserAclInfo is called by 
> Thread_A at the moment that Thread_B calls LeafQueue#assignContainers to 
> release a reserved container
> --
>
> Key: YARN-6029
> URL: https://issues.apache.org/jira/browse/YARN-6029
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.8.0
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Blocker
> Attachments: YARN-6029.001.patch, deadlock.jstack
>
>
> When ParentQueue#getQueueUserAclInfo is called (e.g. a client calls 
> YarnClient#getQueueAclsInfo) just at the moment that 
> LeafQueue#assignContainers is called and before notifying parent queue to 
> release resource (should release a reserved container), then ResourceManager 
> can deadlock. I found this problem on our testing environment for hadoop2.8.
> Reproduce the deadlock in chronological order
> * 1. Thread A (ResourceManager Event Processor) calls synchronized 
> LeafQueue#assignContainers (got LeafQueue instance lock of queue root.a)
> * 2. Thread B (IPC Server handler) calls synchronized 
> ParentQueue#getQueueUserAclInfo (got ParentQueue instance lock of queue 
> root), iterates over children queue acls and is blocked when calling 
> synchronized LeafQueue#getQueueUserAclInfo (the LeafQueue instance lock of 
> queue root.a is held by Thread A)
> * 3. Thread A wants to inform the parent queue that a container is being 
> completed and is blocked when invoking synchronized 
> ParentQueue#internalReleaseResource method (the ParentQueue instance lock of 
> queue root is held by Thread B)
> I think the synchronized modifier of LeafQueue#getQueueUserAclInfo can be 
> removed to solve this problem, since this method appears to not affect fields 
> of LeafQueue instance.
> Attach patch with UT for review.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-220) NM should limit number of applications who's logs are being aggregated

2016-12-27 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15782062#comment-15782062
 ] 

Wilfred Spiegelenburg commented on YARN-220:


Should this be marked as fixed now that we have YARN-4697 (a limit on the thread pool for uploads) and YARN-4766 (do not upload files older than the retention policy)? They do not solve the case of falling behind, but at least we now have limits on what we upload.
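
For context, the idea behind that upload limit, as a rough standalone sketch (class names and the limit value are illustrative, not the NodeManager implementation): bounding the number of applications whose logs are aggregated at once caps the concurrent load on the NN.
{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BoundedLogAggregationSketch {
  public static void main(String[] args) throws InterruptedException {
    int maxConcurrentUploads = 4;   // the kind of limit being discussed
    ExecutorService uploaders = Executors.newFixedThreadPool(maxConcurrentUploads);

    for (int app = 0; app < 20; app++) {
      final int appId = app;
      // Excess applications simply queue; only 4 uploads run against the NN at once.
      uploaders.submit(() -> uploadLogs("application_" + appId));
    }

    uploaders.shutdown();
    uploaders.awaitTermination(1, TimeUnit.MINUTES);
  }

  private static void uploadLogs(String appId) {
    System.out.println("aggregating logs for " + appId);
    // ... write the aggregated log file out here ...
  }
}
{code}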

> NM should limit number of applications who's logs are being aggregated
> --
>
> Key: YARN-220
> URL: https://issues.apache.org/jira/browse/YARN-220
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 0.23.4
>Reporter: Robert Joseph Evans
>
> The NodeManager should limit the number of applications that have their logs 
> being aggregated in parallel.  This will reduce the load on the NN.  We need 
> to ensure that the RM will continue to renew the token while this is 
> happening.  We also should look if the NM starts to fall behind if it can 
> delete some of the logs or not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6029) CapacityScheduler deadlock when ParentQueue#getQueueUserAclInfo is called by Thread_A at the moment that Thread_B calls LeafQueue#assignContainers to release a reserved

2016-12-27 Thread Tao Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15782067#comment-15782067
 ] 

Tao Yang commented on YARN-6029:


Thanks [~gtCarrera9] for correcting me. My wording was off, sorry about that. I did consider data races but could not find any; maybe I missed something.

> CapacityScheduler deadlock when ParentQueue#getQueueUserAclInfo is called by 
> Thread_A at the moment that Thread_B calls LeafQueue#assignContainers to 
> release a reserved container
> --
>
> Key: YARN-6029
> URL: https://issues.apache.org/jira/browse/YARN-6029
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.8.0
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Blocker
> Attachments: YARN-6029.001.patch, deadlock.jstack
>
>
> When ParentQueue#getQueueUserAclInfo is called (e.g. a client calls 
> YarnClient#getQueueAclsInfo) just at the moment that 
> LeafQueue#assignContainers is called and before notifying parent queue to 
> release resource (should release a reserved container), then ResourceManager 
> can deadlock. I found this problem on our testing environment for hadoop2.8.
> Reproduce the deadlock in chronological order
> * 1. Thread A (ResourceManager Event Processor) calls synchronized 
> LeafQueue#assignContainers (got LeafQueue instance lock of queue root.a)
> * 2. Thread B (IPC Server handler) calls synchronized 
> ParentQueue#getQueueUserAclInfo (got ParentQueue instance lock of queue 
> root), iterates over children queue acls and is blocked when calling 
> synchronized LeafQueue#getQueueUserAclInfo (the LeafQueue instance lock of 
> queue root.a is held by Thread A)
> * 3. Thread A wants to inform the parent queue that a container is being 
> completed and is blocked when invoking synchronized 
> ParentQueue#internalReleaseResource method (the ParentQueue instance lock of 
> queue root is held by Thread B)
> I think the synchronized modifier of LeafQueue#getQueueUserAclInfo can be 
> removed to solve this problem, since this method appears to not affect fields 
> of LeafQueue instance.
> Attach patch with UT for review.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5756) Add state-machine implementation for scheduler queues

2016-12-27 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-5756:
-
Summary: Add state-machine implementation for scheduler queues  (was: Add 
state-machine implementation for queues)

> Add state-machine implementation for scheduler queues
> -
>
> Key: YARN-5756
> URL: https://issues.apache.org/jira/browse/YARN-5756
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-5756.1.patch, YARN-5756.2.patch, YARN-5756.3.patch, 
> YARN-5756.4.patch, YARN-5756.5.patch, YARN-5756.6.patch, YARN-5756.6.patch, 
> YARN-5756.7.patch, YARN-5756.8.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5756) Add state-machine implementation for scheduler queues

2016-12-27 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-5756:
-
Fix Version/s: 3.0.0-alpha2
   2.9.0

> Add state-machine implementation for scheduler queues
> -
>
> Key: YARN-5756
> URL: https://issues.apache.org/jira/browse/YARN-5756
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Fix For: 2.9.0, 3.0.0-alpha2
>
> Attachments: YARN-5756.1.patch, YARN-5756.2.patch, YARN-5756.3.patch, 
> YARN-5756.4.patch, YARN-5756.5.patch, YARN-5756.6.patch, YARN-5756.6.patch, 
> YARN-5756.7.patch, YARN-5756.8.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5756) Add state-machine implementation for scheduler queues

2016-12-27 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15782087#comment-15782087
 ] 

Wangda Tan commented on YARN-5756:
--

Committed to trunk / branch-2, thanks [~xgong] for working on this and thanks 
reviews from [~gtCarrera9]!

> Add state-machine implementation for scheduler queues
> -
>
> Key: YARN-5756
> URL: https://issues.apache.org/jira/browse/YARN-5756
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Fix For: 2.9.0, 3.0.0-alpha2
>
> Attachments: YARN-5756.1.patch, YARN-5756.2.patch, YARN-5756.3.patch, 
> YARN-5756.4.patch, YARN-5756.5.patch, YARN-5756.6.patch, YARN-5756.6.patch, 
> YARN-5756.7.patch, YARN-5756.8.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6024) Capacity Scheduler continuous reservation looking doesn't work when queue's used+reserved = max

2016-12-27 Thread Ying Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15782092#comment-15782092
 ] 

Ying Zhang commented on YARN-6024:
--

Sorry for the confusion, [~leftnoteasy] and [~sunilg] :-)
Yes, it totally makes sense to me, please go ahead.

Here is what I mean anyway:
In 2.7.3, we are comparing:
  {code}newTotalWithoutReservedResource (which is like "totalUsed + 
newly_required - resourceCouldBeUnreserved") with currentLimitResource{code}
  With {code}newTotalWithoutReservedResource <= currentLimitResource{code}, we 
are only checking the queue when  "queue's available resource >= 
newly_required".
In 2.8, we are comparing:
  {code}newTotalWithoutReservedResource (which is like "totalUsed - 
resourceCouldBeUnreserved") with currentLimitResource{code}
  With {code}newTotalWithoutReservedResource < currentLimitResource{code}, we 
are checking the queue as long as "queue's available resource > 0".
There is a slight difference.
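
To make the difference concrete, a small worked example with hypothetical numbers (not values from the patch): with 10 units freeable and 20 newly required, the 2.7.3 check skips the queue while the 2.8 check keeps looking.
{code}
public class ReservationCheckDemo {
  public static void main(String[] args) {
    int used = 100;               // queue's current usage
    int limit = 100;              // currentLimitResource (queue max)
    int couldBeUnreserved = 10;   // resource that could be freed by unreserving
    int newlyRequired = 20;       // resource asked for by the new container

    // 2.7.3: proceed only when the queue could actually fit the new request,
    // i.e. available (limit - used + couldBeUnreserved) >= newlyRequired.
    boolean proceed273 = (used + newlyRequired - couldBeUnreserved) <= limit;  // 110 <= 100 -> false

    // 2.8: proceed as long as unreserving would leave any room at all,
    // i.e. available > 0.
    boolean proceed28 = (used - couldBeUnreserved) < limit;                    // 90 < 100 -> true

    System.out.println("2.7.3 keeps looking: " + proceed273);
    System.out.println("2.8   keeps looking: " + proceed28);
  }
}
{code}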


> Capacity Scheduler continuous reservation looking doesn't work when queue's 
> used+reserved = max
> ---
>
> Key: YARN-6024
> URL: https://issues.apache.org/jira/browse/YARN-6024
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-6024-branch-2.7.001.patch, 
> YARN-6024-branch-2.7.001.patch, YARN-6024.001.patch
>
>
> Found one corner case when continuous reservation looking doesn't work:
> When queue's used=max, the queue's capacity check fails.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-6024) Capacity Scheduler continuous reservation looking doesn't work when queue's used+reserved = max

2016-12-27 Thread Ying Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15782092#comment-15782092
 ] 

Ying Zhang edited comment on YARN-6024 at 12/28/16 5:25 AM:


Sorry for the confusion, [~leftnoteasy] and [~sunilg] :-)
Yes, it totally makes sense to me, please go ahead.

Here is what I mean anyway:
In 2.7.3, we are comparing:
  {code}newTotalWithoutReservedResource (which is like "totalUsed + 
newly_required - resourceCouldBeUnreserved") <= currentLimitResource{code}
  With this, we are only checking the queue when  "queue's available resource 
>= newly_required".
In 2.8, we are comparing:
  {code}newTotalWithoutReservedResource (which is like "totalUsed - 
resourceCouldBeUnreserved") < currentLimitResource{code}
  With this, we are checking the queue as long as "queue's available resource > 
0".
There is a slight difference.



was (Author: ying zhang):
Sorry for the confusion, [~leftnoteasy] and [~sunilg] :-)
Yes, it totally makes sense to me, please go ahead.

Here is what I mean anyway:
In 2.7.3, we are comparing:
  {code}newTotalWithoutReservedResource (which is like "totalUsed + 
newly_required - resourceCouldBeUnreserved") with currentLimitResource{code}
  With {code}newTotalWithoutReservedResource <= currentLimitResource{code}, we 
are only checking the queue when  "queue's available resource >= 
newly_required".
In 2.8, we are comparing:
  {code}newTotalWithoutReservedResource (which is like "totalUsed - 
resourceCouldBeUnreserved") with currentLimitResource{code}
  With {code}newTotalWithoutReservedResource < currentLimitResource{code}, we 
are checking the queue as long as "queue's available resource > 0".
There is a slightly difference.


> Capacity Scheduler continuous reservation looking doesn't work when queue's 
> used+reserved = max
> ---
>
> Key: YARN-6024
> URL: https://issues.apache.org/jira/browse/YARN-6024
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-6024-branch-2.7.001.patch, 
> YARN-6024-branch-2.7.001.patch, YARN-6024.001.patch
>
>
> Found one corner case when continuous reservation looking doesn't work:
> When queue's used=max, the queue's capacity check fails.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6029) CapacityScheduler deadlock when ParentQueue#getQueueUserAclInfo is called by Thread_A at the moment that Thread_B calls LeafQueue#assignContainers to release a reserved

2016-12-27 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15782098#comment-15782098
 ] 

Wangda Tan commented on YARN-6029:
--

Thanks all for the comments.

[~Tao Yang] / [~gtCarrera9],

Yes, removing the synchronized lock will not damage the internal data structure, but it could cause inconsistent reads; for example, the queue ACLs could be read while they are being updated. So I am not in favor of this solution.

[~Naganarasimha],

I still prefer to fix the issue in the scheduling logic. There are other similar code paths, such as getQueueInfo, that we would need to identify, and they could likewise return inconsistent data when the queue is being refreshed at the same time.

> CapacityScheduler deadlock when ParentQueue#getQueueUserAclInfo is called by 
> Thread_A at the moment that Thread_B calls LeafQueue#assignContainers to 
> release a reserved container
> --
>
> Key: YARN-6029
> URL: https://issues.apache.org/jira/browse/YARN-6029
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.8.0
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Blocker
> Attachments: YARN-6029.001.patch, deadlock.jstack
>
>
> When ParentQueue#getQueueUserAclInfo is called (e.g. a client calls 
> YarnClient#getQueueAclsInfo) just at the moment that 
> LeafQueue#assignContainers is called and before notifying parent queue to 
> release resource (should release a reserved container), then ResourceManager 
> can deadlock. I found this problem on our testing environment for hadoop2.8.
> Reproduce the deadlock in chronological order
> * 1. Thread A (ResourceManager Event Processor) calls synchronized 
> LeafQueue#assignContainers (got LeafQueue instance lock of queue root.a)
> * 2. Thread B (IPC Server handler) calls synchronized 
> ParentQueue#getQueueUserAclInfo (got ParentQueue instance lock of queue 
> root), iterates over children queue acls and is blocked when calling 
> synchronized LeafQueue#getQueueUserAclInfo (the LeafQueue instance lock of 
> queue root.a is held by Thread A)
> * 3. Thread A wants to inform the parent queue that a container is being 
> completed and is blocked when invoking synchronized 
> ParentQueue#internalReleaseResource method (the ParentQueue instance lock of 
> queue root is held by Thread B)
> I think the synchronized modifier of LeafQueue#getQueueUserAclInfo can be 
> removed to solve this problem, since this method appears to not affect fields 
> of LeafQueue instance.
> Attach patch with UT for review.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6029) CapacityScheduler deadlock when ParentQueue#getQueueUserAclInfo is called by Thread_A at the moment that Thread_B calls LeafQueue#assignContainers to release a reserved

2016-12-27 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15782102#comment-15782102
 ] 

Li Lu commented on YARN-6029:
-

Thanks [~wangda]! 
bq. But it could cause inconsistent reads; for example, the queue ACLs could be read while they are being updated.
Makes sense to me. Let's keep and fix the synchronized blocks, then...

> CapacityScheduler deadlock when ParentQueue#getQueueUserAclInfo is called by 
> Thread_A at the moment that Thread_B calls LeafQueue#assignContainers to 
> release a reserved container
> --
>
> Key: YARN-6029
> URL: https://issues.apache.org/jira/browse/YARN-6029
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.8.0
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Blocker
> Attachments: YARN-6029.001.patch, deadlock.jstack
>
>
> When ParentQueue#getQueueUserAclInfo is called (e.g. a client calls 
> YarnClient#getQueueAclsInfo) just at the moment that 
> LeafQueue#assignContainers is called and before notifying parent queue to 
> release resource (should release a reserved container), then ResourceManager 
> can deadlock. I found this problem on our testing environment for hadoop2.8.
> Reproduce the deadlock in chronological order
> * 1. Thread A (ResourceManager Event Processor) calls synchronized 
> LeafQueue#assignContainers (got LeafQueue instance lock of queue root.a)
> * 2. Thread B (IPC Server handler) calls synchronized 
> ParentQueue#getQueueUserAclInfo (got ParentQueue instance lock of queue 
> root), iterates over children queue acls and is blocked when calling 
> synchronized LeafQueue#getQueueUserAclInfo (the LeafQueue instance lock of 
> queue root.a is held by Thread A)
> * 3. Thread A wants to inform the parent queue that a container is being 
> completed and is blocked when invoking synchronized 
> ParentQueue#internalReleaseResource method (the ParentQueue instance lock of 
> queue root is held by Thread B)
> I think the synchronized modifier of LeafQueue#getQueueUserAclInfo can be 
> removed to solve this problem, since this method appears to not affect fields 
> of LeafQueue instance.
> Attach patch with UT for review.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6024) Capacity Scheduler continuous reservation looking doesn't work when queue's used+reserved = max

2016-12-27 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15782109#comment-15782109
 ] 

Wangda Tan commented on YARN-6024:
--

[~Ying Zhang],

Gotcha, thanks for elaborating. Yeah, it is a historical issue. I remember there were some changes after branch-2.8 to get rid of newRequired, but there were also some changes in the application / leaf queue to handle the resource limit and headroom properly (like respecting the parent queue's max capacity, etc.). I cannot remember all the details, but to avoid regressions, let's just focus on the changes in the patch.

> Capacity Scheduler continuous reservation looking doesn't work when queue's 
> used+reserved = max
> ---
>
> Key: YARN-6024
> URL: https://issues.apache.org/jira/browse/YARN-6024
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-6024-branch-2.7.001.patch, 
> YARN-6024-branch-2.7.001.patch, YARN-6024.001.patch
>
>
> Found one corner case when continuous reservation looking doesn't work:
> When queue's used=max, the queue's capacity check fails.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5906) Update AppSchedulingInfo to use SchedulingPlacementSet

2016-12-27 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-5906:
-
Attachment: YARN-5906.5.patch

Attached ver.5 patch, rebased to latest trunk. 

> Update AppSchedulingInfo to use SchedulingPlacementSet
> --
>
> Key: YARN-5906
> URL: https://issues.apache.org/jira/browse/YARN-5906
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-5906.1.patch, YARN-5906.2.patch, YARN-5906.3.patch, 
> YARN-5906.4.patch, YARN-5906.5.patch
>
>
> Currently AppSchedulingInfo simply stores resource request and scheduler make 
> decision according to stored resource request. For example, CS/FS use 
> slightly different approach to get pending resource request and make delay 
> scheduling decision. 
> There're several benefits of moving pending resource request data structure 
> to SchedulingPlacementSet
> 1) Delay scheduling logic should be agnostic to scheduler, for example CS 
> supports count-based delay and FS supports both of count-based and time-based 
> delay. Ideally scheduler should be able to choose which delay scheduling 
> policy to use.
> 2) In addition to 1., YARN-4902 has proposal to support pluggable delay 
> scheduling behavior in addition to locality-based (host->rack->offswitch). 
> Which requires more flexibility.
> 3) To make YARN-4902 becomes real, instead of directly adding the new 
> resource request API to client, we can make scheduler to use it internally to 
> make sure it is well defined. And AppSchedulingInfo/SchedulingPlacementSet 
> will be the perfect place to isolate which ResourceRequest implementation to 
> use.
> 4) Different scheduling requirement needs different behavior of checking 
> ResourceRequest table.
> This JIRA is the 1st patch of several refactorings. Which moves all 
> ResourceRequest data structure and logics to SchedulingPlacementSet. We need 
> follow changes to make it better structured
> - Make delay scheduling to be a plugin of SchedulingPlacementSet
> - After YARN-4902 get committed, change SchedulingPlacementSet to use 
> YARN-4902 internally.
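
A purely illustrative reading of that refactoring, sketched as simplified interfaces; these are assumptions about the shape of the API, not the actual YARN-5906 classes: the scheduler asks a per-app SchedulingPlacementSet for pending requests and placement lookups, and the delay-scheduling policy sits behind it as a plugin.
{code}
import java.util.Collection;

interface DelaySchedulingPolicy {
  // e.g. count-based (CapacityScheduler) or time-based (FairScheduler) delay.
  boolean canAssignAtLocality(String locality, long missedOpportunities, long waitTimeMs);
}

interface SchedulingPlacementSet<R> {
  void updateResourceRequests(Collection<R> requests);  // pending requests live here
  R getPendingRequest(String resourceName);             // host -> rack -> off-switch lookup
  DelaySchedulingPolicy delayPolicy();                   // pluggable per scheduler
}
{code}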



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-6031) Application recovery failed after disabling node label

2016-12-27 Thread Ying Zhang (JIRA)
Ying Zhang created YARN-6031:


 Summary: Application recovery failed after disabling node label
 Key: YARN-6031
 URL: https://issues.apache.org/jira/browse/YARN-6031
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.8.0
Reporter: Ying Zhang
Assignee: Ying Zhang


Here is the repro steps:
Enable node label, restart RM, configure it properly, and run some jobs;
Disable node label, restart RM, and the following exception thrown:
Caused by: 
org.apache.hadoop.yarn.exceptions.InvalidLabelResourceRequestException: Invalid 
resource request, node label not enabled but request contains label expression
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:225)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:248)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:394)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:339)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:319)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:436)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1165)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
... 10 more
The reason is that during RM restart, application recovery failed due to the 
reason that application had node label expression specified while node label 
had been disabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6031) Application recovery failed after disabling node label

2016-12-27 Thread Ying Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ying Zhang updated YARN-6031:
-
Description: 
Here is the repro steps:
Enable node label, restart RM, configure it properly, and run some jobs;
Disable node label, restart RM, and the following exception thrown:
{panel:title=My 
Title|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1|bgColor=#CE}
Caused by: 
org.apache.hadoop.yarn.exceptions.InvalidLabelResourceRequestException: Invalid 
resource request, node label not enabled but request contains label expression
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:225)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:248)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:394)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:339)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:319)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:436)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1165)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
... 10 more
{panel}
The reason is that during RM restart, application recovery failed due to the 
reason that application had node label expression specified while node label 
had been disabled.

  was:
Here is the repro steps:
Enable node label, restart RM, configure it properly, and run some jobs;
Disable node label, restart RM, and the following exception thrown:
Caused by: 
org.apache.hadoop.yarn.exceptions.InvalidLabelResourceRequestException: Invalid 
resource request, node label not enabled but request contains label expression
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:225)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:248)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:394)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:339)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:319)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:436)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1165)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
... 10 more
The reason is that during RM restart, application recovery failed due to the 
reason that application had node label expression specified while node label 
had been disabled.


> Application recovery failed after disabling node label
> --
>
> Key: YARN-6031
> URL: https://issues.apache.org/jira/browse/YARN-6031
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.8.0
>Reporter: Ying Zhang
>Assignee: Ying Zhang
>
> Here is the repro steps:
> Enable node label, restart RM, configure it properly, and run some jobs;
> Disable node label, restart RM, and the following exception thrown:
> {panel:title=My 
> Title|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1|bgColor=#CE}
> Caused by: 
> org.apache.hadoop.yarn.exceptions.InvalidLabelResourceRequestException: 
> Invalid resource request, node label not enabled but request contains label 
> expression
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:225)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:248)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:394)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:339)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplicatio

[jira] [Updated] (YARN-6031) Application recovery failed after disabling node label

2016-12-27 Thread Ying Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ying Zhang updated YARN-6031:
-
Description: 
Here is the repro steps:
Enable node label, restart RM, configure it properly, and run some jobs;
Disable node label, restart RM, and the following exception thrown:
{noformat}
Caused by: 
org.apache.hadoop.yarn.exceptions.InvalidLabelResourceRequestException: Invalid 
resource request, node label not enabled but request contains label expression
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:225)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:248)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:394)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:339)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:319)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:436)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1165)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
... 10 more
{noformat}
The reason is that during RM restart, application recovery failed due to the 
reason that application had node label expression specified while node label 
had been disabled.

  was:
Here is the repro steps:
Enable node label, restart RM, configure it properly, and run some jobs;
Disable node label, restart RM, and the following exception thrown:
{panel:title=My 
Title|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1|bgColor=#CE}
Caused by: 
org.apache.hadoop.yarn.exceptions.InvalidLabelResourceRequestException: Invalid 
resource request, node label not enabled but request contains label expression
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:225)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:248)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:394)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:339)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:319)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:436)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1165)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
... 10 more
{panel}
The reason is that during RM restart, application recovery failed due to the 
reason that application had node label expression specified while node label 
had been disabled.


> Application recovery failed after disabling node label
> --
>
> Key: YARN-6031
> URL: https://issues.apache.org/jira/browse/YARN-6031
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.8.0
>Reporter: Ying Zhang
>Assignee: Ying Zhang
>
> Here is the repro steps:
> Enable node label, restart RM, configure it properly, and run some jobs;
> Disable node label, restart RM, and the following exception thrown:
> {noformat}
> Caused by: 
> org.apache.hadoop.yarn.exceptions.InvalidLabelResourceRequestException: 
> Invalid resource request, node label not enabled but request contains label 
> expression
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:225)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:248)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:394)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:339)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:319)
> at 
> org.apache.hadoop.yarn.se

[jira] [Commented] (YARN-5756) Add state-machine implementation for scheduler queues

2016-12-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15782136#comment-15782136
 ] 

Hudson commented on YARN-5756:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11046 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/11046/])
YARN-5756. Add state-machine implementation for scheduler queues. (Xuan 
(wangda: rev 0840b4329b2428b20b862f70d72cbdcd6d1618ed)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueue.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueState.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueStateManager.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerContext.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueStateManager.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/QueueState.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerQueueManager.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerQueue.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerQueueManager.java


> Add state-machine implementation for scheduler queues
> -
>
> Key: YARN-5756
> URL: https://issues.apache.org/jira/browse/YARN-5756
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Fix For: 2.9.0, 3.0.0-alpha2
>
> Attachments: YARN-5756.1.patch, YARN-5756.2.patch, YARN-5756.3.patch, 
> YARN-5756.4.patch, YARN-5756.5.patch, YARN-5756.6.patch, YARN-5756.6.patch, 
> YARN-5756.7.patch, YARN-5756.8.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6031) Application recovery failed after disabling node label

2016-12-27 Thread Ying Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ying Zhang updated YARN-6031:
-
Priority: Minor  (was: Major)

> Application recovery failed after disabling node label
> --
>
> Key: YARN-6031
> URL: https://issues.apache.org/jira/browse/YARN-6031
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.8.0
>Reporter: Ying Zhang
>Assignee: Ying Zhang
>Priority: Minor
>
> Here are the repro steps:
> Enable node label, restart RM, configure it properly, and run some jobs;
> Disable node label, restart RM, and the following exception is thrown:
> {noformat}
> Caused by: 
> org.apache.hadoop.yarn.exceptions.InvalidLabelResourceRequestException: 
> Invalid resource request, node label not enabled but request contains label 
> expression
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:225)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:248)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:394)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:339)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:319)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:436)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1165)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> ... 10 more
> {noformat}
> The reason is that during RM restart, application recovery failed because 
> the application had a node label expression specified while node labels had 
> been disabled.
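
The failure mode above boils down to a simple guard: the validation path rejects any request that carries a label expression once node labels are disabled, and a recovered application still carries the expression it was submitted with. A hedged, self-contained sketch of that interaction follows; the class and method names are hypothetical stand-ins, not the actual SchedulerUtils/RMAppManager code.

{noformat}
// Hedged sketch, not the real Hadoop code: shows why recovery trips the
// same validation as a fresh submission once node labels are disabled.
public final class LabelRecoverySketch {

  // Stand-in for InvalidLabelResourceRequestException.
  static class InvalidLabelRequestException extends RuntimeException {
    InvalidLabelRequestException(String msg) { super(msg); }
  }

  // Stand-in for the check done while normalizing a resource request.
  static void validate(String labelExpression, boolean nodeLabelsEnabled) {
    boolean hasLabel = labelExpression != null && !labelExpression.isEmpty();
    if (!nodeLabelsEnabled && hasLabel) {
      throw new InvalidLabelRequestException(
          "Invalid resource request, node label not enabled but request "
              + "contains label expression");
    }
  }

  public static void main(String[] args) {
    // Original submission while labels were enabled: passes.
    validate("labelA", true);
    // Recovery after labels were disabled: the persisted request still
    // carries "labelA", so validation throws and RM recovery aborts.
    try {
      validate("labelA", false);
    } catch (InvalidLabelRequestException e) {
      System.out.println("Recovery would fail here: " + e.getMessage());
    }
  }
}
{noformat}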



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6031) Application recovery failed after disabling node label

2016-12-27 Thread Ying Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ying Zhang updated YARN-6031:
-
Description: 
Here are the repro steps:
Enable node label, restart RM, configure CS properly, and run some jobs;
Disable node label, restart RM, and the following exception is thrown:
{noformat}
Caused by: 
org.apache.hadoop.yarn.exceptions.InvalidLabelResourceRequestException: Invalid 
resource request, node label not enabled but request contains label expression
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:225)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:248)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:394)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:339)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:319)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:436)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1165)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
... 10 more
{noformat}
The reason is that during RM restart, application recovery failed because the 
application had a node label expression specified while node labels had been 
disabled.

  was:
Here are the repro steps:
Enable node label, restart RM, configure it properly, and run some jobs;
Disable node label, restart RM, and the following exception is thrown:
{noformat}
Caused by: 
org.apache.hadoop.yarn.exceptions.InvalidLabelResourceRequestException: Invalid 
resource request, node label not enabled but request contains label expression
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:225)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:248)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:394)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:339)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:319)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:436)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1165)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
... 10 more
{noformat}
The reason is that during RM restart, application recovery failed because the 
application had a node label expression specified while node labels had been 
disabled.


> Application recovery failed after disabling node label
> --
>
> Key: YARN-6031
> URL: https://issues.apache.org/jira/browse/YARN-6031
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.8.0
>Reporter: Ying Zhang
>Assignee: Ying Zhang
>Priority: Minor
>
> Here are the repro steps:
> Enable node label, restart RM, configure CS properly, and run some jobs;
> Disable node label, restart RM, and the following exception is thrown:
> {noformat}
> Caused by: 
> org.apache.hadoop.yarn.exceptions.InvalidLabelResourceRequestException: 
> Invalid resource request, node label not enabled but request contains label 
> expression
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:225)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:248)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:394)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:339)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:319)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager

[jira] [Updated] (YARN-6031) Application recovery failed after disabling node label

2016-12-27 Thread Ying Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ying Zhang updated YARN-6031:
-
Description: 
Here are the repro steps:
Enable node label, restart RM, configure CS properly, and run some jobs;
Disable node label, restart RM, and the following exception is thrown:
{noformat}
Caused by: 
org.apache.hadoop.yarn.exceptions.InvalidLabelResourceRequestException: Invalid 
resource request, node label not enabled but request contains label expression
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:225)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:248)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:394)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:339)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:319)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:436)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1165)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
... 10 more
{noformat}
During RM restart, application recovery failed because the application had a 
node label expression specified while node labels had been disabled.

  was:
Here are the repro steps:
Enable node label, restart RM, configure CS properly, and run some jobs;
Disable node label, restart RM, and the following exception is thrown:
{noformat}
Caused by: 
org.apache.hadoop.yarn.exceptions.InvalidLabelResourceRequestException: Invalid 
resource request, node label not enabled but request contains label expression
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:225)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:248)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:394)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:339)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:319)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:436)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1165)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
... 10 more
{noformat}
The reason is that during RM restart, application recovery failed because the 
application had a node label expression specified while node labels had been 
disabled.


> Application recovery failed after disabling node label
> --
>
> Key: YARN-6031
> URL: https://issues.apache.org/jira/browse/YARN-6031
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.8.0
>Reporter: Ying Zhang
>Assignee: Ying Zhang
>Priority: Minor
>
> Here are the repro steps:
> Enable node label, restart RM, configure CS properly, and run some jobs;
> Disable node label, restart RM, and the following exception is thrown:
> {noformat}
> Caused by: 
> org.apache.hadoop.yarn.exceptions.InvalidLabelResourceRequestException: 
> Invalid resource request, node label not enabled but request contains label 
> expression
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:225)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:248)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:394)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:339)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:319)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:436)
>   

[jira] [Commented] (YARN-6021) When the allocated minShares of all queues add up to more than the cluster capacity, some queues can get a fair share of 0

2016-12-27 Thread Feng Yuan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15782149#comment-15782149
 ] 

Feng Yuan commented on YARN-6021:
-

[~kasha] Thanks for your detailed reply. I think your explanation answers my 
questions.

> When the allocated minShares of all queues add up to more than the cluster 
> capacity, some queues can get a fair share of 0
> ---
>
> Key: YARN-6021
> URL: https://issues.apache.org/jira/browse/YARN-6021
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.4.0
>Reporter: Feng Yuan
>Assignee: Feng Yuan
>Priority: Critical
>
> In fair-scheduler.xml, if the configured minShares add up to more than the 
> parent queue's fair share (for root's children, the fair share is the cluster 
> capacity), the R value looks like the following while computing the 
> children's fair shares:
> 1.0 
> 0.5 
> 0.25 
> 0.125 
> 0.0625 
> 0.03125 
> 0.015625 
> 0.0078125 
> 0.00390625
> I find this is due to:
> double rMax = 1.0;
> while (resourceUsedWithWeightToResourceRatio(rMax, schedulables, type)
>     < totalResource) {
>   rMax *= 2.0;
> }
> because resourceUsedWithWeightToResourceRatio adds the minShares together.
> Should we really bring minShare into the fair share computation?
> My suggestion is that considering only the weight is enough; the minShare 
> guarantee will be fulfilled when assigning containers (assignContainer).
> Suggestions are welcome!
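
To make the concern concrete, here is a hedged, self-contained sketch of the ratio search described above, modelled loosely on the fair scheduler's share computation but with hypothetical names (Sched, resourceUsedAt): when the configured minShares alone exceed the cluster capacity, the doubling loop never runs and the subsequent binary search halves R toward zero, which is exactly the 1.0, 0.5, 0.25, ... sequence quoted above.

{noformat}
// Hedged sketch only; names and structure are simplified stand-ins for
// the fair scheduler's share computation, not the actual Hadoop code.
import java.util.Arrays;
import java.util.List;

public final class FairShareRatioSketch {

  // Hypothetical schedulable: a weight plus a configured minShare floor.
  static class Sched {
    final double weight;
    final long minShare;
    Sched(double weight, long minShare) {
      this.weight = weight;
      this.minShare = minShare;
    }
  }

  // Resource consumed at ratio r: weight * r, but never below minShare.
  // The minShare floor is what skews the search discussed above.
  static long resourceUsedAt(double r, List<Sched> scheds) {
    long total = 0;
    for (Sched s : scheds) {
      total += Math.max((long) (s.weight * r), s.minShare);
    }
    return total;
  }

  static double findRatio(List<Sched> scheds, long totalResource) {
    double rMax = 1.0;
    // Grow rMax until demand at rMax covers the cluster. If the minShares
    // alone already exceed totalResource, this loop never executes.
    while (resourceUsedAt(rMax, scheds) < totalResource) {
      rMax *= 2.0;
    }
    // Binary search between 0 and rMax; the repeated halving is what
    // produces the 1.0, 0.5, 0.25, ... trace of R values.
    double left = 0.0, right = rMax;
    for (int i = 0; i < 25; i++) {
      double mid = (left + right) / 2.0;
      if (resourceUsedAt(mid, scheds) < totalResource) {
        left = mid;
      } else {
        right = mid;
      }
    }
    return right;
  }

  public static void main(String[] args) {
    // Two queues whose minShares (600 + 600) exceed the cluster (1000):
    // R collapses toward 0, so weight-based fair shares go to ~0.
    List<Sched> scheds =
        Arrays.asList(new Sched(1.0, 600), new Sched(1.0, 600));
    System.out.println("R = " + findRatio(scheds, 1000));
  }
}
{noformat}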



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4465) SchedulerUtils#validateRequest for Label check should happen only when nodelabel enabled

2016-12-27 Thread Ying Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15782147#comment-15782147
 ] 

Ying Zhang commented on YARN-4465:
--

Thanks [~leftnoteasy], [~sunilg] and [~bibinchundatt]. I've created YARN-6031 
to track this. Please do comment there; I'll try to come up with a patch soon.


> SchedulerUtils#validateRequest for Label check should happen only when 
> nodelabel enabled
> 
>
> Key: YARN-4465
> URL: https://issues.apache.org/jira/browse/YARN-4465
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: 0001-YARN-4465.patch, 0002-YARN-4465.patch, 
> 0003-YARN-4465.patch, 0004-YARN-4465.patch, 0006-YARN-4465.patch, 
> 0007-YARN-4465.patch
>
>
> Disable labels on the RM side: yarn.nodelabel.enable=false
> The capacity scheduler label configuration for the queue is as below:
> default label for queue b1 is 3, and accessible labels are 1,3
> Submit an application to queue A.
> {noformat}
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException):
>  Invalid resource request, queue=b1 doesn't have permission to access all 
> labels in resource request. labelExpression of resource request=3. Queue 
> labels=1,3
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:304)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:234)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:216)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:401)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:340)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:283)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:602)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:247)
> {noformat}
> # Ignore the default label expression when labels are disabled, *or*
> # In NormalizeResourceRequest we can set the label expression to  
> when node label is not enabled, *or*
> # Improve the error message
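
As an illustration of the second option, a hedged sketch follows: clear the label expression while normalizing the request whenever node labels are disabled, so the later queue-permission check never sees a stale default expression. The request class and method names are hypothetical stand-ins, not the real YARN API.

{noformat}
// Hedged sketch with hypothetical names; not the actual
// NormalizeResourceRequest / SchedulerUtils code.
public final class NormalizeLabelSketch {

  // Minimal stand-in for a resource request carrying a label expression.
  static class RequestStub {
    private String nodeLabelExpression;
    String getNodeLabelExpression() { return nodeLabelExpression; }
    void setNodeLabelExpression(String expr) { nodeLabelExpression = expr; }
  }

  // Option 2 above: drop the (possibly queue-default) label expression
  // during normalization when node labels are disabled on the RM side.
  static void normalize(RequestStub req, boolean nodeLabelsEnabled) {
    if (!nodeLabelsEnabled && req.getNodeLabelExpression() != null) {
      req.setNodeLabelExpression(null);
    }
  }

  public static void main(String[] args) {
    RequestStub req = new RequestStub();
    req.setNodeLabelExpression("3");  // default label picked up from the queue
    normalize(req, false);            // labels disabled: expression is cleared
    System.out.println(req.getNodeLabelExpression());  // prints: null
  }
}
{noformat}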



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-4465) SchedulerUtils#validateRequest for Label check should happen only when nodelabel enabled

2016-12-27 Thread Ying Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15782151#comment-15782151
 ] 

Ying Zhang edited comment on YARN-4465 at 12/28/16 5:58 AM:


Yes, you're right. Please see the repro steps in YARN-6031.


was (Author: ying zhang):
Yes, you're right. Please see the repro steps in YARN-6031.

> SchedulerUtils#validateRequest for Label check should happen only when 
> nodelabel enabled
> 
>
> Key: YARN-4465
> URL: https://issues.apache.org/jira/browse/YARN-4465
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: 0001-YARN-4465.patch, 0002-YARN-4465.patch, 
> 0003-YARN-4465.patch, 0004-YARN-4465.patch, 0006-YARN-4465.patch, 
> 0007-YARN-4465.patch
>
>
> Disable labels on the RM side: yarn.nodelabel.enable=false
> The capacity scheduler label configuration for the queue is as below:
> default label for queue b1 is 3, and accessible labels are 1,3
> Submit an application to queue A.
> {noformat}
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException):
>  Invalid resource request, queue=b1 doesn't have permission to access all 
> labels in resource request. labelExpression of resource request=3. Queue 
> labels=1,3
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:304)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:234)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:216)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:401)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:340)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:283)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:602)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:247)
> {noformat}
> # Ignore the default label expression when labels are disabled, *or*
> # In NormalizeResourceRequest we can set the label expression to  
> when node label is not enabled, *or*
> # Improve the error message



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4465) SchedulerUtils#validateRequest for Label check should happen only when nodelabel enabled

2016-12-27 Thread Ying Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15782151#comment-15782151
 ] 

Ying Zhang commented on YARN-4465:
--

Yes, you're right. Please see the repro steps in YARN-6031.

> SchedulerUtils#validateRequest for Label check should happen only when 
> nodelabel enabled
> 
>
> Key: YARN-4465
> URL: https://issues.apache.org/jira/browse/YARN-4465
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: 0001-YARN-4465.patch, 0002-YARN-4465.patch, 
> 0003-YARN-4465.patch, 0004-YARN-4465.patch, 0006-YARN-4465.patch, 
> 0007-YARN-4465.patch
>
>
> Disable labels on the RM side: yarn.nodelabel.enable=false
> The capacity scheduler label configuration for the queue is as below:
> default label for queue b1 is 3, and accessible labels are 1,3
> Submit an application to queue A.
> {noformat}
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException):
>  Invalid resource request, queue=b1 doesn't have permission to access all 
> labels in resource request. labelExpression of resource request=3. Queue 
> labels=1,3
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:304)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:234)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:216)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:401)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:340)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:283)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:602)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:247)
> {noformat}
> # Ignore the default label expression when labels are disabled, *or*
> # In NormalizeResourceRequest we can set the label expression to  
> when node label is not enabled, *or*
> # Improve the error message



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-4465) SchedulerUtils#validateRequest for Label check should happen only when nodelabel enabled

2016-12-27 Thread Ying Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15782147#comment-15782147
 ] 

Ying Zhang edited comment on YARN-4465 at 12/28/16 6:00 AM:


Thanks [~leftnoteasy], [~sunilg] and [~bibinchundatt]. I've created YARN-6031 
to track this. Please do comment there; I'll try to come up with a patch soon. 
Also, please feel free to take it over :-)



was (Author: ying zhang):
Thanks [~leftnoteasy], [~sunilg] and [~bibinchundatt]. I've created YARN-6031 
to track this. Please do comment there; I'll try to come up with a patch soon.


> SchedulerUtils#validateRequest for Label check should happen only when 
> nodelabel enabled
> 
>
> Key: YARN-4465
> URL: https://issues.apache.org/jira/browse/YARN-4465
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: 0001-YARN-4465.patch, 0002-YARN-4465.patch, 
> 0003-YARN-4465.patch, 0004-YARN-4465.patch, 0006-YARN-4465.patch, 
> 0007-YARN-4465.patch
>
>
> Disable labels on the RM side: yarn.nodelabel.enable=false
> The capacity scheduler label configuration for the queue is as below:
> default label for queue b1 is 3, and accessible labels are 1,3
> Submit an application to queue A.
> {noformat}
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException):
>  Invalid resource request, queue=b1 doesn't have permission to access all 
> labels in resource request. labelExpression of resource request=3. Queue 
> labels=1,3
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:304)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:234)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:216)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:401)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:340)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:283)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:602)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:247)
> {noformat}
> # Ignore the default label expression when labels are disabled, *or*
> # In NormalizeResourceRequest we can set the label expression to  
> when node label is not enabled, *or*
> # Improve the error message



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6031) Application recovery failed after disabling node label

2016-12-27 Thread Ying Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ying Zhang updated YARN-6031:
-
Description: 
Here are the repro steps:
Enable node label, restart RM, configure CS properly, and run some jobs;
Disable node label, restart RM, and the following exception is thrown:
{noformat}
Caused by: 
org.apache.hadoop.yarn.exceptions.InvalidLabelResourceRequestException: Invalid 
resource request, node label not enabled but request contains label expression
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:225)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:248)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:394)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:339)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:319)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:436)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1165)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
... 10 more
{noformat}
During RM restart, application recovery failed because the application had a 
node label expression specified while node labels have been disabled.

  was:
Here are the repro steps:
Enable node label, restart RM, configure CS properly, and run some jobs;
Disable node label, restart RM, and the following exception is thrown:
{noformat}
Caused by: 
org.apache.hadoop.yarn.exceptions.InvalidLabelResourceRequestException: Invalid 
resource request, node label not enabled but request contains label expression
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:225)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:248)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:394)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:339)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:319)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:436)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1165)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
... 10 more
{noformat}
During RM restart, application recovery failed because the application had a 
node label expression specified while node labels had been disabled.


> Application recovery failed after disabling node label
> --
>
> Key: YARN-6031
> URL: https://issues.apache.org/jira/browse/YARN-6031
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.8.0
>Reporter: Ying Zhang
>Assignee: Ying Zhang
>Priority: Minor
>
> Here are the repro steps:
> Enable node label, restart RM, configure CS properly, and run some jobs;
> Disable node label, restart RM, and the following exception is thrown:
> {noformat}
> Caused by: 
> org.apache.hadoop.yarn.exceptions.InvalidLabelResourceRequestException: 
> Invalid resource request, node label not enabled but request contains label 
> expression
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:225)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:248)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:394)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:339)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:319)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:436)
> at 
> org.apache.
