[jira] [Commented] (YARN-9205) When using custom resource type, application will fail to run due to the CapacityScheduler throws InvalidResourceRequestException(GREATER_THEN_MAX_ALLOCATION)

2019-01-21 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748462#comment-16748462
 ] 

Hadoop QA commented on YARN-9205:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
18s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 48s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 36s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 8 new + 122 unchanged - 0 fixed = 130 total (was 122) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 29s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 92m 16s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
27s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}149m 57s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-9205 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12955729/YARN-9205-trunk.004.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  xml  |
| uname | Linux 478110bbb579 4.4.0-138-generic #164~14.04.1-Ubuntu SMP Fri Oct 
5 08:56:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 6f0756f |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 

[jira] [Commented] (YARN-9161) Absolute resources of capacity scheduler doesn't support GPU and FPGA

2019-01-21 Thread Zac Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748468#comment-16748468
 ] 

Zac Zhou commented on YARN-9161:


I ran TestCapacitySchedulerSurgicalPreemption locally, and it does not fail.

I also used Maven to run the whole test suite for the sub-module 
hadoop-yarn-applications-distributedshell without the patch, and it failed with 
the same timeout error as in the test report.

Running TestDistributedShell directly also succeeded on my server.

So the failed tests do not appear to be related to the patch.

[~sunilg], could you help review the patch when you are free?

Thanks a lot~

 

> Absolute resources of capacity scheduler doesn't support GPU and FPGA
> -
>
> Key: YARN-9161
> URL: https://issues.apache.org/jira/browse/YARN-9161
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Zac Zhou
>Assignee: Zac Zhou
>Priority: Major
> Attachments: YARN-9161.001.patch, YARN-9161.002.patch, 
> YARN-9161.003.patch, YARN-9161.004.patch, YARN-9161.005.patch, 
> YARN-9161.006.patch
>
>
> The enum CapacitySchedulerConfiguration.AbsoluteResourceType only has two 
> elements, memory and vcores, so the absolute-resource configuration of gpu 
> and fpga is filtered out in 
> AbstractCSQueue.updateConfigurableResourceRequirement. 
> As a result, gpu and fpga cannot be allocated correctly.
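A minimal sketch of the filtering behaviour described above, assuming a 
simplified map of configured absolute values; the enum is paraphrased from 
CapacitySchedulerConfiguration and the loop is illustrative, not the actual 
Hadoop code.

{code:java}
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class AbsoluteResourceFilterSketch {
  // Paraphrase of CapacitySchedulerConfiguration.AbsoluteResourceType:
  // only memory and vcores are declared.
  enum AbsoluteResourceType { MEMORY, VCORES }

  public static void main(String[] args) {
    Map<String, Long> configured = new HashMap<>();
    configured.put("memory", 102400L);
    configured.put("vcores", 100L);
    configured.put("yarn.io/gpu", 8L); // dropped by the check below

    for (Map.Entry<String, Long> e : configured.entrySet()) {
      boolean known = Arrays.stream(AbsoluteResourceType.values())
          .anyMatch(t -> t.name().equalsIgnoreCase(e.getKey()));
      if (!known) {
        // Custom types such as gpu/fpga never reach the queue's configured
        // resource requirement, mirroring the reported behaviour.
        continue;
      }
      // ... update the queue's min/max resource with e.getValue()
    }
  }
}
{code}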



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9210) RM nodes web page can not display node info

2019-01-21 Thread Jiandan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748444#comment-16748444
 ] 

Jiandan Yang  commented on YARN-9210:
-

Thanks [~cheersyang] for your commit. I've uploaded patches for branch-2 and 
branch-2.9.

> RM nodes web page can not display node info
> ---
>
> Key: YARN-9210
> URL: https://issues.apache.org/jira/browse/YARN-9210
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.1.2, 3.3.0
>Reporter: Jiandan Yang 
>Assignee: Jiandan Yang 
>Priority: Blocker
> Fix For: 3.0.4, 3.1.2, 3.3.0, 3.2.1, 3.1.3
>
> Attachments: YARN-9210-branch-2.9.patch, YARN-9210-branch-2.patch, 
> YARN-9210.001.patch, screenshot-1.png
>
>
> Visiting http://rm_hostname:8088/cluster/nodes, there is one "Active Nodes" 
> entry in the "Cluster Nodes Metrics" area, but the detailed node info does 
> not display, as shown in 
> [screenshot-1.png|https://issues.apache.org/jira/secure/attachment/12955358/screenshot-1.png]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9210) RM nodes web page can not display node info

2019-01-21 Thread Jiandan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiandan Yang  updated YARN-9210:

Attachment: YARN-9210-branch-2.patch

> RM nodes web page can not display node info
> ---
>
> Key: YARN-9210
> URL: https://issues.apache.org/jira/browse/YARN-9210
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.1.2, 3.3.0
>Reporter: Jiandan Yang 
>Assignee: Jiandan Yang 
>Priority: Blocker
> Fix For: 3.0.4, 3.1.2, 3.3.0, 3.2.1, 3.1.3
>
> Attachments: YARN-9210-branch-2.9.patch, YARN-9210-branch-2.patch, 
> YARN-9210.001.patch, screenshot-1.png
>
>
> Visiting http://rm_hostname:8088/cluster/nodes, there is one "Active Nodes" 
> entry in the "Cluster Nodes Metrics" area, but the detailed node info does 
> not display, as shown in 
> [screenshot-1.png|https://issues.apache.org/jira/secure/attachment/12955358/screenshot-1.png]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9210) RM nodes web page can not display node info

2019-01-21 Thread Jiandan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiandan Yang  updated YARN-9210:

Attachment: YARN-9210-branch-2.9.patch

> RM nodes web page can not display node info
> ---
>
> Key: YARN-9210
> URL: https://issues.apache.org/jira/browse/YARN-9210
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.1.2, 3.3.0
>Reporter: Jiandan Yang 
>Assignee: Jiandan Yang 
>Priority: Blocker
> Fix For: 3.0.4, 3.1.2, 3.3.0, 3.2.1, 3.1.3
>
> Attachments: YARN-9210-branch-2.9.patch, YARN-9210.001.patch, 
> screenshot-1.png
>
>
> Visiting http://rm_hostname:8088/cluster/nodes, there is one "Active Nodes" 
> entry in the "Cluster Nodes Metrics" area, but the detailed node info does 
> not display, as shown in 
> [screenshot-1.png|https://issues.apache.org/jira/secure/attachment/12955358/screenshot-1.png]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9027) EntityGroupFSTimelineStore fails to init LevelDBCacheTimelineStore

2019-01-21 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748430#comment-16748430
 ] 

Hadoop QA commented on YARN-9027:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
13s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 27s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
21s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 51s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
24s{color} | {color:green} hadoop-yarn-server-timeline-pluginstorage in the 
patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
27s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 52m 28s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-9027 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12955730/0003-YARN-9027.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux ecb17480557d 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 6f0756f |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/23123/testReport/ |
| Max. process+thread count | 445 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/23123/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> EntityGroupFSTimelineStore 

[jira] [Commented] (YARN-9205) When using custom resource type, application will fail to run due to the CapacityScheduler throws InvalidResourceRequestException(GREATER_THEN_MAX_ALLOCATION)

2019-01-21 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748429#comment-16748429
 ] 

Sunil Govindan commented on YARN-9205:
--

v5 looks good to me.

Waiting for Jenkins. [~cheersyang], could you please take a look?

> When using custom resource type, application will fail to run due to the 
> CapacityScheduler throws 
> InvalidResourceRequestException(GREATER_THEN_MAX_ALLOCATION) 
> ---
>
> Key: YARN-9205
> URL: https://issues.apache.org/jira/browse/YARN-9205
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
>Priority: Critical
> Attachments: YARN-9205-trunk.001.patch, YARN-9205-trunk.002.patch, 
> YARN-9205-trunk.003.patch, YARN-9205-trunk.004.patch, 
> YARN-9205-trunk.005.patch
>
>
> In a non-secure cluster, reproduce it as follows:
>  # Set capacity scheduler in yarn-site.xml
>  # Use default capacity-scheduler.xml
>  # Set custom resource type "cmp.com/hdw" in resource-types.xml
>  # Set a value say 10 in node-resources.xml
>  # Start cluster
>  # Submit a distributed shell application which requests some "cmp.com/hdw"
> The AM will get an exception from CapacityScheduler and then fail. This bug 
> doesn't exist in FairScheduler.
> {code:java}
> 2019-01-17 22:12:11,286 INFO distributedshell.ApplicationMaster: Requested 
> container ask: Capability[ 2>]Priority[0]AllocationRequestId[0]ExecutionTypeRequest[{Execution Type: 
> GUARANTEED, Enforce Execution Type: false}]Resource Profile[]
> 2019-01-17 22:12:12,326 ERROR impl.AMRMClientAsyncImpl: Exception on heartbeat
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request! Cannot allocate containers as requested resource is greater 
> than maximum allowed allocation. Requested resource type=[cmp.com/hdw], 
> Requested resource=, maximum allowed 
> allocation=, please note that maximum allowed 
> allocation is calculated by scheduler based on maximum resource of registered 
> NodeManagers, which might be less than configured maximum 
> allocation=
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.throwInvalidResourceException(SchedulerUtils.java:492)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkResourceRequestAgainstAvailableResource(SchedulerUtils.java:388)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:315)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:293)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:301)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:250)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:240)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor.allocate(DisabledPlacementProcessor.java:75)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:424)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
> ...{code}
> After some rough debugging, the method below returns the wrong maximum 
> capacity (DefaultAMSProcessor.java, line 234).
> {code:java}
> Resource maximumCapacity =
>  getScheduler().getMaximumResourceCapability(app.getQueue());{code}
> The above code seems like it should return "" 
> but returns "".
> This incorrect value might be caused by queue maximum allocation calculation 
> involved in YARN-8720:
> AbstractCSQueue.java Line364
> {code:java}
> this.maximumAllocation =
>  configuration.getMaximumAllocationPerQueue(
>  getQueuePath());{code}
> And this invokes CapacitySchedulerConfiguration.java Line 895:
> {code:java}
> Resource clusterMax = ResourceUtils.fetchMaximumAllocationFromConfig(this);
> {code}
> Passing a "this" which is not a YarnConfiguration instance will cause below 
> code return null for resource names and then only contains mandatory 
> resources. This might be the root cause.
> {code:java}
> private static Map 
> getResourceInformationMapFromConfig(
> ...
> // NULL value here!
> String[] resourceNames = conf.getStrings(YarnConfiguration.RESOURCE_TYPES);
> {code}
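A minimal sketch of the suspected null lookup, using only the public Hadoop 
Configuration API; the "cmp.com/hdw" type comes from the report above, and the 
snippet is illustrative rather than the actual patch.

{code:java}
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ResourceTypesLookupSketch {
  public static void main(String[] args) {
    // A plain Configuration that never loaded resource-types.xml behaves like
    // the configuration passed as "this" above: the lookup returns null, so
    // only the mandatory resources (memory, vcores) survive downstream.
    Configuration plain = new Configuration();
    String[] fromPlain = plain.getStrings(YarnConfiguration.RESOURCE_TYPES);
    System.out.println("plain conf -> " + Arrays.toString(fromPlain));

    // A configuration that actually carries the resource-type definition keeps
    // the custom type, so the maximum-allocation calculation can see it.
    Configuration withTypes = new Configuration();
    withTypes.set(YarnConfiguration.RESOURCE_TYPES, "cmp.com/hdw");
    String[] fromTyped = withTypes.getStrings(YarnConfiguration.RESOURCE_TYPES);
    System.out.println("typed conf -> " + Arrays.toString(fromTyped));
  }
}
{code}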



--
This message was sent by Atlassian JIRA

[jira] [Updated] (YARN-9205) When using custom resource type, application will fail to run due to the CapacityScheduler throws InvalidResourceRequestException(GREATER_THEN_MAX_ALLOCATION)

2019-01-21 Thread Zhankun Tang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhankun Tang updated YARN-9205:
---
Attachment: YARN-9205-trunk.005.patch

> When using custom resource type, application will fail to run due to the 
> CapacityScheduler throws 
> InvalidResourceRequestException(GREATER_THEN_MAX_ALLOCATION) 
> ---
>
> Key: YARN-9205
> URL: https://issues.apache.org/jira/browse/YARN-9205
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
>Priority: Critical
> Attachments: YARN-9205-trunk.001.patch, YARN-9205-trunk.002.patch, 
> YARN-9205-trunk.003.patch, YARN-9205-trunk.004.patch, 
> YARN-9205-trunk.005.patch
>
>
> In a non-secure cluster, reproduce it as follows:
>  # Set capacity scheduler in yarn-site.xml
>  # Use default capacity-scheduler.xml
>  # Set custom resource type "cmp.com/hdw" in resource-types.xml
>  # Set a value say 10 in node-resources.xml
>  # Start cluster
>  # Submit a distributed shell application which requests some "cmp.com/hdw"
> The AM will get an exception from CapacityScheduler and then fail. This bug 
> doesn't exist in FairScheduler.
> {code:java}
> 2019-01-17 22:12:11,286 INFO distributedshell.ApplicationMaster: Requested 
> container ask: Capability[ 2>]Priority[0]AllocationRequestId[0]ExecutionTypeRequest[{Execution Type: 
> GUARANTEED, Enforce Execution Type: false}]Resource Profile[]
> 2019-01-17 22:12:12,326 ERROR impl.AMRMClientAsyncImpl: Exception on heartbeat
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request! Cannot allocate containers as requested resource is greater 
> than maximum allowed allocation. Requested resource type=[cmp.com/hdw], 
> Requested resource=, maximum allowed 
> allocation=, please note that maximum allowed 
> allocation is calculated by scheduler based on maximum resource of registered 
> NodeManagers, which might be less than configured maximum 
> allocation=
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.throwInvalidResourceException(SchedulerUtils.java:492)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkResourceRequestAgainstAvailableResource(SchedulerUtils.java:388)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:315)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:293)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:301)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:250)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:240)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor.allocate(DisabledPlacementProcessor.java:75)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:424)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
> ...{code}
> After some rough debugging, the method below returns the wrong maximum 
> capacity (DefaultAMSProcessor.java, line 234).
> {code:java}
> Resource maximumCapacity =
>  getScheduler().getMaximumResourceCapability(app.getQueue());{code}
> The above code seems like it should return "" 
> but returns "".
> This incorrect value might be caused by queue maximum allocation calculation 
> involved in YARN-8720:
> AbstractCSQueue.java Line364
> {code:java}
> this.maximumAllocation =
>  configuration.getMaximumAllocationPerQueue(
>  getQueuePath());{code}
> And this invokes CapacitySchedulerConfiguration.java Line 895:
> {code:java}
> Resource clusterMax = ResourceUtils.fetchMaximumAllocationFromConfig(this);
> {code}
> Passing a "this" which is not a YarnConfiguration instance will cause below 
> code return null for resource names and then only contains mandatory 
> resources. This might be the root cause.
> {code:java}
> private static Map 
> getResourceInformationMapFromConfig(
> ...
> // NULL value here!
> String[] resourceNames = conf.getStrings(YarnConfiguration.RESOURCE_TYPES);
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To 

[jira] [Commented] (YARN-9205) When using custom resource type, application will fail to run due to the CapacityScheduler throws InvalidResourceRequestException(GREATER_THEN_MAX_ALLOCATION)

2019-01-21 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748414#comment-16748414
 ] 

Hadoop QA commented on YARN-9205:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 20s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 35s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 7 new + 122 unchanged - 0 fixed = 129 total (was 122) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 31s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 91m 54s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}150m 15s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-9205 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12955726/YARN-9205-trunk.003.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  xml  |
| uname | Linux fa1447c3d615 4.4.0-138-generic #164~14.04.1-Ubuntu SMP Fri Oct 
5 08:56:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d43df31 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 

[jira] [Updated] (YARN-9027) EntityGroupFSTimelineStore fails to init LevelDBCacheTimelineStore

2019-01-21 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9027:

Attachment: 0003-YARN-9027.patch

> EntityGroupFSTimelineStore fails to init LevelDBCacheTimelineStore 
> ---
>
> Key: YARN-9027
> URL: https://issues.apache.org/jira/browse/YARN-9027
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 2.7.3
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: 0001-YARN-9027.patch, 0002-YARN-9027.patch, 
> 0003-YARN-9027.patch
>
>
> EntityGroupFSTimelineStore fails to init LevelDBCacheTimelineStore as the 
> expected default constructor is not present.
> {code}
> Caused by: java.lang.RuntimeException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.timeline.LevelDBCacheTimelineStore.()
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:134)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityCacheItem.refreshCache(EntityCacheItem.java:100)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.getCachedStore(EntityGroupFSTimelineStore.java:1026)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.getTimelineStoresFromCacheIds(EntityGroupFSTimelineStore.java:945)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.getTimelineStoresForRead(EntityGroupFSTimelineStore.java:998)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.getEntities(EntityGroupFSTimelineStore.java:1040)
> at 
> org.apache.hadoop.yarn.server.timeline.TimelineDataManager.doGetEntities(TimelineDataManager.java:168)
> at 
> org.apache.hadoop.yarn.server.timeline.TimelineDataManager.getEntities(TimelineDataManager.java:138)
> at 
> org.apache.hadoop.yarn.server.timeline.webapp.TimelineWebServices.getEntities(TimelineWebServices.java:117)
> ... 59 more
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.timeline.LevelDBCacheTimelineStore.()
> at java.lang.Class.getConstructor0(Class.java:3082)
> at java.lang.Class.getDeclaredConstructor(Class.java:2178)
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:128)
> ... 67 more
> {code}
> Repro:
> {code}
> 1. Set Offline Caching with
> yarn.timeline-service.entity-group-fs-store.cache-store-class=org.apache.hadoop.yarn.server.timeline.LevelDBCacheTimelineStore
> 2. Run a Tez query
> 3. Check Tez View
> {code}
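A small sketch of the constraint behind the stack trace above: 
ReflectionUtils.newInstance resolves a no-argument constructor reflectively, so 
whichever class is configured as the cache store must expose one. The 
ExampleCacheStore class below is a hypothetical stand-in, not the actual 
timeline store.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ReflectionUtils;

public class ReflectionInitSketch {
  // Hypothetical stand-in for a configured cache-store class.  Without this
  // public no-arg constructor, ReflectionUtils.newInstance would fail with the
  // same NoSuchMethodException shown in the report above.
  public static class ExampleCacheStore {
    public ExampleCacheStore() {
    }
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    ExampleCacheStore store =
        ReflectionUtils.newInstance(ExampleCacheStore.class, conf);
    System.out.println("instantiated " + store.getClass().getSimpleName());
  }
}
{code}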



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9142) UI cluster nodes page is broken

2019-01-21 Thread Akhil PB (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akhil PB updated YARN-9142:
---
Fix Version/s: 3.1.3
   3.2.1
   3.3.0
   3.1.2
   3.0.4

> UI cluster nodes page is broken
> ---
>
> Key: YARN-9142
> URL: https://issues.apache.org/jira/browse/YARN-9142
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.2, 3.3.0
>Reporter: Rohith Sharma K S
>Assignee: Akhil PB
>Priority: Critical
> Fix For: 3.0.4, 3.1.2, 3.3.0, 3.2.1, 3.1.3
>
> Attachments: ClusterNodePage.png, 
> cluster-nodes-page-hadoop-3.3.0-SNAPSHOT.png
>
>
> It is observed in the trunk build that the YARN cluster nodes page is broken 
> even though data exists.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Reopened] (YARN-9142) UI cluster nodes page is broken

2019-01-21 Thread Akhil PB (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akhil PB reopened YARN-9142:


> UI cluster nodes page is broken
> ---
>
> Key: YARN-9142
> URL: https://issues.apache.org/jira/browse/YARN-9142
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.2, 3.3.0
>Reporter: Rohith Sharma K S
>Assignee: Akhil PB
>Priority: Critical
> Fix For: 3.0.4, 3.1.2, 3.3.0, 3.2.1, 3.1.3
>
> Attachments: ClusterNodePage.png, 
> cluster-nodes-page-hadoop-3.3.0-SNAPSHOT.png
>
>
> It is observed in the trunk build that the YARN cluster nodes page is broken 
> even though data exists.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-9142) UI cluster nodes page is broken

2019-01-21 Thread Akhil PB (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akhil PB resolved YARN-9142.

Resolution: Duplicate

> UI cluster nodes page is broken
> ---
>
> Key: YARN-9142
> URL: https://issues.apache.org/jira/browse/YARN-9142
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.2, 3.3.0
>Reporter: Rohith Sharma K S
>Assignee: Akhil PB
>Priority: Critical
> Fix For: 3.0.4, 3.1.2, 3.3.0, 3.2.1, 3.1.3
>
> Attachments: ClusterNodePage.png, 
> cluster-nodes-page-hadoop-3.3.0-SNAPSHOT.png
>
>
> It is observed in the trunk build that the YARN cluster nodes page is broken 
> even though data exists.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-9142) UI cluster nodes page is broken

2019-01-21 Thread Akhil PB (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akhil PB resolved YARN-9142.

Resolution: Fixed

Resolving the ticket since YARN-9210 is fixed.

> UI cluster nodes page is broken
> ---
>
> Key: YARN-9142
> URL: https://issues.apache.org/jira/browse/YARN-9142
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Rohith Sharma K S
>Assignee: Akhil PB
>Priority: Critical
> Attachments: ClusterNodePage.png, 
> cluster-nodes-page-hadoop-3.3.0-SNAPSHOT.png
>
>
> It is observed in the trunk build that the YARN cluster nodes page is broken 
> even though data exists.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9142) UI cluster nodes page is broken

2019-01-21 Thread Akhil PB (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akhil PB updated YARN-9142:
---
Affects Version/s: 3.3.0
   3.1.2

> UI cluster nodes page is broken
> ---
>
> Key: YARN-9142
> URL: https://issues.apache.org/jira/browse/YARN-9142
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.2, 3.3.0
>Reporter: Rohith Sharma K S
>Assignee: Akhil PB
>Priority: Critical
> Attachments: ClusterNodePage.png, 
> cluster-nodes-page-hadoop-3.3.0-SNAPSHOT.png
>
>
> It is observed in the trunk build that the YARN cluster nodes page is broken 
> even though data exists.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9205) When using custom resource type, application will fail to run due to the CapacityScheduler throws InvalidResourceRequestException(GREATER_THEN_MAX_ALLOCATION)

2019-01-21 Thread Zhankun Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748397#comment-16748397
 ] 

Zhankun Tang commented on YARN-9205:


[~leftnoteasy], yeah, please review. I added two test cases for it:
 # Make sure that the CS configuration is loaded.
 # Submit an application requesting the custom resource and verify that it succeeds.

I also ran the two test cases without the fix; both fail with the expected 
error message.

> When using custom resource type, application will fail to run due to the 
> CapacityScheduler throws 
> InvalidResourceRequestException(GREATER_THEN_MAX_ALLOCATION) 
> ---
>
> Key: YARN-9205
> URL: https://issues.apache.org/jira/browse/YARN-9205
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
>Priority: Critical
> Attachments: YARN-9205-trunk.001.patch, YARN-9205-trunk.002.patch, 
> YARN-9205-trunk.003.patch, YARN-9205-trunk.004.patch
>
>
> In a non-secure cluster, reproduce it as follows:
>  # Set capacity scheduler in yarn-site.xml
>  # Use default capacity-scheduler.xml
>  # Set custom resource type "cmp.com/hdw" in resource-types.xml
>  # Set a value say 10 in node-resources.xml
>  # Start cluster
>  # Submit a distributed shell application which requests some "cmp.com/hdw"
> The AM will get an exception from CapacityScheduler and then fail. This bug 
> doesn't exist in FairScheduler.
> {code:java}
> 2019-01-17 22:12:11,286 INFO distributedshell.ApplicationMaster: Requested 
> container ask: Capability[ 2>]Priority[0]AllocationRequestId[0]ExecutionTypeRequest[{Execution Type: 
> GUARANTEED, Enforce Execution Type: false}]Resource Profile[]
> 2019-01-17 22:12:12,326 ERROR impl.AMRMClientAsyncImpl: Exception on heartbeat
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request! Cannot allocate containers as requested resource is greater 
> than maximum allowed allocation. Requested resource type=[cmp.com/hdw], 
> Requested resource=, maximum allowed 
> allocation=, please note that maximum allowed 
> allocation is calculated by scheduler based on maximum resource of registered 
> NodeManagers, which might be less than configured maximum 
> allocation=
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.throwInvalidResourceException(SchedulerUtils.java:492)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkResourceRequestAgainstAvailableResource(SchedulerUtils.java:388)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:315)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:293)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:301)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:250)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:240)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor.allocate(DisabledPlacementProcessor.java:75)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:424)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
> ...{code}
> After some rough debugging, the method below returns the wrong maximum 
> capacity (DefaultAMSProcessor.java, line 234).
> {code:java}
> Resource maximumCapacity =
>  getScheduler().getMaximumResourceCapability(app.getQueue());{code}
> The above code seems like it should return "" 
> but returns "".
> This incorrect value might be caused by queue maximum allocation calculation 
> involved in YARN-8720:
> AbstractCSQueue.java Line364
> {code:java}
> this.maximumAllocation =
>  configuration.getMaximumAllocationPerQueue(
>  getQueuePath());{code}
> And this invokes CapacitySchedulerConfiguration.java Line 895:
> {code:java}
> Resource clusterMax = ResourceUtils.fetchMaximumAllocationFromConfig(this);
> {code}
> Passing a "this" which is not a YarnConfiguration instance will cause below 
> code return null for resource names and then only contains mandatory 
> resources. This might be the root cause.
> {code:java}
> private static Map 
> getResourceInformationMapFromConfig(

[jira] [Updated] (YARN-9205) When using custom resource type, application will fail to run due to the CapacityScheduler throws InvalidResourceRequestException(GREATER_THEN_MAX_ALLOCATION)

2019-01-21 Thread Zhankun Tang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhankun Tang updated YARN-9205:
---
Attachment: YARN-9205-trunk.004.patch

> When using custom resource type, application will fail to run due to the 
> CapacityScheduler throws 
> InvalidResourceRequestException(GREATER_THEN_MAX_ALLOCATION) 
> ---
>
> Key: YARN-9205
> URL: https://issues.apache.org/jira/browse/YARN-9205
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
>Priority: Critical
> Attachments: YARN-9205-trunk.001.patch, YARN-9205-trunk.002.patch, 
> YARN-9205-trunk.003.patch, YARN-9205-trunk.004.patch
>
>
> In a non-secure cluster, reproduce it as follows:
>  # Set capacity scheduler in yarn-site.xml
>  # Use default capacity-scheduler.xml
>  # Set custom resource type "cmp.com/hdw" in resource-types.xml
>  # Set a value say 10 in node-resources.xml
>  # Start cluster
>  # Submit a distributed shell application which requests some "cmp.com/hdw"
> The AM will get an exception from CapacityScheduler and then fail. This bug 
> doesn't exist in FairScheduler.
> {code:java}
> 2019-01-17 22:12:11,286 INFO distributedshell.ApplicationMaster: Requested 
> container ask: Capability[ 2>]Priority[0]AllocationRequestId[0]ExecutionTypeRequest[{Execution Type: 
> GUARANTEED, Enforce Execution Type: false}]Resource Profile[]
> 2019-01-17 22:12:12,326 ERROR impl.AMRMClientAsyncImpl: Exception on heartbeat
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request! Cannot allocate containers as requested resource is greater 
> than maximum allowed allocation. Requested resource type=[cmp.com/hdw], 
> Requested resource=, maximum allowed 
> allocation=, please note that maximum allowed 
> allocation is calculated by scheduler based on maximum resource of registered 
> NodeManagers, which might be less than configured maximum 
> allocation=
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.throwInvalidResourceException(SchedulerUtils.java:492)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkResourceRequestAgainstAvailableResource(SchedulerUtils.java:388)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:315)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:293)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:301)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:250)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:240)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor.allocate(DisabledPlacementProcessor.java:75)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:424)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
> ...{code}
> After some rough debugging, the method below returns the wrong maximum 
> capacity (DefaultAMSProcessor.java, line 234).
> {code:java}
> Resource maximumCapacity =
>  getScheduler().getMaximumResourceCapability(app.getQueue());{code}
> The above code seems like it should return "" 
> but returns "".
> This incorrect value might be caused by queue maximum allocation calculation 
> involved in YARN-8720:
> AbstractCSQueue.java Line364
> {code:java}
> this.maximumAllocation =
>  configuration.getMaximumAllocationPerQueue(
>  getQueuePath());{code}
> And this invokes CapacitySchedulerConfiguration.java Line 895:
> {code:java}
> Resource clusterMax = ResourceUtils.fetchMaximumAllocationFromConfig(this);
> {code}
> Passing a "this" which is not a YarnConfiguration instance will cause below 
> code return null for resource names and then only contains mandatory 
> resources. This might be the root cause.
> {code:java}
> private static Map 
> getResourceInformationMapFromConfig(
> ...
> // NULL value here!
> String[] resourceNames = conf.getStrings(YarnConfiguration.RESOURCE_TYPES);
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: 

[jira] [Commented] (YARN-9210) RM nodes web page can not display node info

2019-01-21 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748359#comment-16748359
 ] 

Weiwei Yang commented on YARN-9210:
---

[~yangjiandan], can you help provide a patch for branch-2? It looks like 
YARN-9036 was committed to branch-2.9 and branch-2 as well.

> RM nodes web page can not display node info
> ---
>
> Key: YARN-9210
> URL: https://issues.apache.org/jira/browse/YARN-9210
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.1.2, 3.3.0
>Reporter: Jiandan Yang 
>Assignee: Jiandan Yang 
>Priority: Blocker
> Fix For: 3.0.4, 3.1.2, 3.3.0, 3.2.1, 3.1.3
>
> Attachments: YARN-9210.001.patch, screenshot-1.png
>
>
> Visiting http://rm_hostname:8088/cluster/nodes, there is one "Active Nodes" 
> entry in the "Cluster Nodes Metrics" area, but the detailed node info does 
> not display, as shown in 
> [screenshot-1.png|https://issues.apache.org/jira/secure/attachment/12955358/screenshot-1.png]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9210) RM nodes web page can not display node info

2019-01-21 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748358#comment-16748358
 ] 

Weiwei Yang commented on YARN-9210:
---

Committed to trunk, cherry picked to branch-3.2, branch-3.1, branch-3.0. And 
also branch-3.1.2 as it is still open for critical fixes.

> RM nodes web page can not display node info
> ---
>
> Key: YARN-9210
> URL: https://issues.apache.org/jira/browse/YARN-9210
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.1.2, 3.3.0
>Reporter: Jiandan Yang 
>Assignee: Jiandan Yang 
>Priority: Blocker
> Fix For: 3.0.4, 3.1.2, 3.3.0, 3.2.1, 3.1.3
>
> Attachments: YARN-9210.001.patch, screenshot-1.png
>
>
> Visiting http://rm_hostname:8088/cluster/nodes, there is one "Active Nodes" 
> entry in the "Cluster Nodes Metrics" area, but the detailed node info does 
> not display, as shown in 
> [screenshot-1.png|https://issues.apache.org/jira/secure/attachment/12955358/screenshot-1.png]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9205) When using custom resource type, application will fail to run due to the CapacityScheduler throws InvalidResourceRequestException(GREATER_THEN_MAX_ALLOCATION)

2019-01-21 Thread Zhankun Tang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhankun Tang updated YARN-9205:
---
Attachment: YARN-9205-trunk.003.patch

> When using custom resource type, application will fail to run due to the 
> CapacityScheduler throws 
> InvalidResourceRequestException(GREATER_THEN_MAX_ALLOCATION) 
> ---
>
> Key: YARN-9205
> URL: https://issues.apache.org/jira/browse/YARN-9205
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
>Priority: Critical
> Attachments: YARN-9205-trunk.001.patch, YARN-9205-trunk.002.patch, 
> YARN-9205-trunk.003.patch
>
>
> In a non-secure cluster, reproduce it as follows:
>  # Set capacity scheduler in yarn-site.xml
>  # Use default capacity-scheduler.xml
>  # Set custom resource type "cmp.com/hdw" in resource-types.xml
>  # Set a value say 10 in node-resources.xml
>  # Start cluster
>  # Submit a distributed shell application which requests some "cmp.com/hdw"
> The AM will get an exception from CapacityScheduler and then fail. This bug 
> doesn't exist in FairScheduler.
> {code:java}
> 2019-01-17 22:12:11,286 INFO distributedshell.ApplicationMaster: Requested 
> container ask: Capability[ 2>]Priority[0]AllocationRequestId[0]ExecutionTypeRequest[{Execution Type: 
> GUARANTEED, Enforce Execution Type: false}]Resource Profile[]
> 2019-01-17 22:12:12,326 ERROR impl.AMRMClientAsyncImpl: Exception on heartbeat
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request! Cannot allocate containers as requested resource is greater 
> than maximum allowed allocation. Requested resource type=[cmp.com/hdw], 
> Requested resource=, maximum allowed 
> allocation=, please note that maximum allowed 
> allocation is calculated by scheduler based on maximum resource of registered 
> NodeManagers, which might be less than configured maximum 
> allocation=
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.throwInvalidResourceException(SchedulerUtils.java:492)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkResourceRequestAgainstAvailableResource(SchedulerUtils.java:388)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:315)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:293)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:301)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:250)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:240)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor.allocate(DisabledPlacementProcessor.java:75)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:424)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
> ...{code}
> After some rough debugging, the method below returns the wrong maximum 
> capacity (DefaultAMSProcessor.java, line 234).
> {code:java}
> Resource maximumCapacity =
>  getScheduler().getMaximumResourceCapability(app.getQueue());{code}
> The above code seems like it should return "" 
> but returns "".
> This incorrect value might be caused by queue maximum allocation calculation 
> involved in YARN-8720:
> AbstractCSQueue.java Line364
> {code:java}
> this.maximumAllocation =
>  configuration.getMaximumAllocationPerQueue(
>  getQueuePath());{code}
> And this invokes CapacitySchedulerConfiguration.java Line 895:
> {code:java}
> Resource clusterMax = ResourceUtils.fetchMaximumAllocationFromConfig(this);
> {code}
> Passing a "this" which is not a YarnConfiguration instance will cause below 
> code return null for resource names and then only contains mandatory 
> resources. This might be the root cause.
> {code:java}
> private static Map 
> getResourceInformationMapFromConfig(
> ...
> // NULL value here!
> String[] resourceNames = conf.getStrings(YarnConfiguration.RESOURCE_TYPES);
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: 

[jira] [Comment Edited] (YARN-9210) RM nodes web page can not display node info

2019-01-21 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748358#comment-16748358
 ] 

Weiwei Yang edited comment on YARN-9210 at 1/22/19 3:38 AM:


Committed to trunk, cherry picked to branch-3.2, branch-3.1, branch-3.0. And 
also branch-3.1.2 as it is still open for critical fixes. Thanks [~yangjiandan] 
for the contribution.


was (Author: cheersyang):
Committed to trunk, cherry picked to branch-3.2, branch-3.1, branch-3.0. And 
also branch-3.1.2 as it is still open for critical fixes.

> RM nodes web page can not display node info
> ---
>
> Key: YARN-9210
> URL: https://issues.apache.org/jira/browse/YARN-9210
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.1.2, 3.3.0
>Reporter: Jiandan Yang 
>Assignee: Jiandan Yang 
>Priority: Blocker
> Fix For: 3.0.4, 3.1.2, 3.3.0, 3.2.1, 3.1.3
>
> Attachments: YARN-9210.001.patch, screenshot-1.png
>
>
> Visiting http://rm_hostname:8088/cluster/nodes, there is one "Active Nodes" 
> entry in the "Cluster Nodes Metrics" area, but the detailed node info does 
> not display, as shown in 
> [screenshot-1.png|https://issues.apache.org/jira/secure/attachment/12955358/screenshot-1.png]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9210) RM nodes web page can not display node info

2019-01-21 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-9210:
--
Fix Version/s: 3.1.2

> RM nodes web page can not display node info
> ---
>
> Key: YARN-9210
> URL: https://issues.apache.org/jira/browse/YARN-9210
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.1.2, 3.3.0
>Reporter: Jiandan Yang 
>Assignee: Jiandan Yang 
>Priority: Blocker
> Fix For: 3.0.4, 3.1.2, 3.3.0, 3.2.1, 3.1.3
>
> Attachments: YARN-9210.001.patch, screenshot-1.png
>
>
> Visiting http://rm_hostname:8088/cluster/nodes, there is one "Active Nodes" entry 
> in the "Cluster Nodes Metrics" area, but the detailed node info does not 
> display, 
> as shown in 
> [screenshot-1.png|https://issues.apache.org/jira/secure/attachment/12955358/screenshot-1.png]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9210) RM nodes web page can not display node info

2019-01-21 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-9210:
--
Fix Version/s: 3.0.4

> RM nodes web page can not display node info
> ---
>
> Key: YARN-9210
> URL: https://issues.apache.org/jira/browse/YARN-9210
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.1.2, 3.3.0
>Reporter: Jiandan Yang 
>Assignee: Jiandan Yang 
>Priority: Blocker
> Fix For: 3.0.4, 3.3.0, 3.2.1, 3.1.3
>
> Attachments: YARN-9210.001.patch, screenshot-1.png
>
>
> Visiting http://rm_hostname:8088/cluster/nodes, there is one "Active Nodes" entry 
> in the "Cluster Nodes Metrics" area, but the detailed node info does not 
> display, 
> as shown in 
> [screenshot-1.png|https://issues.apache.org/jira/secure/attachment/12955358/screenshot-1.png]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9210) RM nodes web page can not display node info

2019-01-21 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-9210:
--
Fix Version/s: 3.1.3
   3.2.1

> RM nodes web page can not display node info
> ---
>
> Key: YARN-9210
> URL: https://issues.apache.org/jira/browse/YARN-9210
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.1.2, 3.3.0
>Reporter: Jiandan Yang 
>Assignee: Jiandan Yang 
>Priority: Blocker
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: YARN-9210.001.patch, screenshot-1.png
>
>
> Visiting http://rm_hostname:8088/cluster/nodes, there is one "Active Nodes" entry 
> in the "Cluster Nodes Metrics" area, but the detailed node info does not 
> display, 
> as shown in 
> [screenshot-1.png|https://issues.apache.org/jira/secure/attachment/12955358/screenshot-1.png]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9210) RM nodes web page can not display node info

2019-01-21 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748343#comment-16748343
 ] 

Hudson commented on YARN-9210:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #15798 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/15798/])
YARN-9210. RM nodes web page can not display node info. Contributed by (wwei: 
rev d43df31751bcadab77d42b31e3e1dd5748b471b5)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodesPage.java


> RM nodes web page can not display node info
> ---
>
> Key: YARN-9210
> URL: https://issues.apache.org/jira/browse/YARN-9210
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.1.2, 3.3.0
>Reporter: Jiandan Yang 
>Assignee: Jiandan Yang 
>Priority: Blocker
> Fix For: 3.3.0
>
> Attachments: YARN-9210.001.patch, screenshot-1.png
>
>
> Visiting http://rm_hostname:8088/cluster/nodes, there is one "Active Nodes" entry 
> in the "Cluster Nodes Metrics" area, but the detailed node info does not 
> display, 
> as shown in 
> [screenshot-1.png|https://issues.apache.org/jira/secure/attachment/12955358/screenshot-1.png]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9210) RM nodes web page can not display node info

2019-01-21 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748332#comment-16748332
 ] 

Weiwei Yang commented on YARN-9210:
---

Thanks for the confirmation [~oliverhuh...@gmail.com], committing the patch now.

I'll open another ticket to see how to add some proper UT for this.

> RM nodes web page can not display node info
> ---
>
> Key: YARN-9210
> URL: https://issues.apache.org/jira/browse/YARN-9210
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.1.2, 3.3.0
>Reporter: Jiandan Yang 
>Assignee: Jiandan Yang 
>Priority: Blocker
> Attachments: YARN-9210.001.patch, screenshot-1.png
>
>
> Visiting http://rm_hostname:8088/cluster/nodes, there is one "Active Nodes" entry 
> in the "Cluster Nodes Metrics" area, but the detailed node info does not 
> display, 
> as shown in 
> [screenshot-1.png|https://issues.apache.org/jira/secure/attachment/12955358/screenshot-1.png]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9210) RM nodes web page can not display node info

2019-01-21 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-9210:
--
Summary: RM nodes web page can not display node info  (was: YARN UI can not 
display node info)

> RM nodes web page can not display node info
> ---
>
> Key: YARN-9210
> URL: https://issues.apache.org/jira/browse/YARN-9210
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.1.2, 3.3.0
>Reporter: Jiandan Yang 
>Assignee: Jiandan Yang 
>Priority: Blocker
> Attachments: YARN-9210.001.patch, screenshot-1.png
>
>
> Visiting http://rm_hostname:8088/cluster/nodes, there is one "Active Nodes" entry 
> in the "Cluster Nodes Metrics" area, but the detailed node info does not 
> display, 
> as shown in 
> [screenshot-1.png|https://issues.apache.org/jira/secure/attachment/12955358/screenshot-1.png]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6616) YARN AHS shows submitTime for jobs same as startTime

2019-01-21 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748285#comment-16748285
 ] 

Hadoop QA commented on YARN-6616:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 9 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
23s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  5m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
21m 59s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  7m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
55s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
20s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
 2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 14m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 14m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
40s{color} | {color:green} root: The patch generated 0 new + 676 unchanged - 3 
fixed = 676 total (was 679) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  5m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 23s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  8m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  4m  
4s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
57s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
31s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
44s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 
29s{color} | {color:green} hadoop-yarn-server-applicationhistoryservice in the 
patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 92m 54s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 25m 
23s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}125m 49s{color} 
| {color:red} hadoop-mapreduce-client-jobclient in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
58s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| 

[jira] [Commented] (YARN-9036) Escape newlines in health report in YARN UI

2019-01-21 Thread Keqiu Hu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748234#comment-16748234
 ] 

Keqiu Hu commented on YARN-9036:


Sorry for the late reply; I replied in YARN-9210, and that works for us. 
[~cheersyang] 

> Escape newlines in health report in YARN UI
> ---
>
> Key: YARN-9036
> URL: https://issues.apache.org/jira/browse/YARN-9036
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Hung
>Assignee: Keqiu Hu
>Priority: Major
> Fix For: 2.10.0, 3.0.4, 3.1.2, 3.3.0, 3.2.1, 2.9.3
>
> Attachments: YARN-9036.001.patch, YARN-9036.002.patch, 
> YARN-9036.003.patch, YARN-9036.003.patch, YARN-9036.004.patch, 
> image-2018-12-14-11-33-54-361.png
>
>
> NodesPage prints the health report info in the UI inside a JavaScript string. If the 
> health report contains newlines, it garbles the generated code and the 
> list of nodes cannot be rendered.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9210) YARN UI can not display node info

2019-01-21 Thread Keqiu Hu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748233#comment-16748233
 ] 

Keqiu Hu commented on YARN-9210:


LGTM, thanks for fixing this. Works for me. Shall we add a unit test for this, 
since we have encountered multiple bugs here?

> YARN UI can not display node info
> -
>
> Key: YARN-9210
> URL: https://issues.apache.org/jira/browse/YARN-9210
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.1.2, 3.3.0
>Reporter: Jiandan Yang 
>Assignee: Jiandan Yang 
>Priority: Blocker
> Attachments: YARN-9210.001.patch, screenshot-1.png
>
>
> Visiting http://rm_hostname:8088/cluster/nodes, there is one "Active Nodes" entry 
> in the "Cluster Nodes Metrics" area, but the detailed node info does not 
> display, 
> as shown in 
> [screenshot-1.png|https://issues.apache.org/jira/secure/attachment/12955358/screenshot-1.png]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6616) YARN AHS shows submitTime for jobs same as startTime

2019-01-21 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-6616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-6616:

Attachment: 0005-YARN-6616.patch

> YARN AHS shows submitTime for jobs same as startTime
> 
>
> Key: YARN-6616
> URL: https://issues.apache.org/jira/browse/YARN-6616
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.3
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Minor
> Attachments: 0001-YARN-6616.patch, 0002-YARN-6616.patch, 
> 0003-YARN-6616.patch, 0004-YARN-6616.patch, 0005-YARN-6616.patch
>
>
> YARN AHS returns the startTime value for both submitTime and startTime for 
> jobs. It looks like the code sets submitTime to the startTime value: 
> https://github.com/apache/hadoop/blob/branch-2.7.3/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/dao/AppInfo.java#L80
> {code}
> curl --negotiate -u: 
> http://prabhuzeppelin3.openstacklocal:8188/ws/v1/applicationhistory/apps
> 149501553757414950155375741495016384084
> {code}
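> A minimal sketch of the pattern described above (field and getter names illustrative, 
> not the exact AppInfo code):
> {code:java}
> // Both fields end up holding the start time, which is what the AHS REST output shows.
> this.startedTime = app.getStartTime();
> this.submittedTime = app.getStartTime();   // bug: should be populated from the submit time
> {code}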



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9204) RM fails to start if absolute resource is specified for partition capacity in CS queues

2019-01-21 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748110#comment-16748110
 ] 

Wangda Tan commented on YARN-9204:
--

Cherry-picked to branch-3.1.2 as well, thanks [~yangjiandan]/ [~cheersyang].

>  RM fails to start if absolute resource is specified for partition capacity 
> in CS queues
> 
>
> Key: YARN-9204
> URL: https://issues.apache.org/jira/browse/YARN-9204
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.1.3
>Reporter: Jiandan Yang 
>Assignee: Jiandan Yang 
>Priority: Blocker
> Fix For: 3.1.2, 3.3.0, 3.2.1
>
> Attachments: YARN-9204.001.patch, YARN-9204.002.patch, 
> YARN-9204.003.patch, YARN-9204.004.patch, YARN-9204.005.patch, 
> YARN-9204.006.patch
>
>
> When I set *yarn.scheduler.capacity..capacity* and 
> *yarn.scheduler.capacity..accessible-node-labels..capacity*
>   to an absolute resource value, starting the RM fails and throws the following 
> exception. After diving into the related code, I found the logic for checking 
> absolute resource values may be wrong.
> {code:java}
> 2019-01-17 20:25:45,716 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting 
> ResourceManager
> java.lang.NumberFormatException: For input string: "[memory=40960,vcore=48]"
> at 
> sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
> at sun.misc.FloatingDecimal.parseFloat(FloatingDecimal.java:122)
> at java.lang.Float.parseFloat(Float.java:451)
> at 
> org.apache.hadoop.conf.Configuration.getFloat(Configuration.java:1606)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration.internalGetLabeledQueue
> Capacity(CapacitySchedulerConfiguration.java:655)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration.getLabeledQueueCapacity
> (CapacitySchedulerConfiguration.java:670)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.loadCapacitiesByLabelsFromConf(CSQueueUti
> ls.java:135)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.loadUpdateAndCheckCapacities(CSQueueUtils
> .java:110)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setupConfigurableCapacities(AbstractCS
> Queue.java:179)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setupQueueConfigs(AbstractCSQueue.java
> :356)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setupQueueConfigs(AbstractCSQueue.java
> :323)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.setupQueueConfigs(ParentQueue.java:130)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.<init>(ParentQueue.java:112)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.parseQueue(CapacitySched
> ulerQueueManager.java:275)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.initializeQueues(Capacit
> ySchedulerQueueManager.java:158)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.j
> ava:715)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java
> :360)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:4
> 25)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:817)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:1218)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:317)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1500)
> 2019-01-17 20:25:45,719 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG:
> {code}
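> A minimal sketch of why the exception above is thrown (illustrative only): the 
> absolute-resource capacity string cannot be parsed as a percentage.
> {code:java}
> String capacity = "[memory=40960,vcore=48]";
> float percentage = Float.parseFloat(capacity);   // throws NumberFormatException, as in the stack trace
> {code}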



--
This message was sent by 

[jira] [Updated] (YARN-8747) [UI2] YARN UI2 page loading failed due to js error under some time zone configuration

2019-01-21 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8747:
-
Fix Version/s: (was: 3.1.3)
   3.1.2

> [UI2] YARN UI2 page loading failed due to js error under some time zone 
> configuration
> -
>
> Key: YARN-8747
> URL: https://issues.apache.org/jira/browse/YARN-8747
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 3.1.1
>Reporter: collinma
>Assignee: collinma
>Priority: Critical
> Fix For: 2.10.0, 3.0.4, 3.1.2, 3.3.0, 3.2.1, 2.9.3
>
> Attachments: YARN-8747.001.patch, image-2018-09-05-18-54-03-991.png
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> We deployed Hadoop 3.1.1 on CentOS 7.2 servers whose timezone is configured 
> as GMT+8, and the web browser time zone is GMT+8 too. The YARN UI page failed to load 
> due to a JS error:
>  
> !image-2018-09-05-18-54-03-991.png!
> The moment-timezone JS component raised that error. This has been fixed in 
> moment-timezone 
> v0.5.1 ([see|https://github.com/moment/moment-timezone/issues/294]). We need 
> to update the moment-timezone version accordingly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9204) RM fails to start if absolute resource is specified for partition capacity in CS queues

2019-01-21 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-9204:
-
Fix Version/s: (was: 3.1.3)
   3.1.2

>  RM fails to start if absolute resource is specified for partition capacity 
> in CS queues
> 
>
> Key: YARN-9204
> URL: https://issues.apache.org/jira/browse/YARN-9204
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.1.3
>Reporter: Jiandan Yang 
>Assignee: Jiandan Yang 
>Priority: Blocker
> Fix For: 3.1.2, 3.3.0, 3.2.1
>
> Attachments: YARN-9204.001.patch, YARN-9204.002.patch, 
> YARN-9204.003.patch, YARN-9204.004.patch, YARN-9204.005.patch, 
> YARN-9204.006.patch
>
>
> When I set *yarn.scheduler.capacity..capacity* and 
> *yarn.scheduler.capacity..accessible-node-labels..capacity*
>   to an absolute resource value, starting the RM fails and throws the following 
> exception. After diving into the related code, I found the logic for checking 
> absolute resource values may be wrong.
> {code:java}
> 2019-01-17 20:25:45,716 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting 
> ResourceManager
> java.lang.NumberFormatException: For input string: "[memory=40960,vcore=48]"
> at 
> sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
> at sun.misc.FloatingDecimal.parseFloat(FloatingDecimal.java:122)
> at java.lang.Float.parseFloat(Float.java:451)
> at 
> org.apache.hadoop.conf.Configuration.getFloat(Configuration.java:1606)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration.internalGetLabeledQueue
> Capacity(CapacitySchedulerConfiguration.java:655)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration.getLabeledQueueCapacity
> (CapacitySchedulerConfiguration.java:670)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.loadCapacitiesByLabelsFromConf(CSQueueUti
> ls.java:135)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.loadUpdateAndCheckCapacities(CSQueueUtils
> .java:110)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setupConfigurableCapacities(AbstractCS
> Queue.java:179)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setupQueueConfigs(AbstractCSQueue.java
> :356)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setupQueueConfigs(AbstractCSQueue.java
> :323)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.setupQueueConfigs(ParentQueue.java:130)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.<init>(ParentQueue.java:112)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.parseQueue(CapacitySched
> ulerQueueManager.java:275)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.initializeQueues(Capacit
> ySchedulerQueueManager.java:158)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.j
> ava:715)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java
> :360)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:4
> 25)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:817)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:1218)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:317)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1500)
> 2019-01-17 20:25:45,719 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG:
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (YARN-9194) Invalid event: REGISTERED and LAUNCH_FAILED at FAILED, and NullPointerException happens in RM while shutdown a NM

2019-01-21 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748109#comment-16748109
 ] 

Wangda Tan commented on YARN-9194:
--

Cherry-picked to branch-3.1.2 as well.

> Invalid event: REGISTERED and LAUNCH_FAILED at FAILED, and 
> NullPointerException happens in RM while shutdown a NM
> -
>
> Key: YARN-9194
> URL: https://issues.apache.org/jira/browse/YARN-9194
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: lujie
>Assignee: lujie
>Priority: Critical
> Fix For: 3.1.2, 3.3.0, 3.2.1
>
> Attachments: YARN-9194_1.patch, YARN-9194_2.patch, YARN-9194_3.patch, 
> YARN-9194_4.patch, YARN-9194_5.patch, YARN-9194_6.patch, 
> hadoop-hires-resourcemanager-hadoop11.log
>
>
> While the attempt is already in the FAILED state, a REGISTERED event arrives, hence the 
> InvalidStateTransitionException.
>  
> {code:java}
> 2019-01-13 00:41:57,127 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> App attempt: appattempt_1547311267249_0001_02 can't handle this event at 
> current state
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> REGISTERED at FAILED
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:913)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:121)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:1073)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:1054)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
> at java.lang.Thread.run(Thread.java:745)
> {code}
>  
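> A minimal sketch of how such a late event can be tolerated in the attempt's state 
> machine (assuming the usual StateMachineFactory idiom; not necessarily the committed patch):
> {code:java}
> // Register the late events as no-ops on the FAILED state so they no longer
> // trigger InvalidStateTransitionException.
> .addTransition(RMAppAttemptState.FAILED, RMAppAttemptState.FAILED,
>     EnumSet.of(RMAppAttemptEventType.REGISTERED,
>                RMAppAttemptEventType.LAUNCH_FAILED))
> {code}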



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9194) Invalid event: REGISTERED and LAUNCH_FAILED at FAILED, and NullPointerException happens in RM while shutdown a NM

2019-01-21 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-9194:
-
Fix Version/s: (was: 3.1.3)
   3.1.2

> Invalid event: REGISTERED and LAUNCH_FAILED at FAILED, and 
> NullPointerException happens in RM while shutdown a NM
> -
>
> Key: YARN-9194
> URL: https://issues.apache.org/jira/browse/YARN-9194
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: lujie
>Assignee: lujie
>Priority: Critical
> Fix For: 3.1.2, 3.3.0, 3.2.1
>
> Attachments: YARN-9194_1.patch, YARN-9194_2.patch, YARN-9194_3.patch, 
> YARN-9194_4.patch, YARN-9194_5.patch, YARN-9194_6.patch, 
> hadoop-hires-resourcemanager-hadoop11.log
>
>
> While the attempt is already in the FAILED state, a REGISTERED event arrives, hence the 
> InvalidStateTransitionException.
>  
> {code:java}
> 2019-01-13 00:41:57,127 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> App attempt: appattempt_1547311267249_0001_02 can't handle this event at 
> current state
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> REGISTERED at FAILED
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:913)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:121)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:1073)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:1054)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
> at java.lang.Thread.run(Thread.java:745)
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8747) [UI2] YARN UI2 page loading failed due to js error under some time zone configuration

2019-01-21 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748108#comment-16748108
 ] 

Wangda Tan commented on YARN-8747:
--

Cherry-picked to branch-3.1.2 as well. Updated fix version

> [UI2] YARN UI2 page loading failed due to js error under some time zone 
> configuration
> -
>
> Key: YARN-8747
> URL: https://issues.apache.org/jira/browse/YARN-8747
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 3.1.1
>Reporter: collinma
>Assignee: collinma
>Priority: Critical
> Fix For: 2.10.0, 3.0.4, 3.1.2, 3.3.0, 3.2.1, 2.9.3
>
> Attachments: YARN-8747.001.patch, image-2018-09-05-18-54-03-991.png
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> We deployed Hadoop 3.1.1 on CentOS 7.2 servers whose timezone is configured 
> as GMT+8, and the web browser time zone is GMT+8 too. The YARN UI page failed to load 
> due to a JS error:
>  
> !image-2018-09-05-18-54-03-991.png!
> The moment-timezone JS component raised that error. This has been fixed in 
> moment-timezone 
> v0.5.1 ([see|https://github.com/moment/moment-timezone/issues/294]). We need 
> to update the moment-timezone version accordingly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9173) FairShare calculation broken for large values after YARN-8833

2019-01-21 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-9173:
-
Fix Version/s: (was: 3.1.3)
   3.1.2

> FairShare calculation broken for large values after YARN-8833
> -
>
> Key: YARN-9173
> URL: https://issues.apache.org/jira/browse/YARN-9173
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 3.3.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Fix For: 3.0.4, 3.1.2, 3.3.0, 3.2.1
>
> Attachments: YARN-9137-branch-3.1.001.patch, 
> YARN-9137-branch3.1.001.patch, YARN-9173.001.patch, YARN-9173.002.patch
>
>
> After the fix for the infinite loop in YARN-8833 we now get the wrong values 
> back for fairshare calculations under certain circumstances. The current 
> implementation works when the total resource is smaller than Integer.MAXVALUE.
> When the total resource goes above that value the number of iterations is not 
> enough to converge to the correct value.
> The new test {{testResourceUsedWithWeightToResourceRatio()}} only checks that 
> the calculation does not hang but does not check the outcome of the 
> calculation.
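> A minimal sketch of the convergence limit described above (numbers illustrative, not 
> the FairScheduler code): a fixed iteration budget sized for int-range totals cannot 
> reach a total resource above Integer.MAX_VALUE.
> {code:java}
> long total = 1L << 43;               // a cluster total well above Integer.MAX_VALUE
> double reachable = 1.0;
> for (int i = 0; i < 31; i++) {       // 31 doublings top out around 2^31
>   reachable *= 2.0;
> }
> System.out.println(reachable >= total);   // false: the search never reaches the target
> {code}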



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9173) FairShare calculation broken for large values after YARN-8833

2019-01-21 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748107#comment-16748107
 ] 

Wangda Tan commented on YARN-9173:
--

Cherry-picked to branch-3.1.2 as well. Updated fix version

> FairShare calculation broken for large values after YARN-8833
> -
>
> Key: YARN-9173
> URL: https://issues.apache.org/jira/browse/YARN-9173
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 3.3.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Fix For: 3.0.4, 3.1.2, 3.3.0, 3.2.1
>
> Attachments: YARN-9137-branch-3.1.001.patch, 
> YARN-9137-branch3.1.001.patch, YARN-9173.001.patch, YARN-9173.002.patch
>
>
> After the fix for the infinite loop in YARN-8833 we now get the wrong values 
> back for fairshare calculations under certain circumstances. The current 
> implementation works when the total resource is smaller than Integer.MAXVALUE.
> When the total resource goes above that value the number of iterations is not 
> enough to converge to the correct value.
> The new test {{testResourceUsedWithWeightToResourceRatio()}} only checks that 
> the calculation does not hang but does not check the outcome of the 
> calculation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9205) When using custom resource type, application will fail to run due to the CapacityScheduler throws InvalidResourceRequestException(GREATER_THEN_MAX_ALLOCATION)

2019-01-21 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-9205:
-
Target Version/s: 3.1.2, 3.2.1  (was: 3.1.2)

> When using custom resource type, application will fail to run due to the 
> CapacityScheduler throws 
> InvalidResourceRequestException(GREATER_THEN_MAX_ALLOCATION) 
> ---
>
> Key: YARN-9205
> URL: https://issues.apache.org/jira/browse/YARN-9205
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
>Priority: Critical
> Attachments: YARN-9205-trunk.001.patch, YARN-9205-trunk.002.patch
>
>
> In a non-secure cluster, reproduce it as follows:
>  # Set the capacity scheduler in yarn-site.xml
>  # Use the default capacity-scheduler.xml
>  # Set the custom resource type "cmp.com/hdw" in resource-types.xml
>  # Set a value, say 10, in node-resources.xml
>  # Start the cluster
>  # Submit a distributed shell application that requests some "cmp.com/hdw"
> The AM will get an exception from the CapacityScheduler and then fail. This bug 
> doesn't exist in the FairScheduler.
> {code:java}
> 2019-01-17 22:12:11,286 INFO distributedshell.ApplicationMaster: Requested 
> container ask: Capability[ 2>]Priority[0]AllocationRequestId[0]ExecutionTypeRequest[{Execution Type: 
> GUARANTEED, Enforce Execution Type: false}]Resource Profile[]
> 2019-01-17 22:12:12,326 ERROR impl.AMRMClientAsyncImpl: Exception on heartbeat
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request! Cannot allocate containers as requested resource is greater 
> than maximum allowed allocation. Requested resource type=[cmp.com/hdw], 
> Requested resource=, maximum allowed 
> allocation=, please note that maximum allowed 
> allocation is calculated by scheduler based on maximum resource of registered 
> NodeManagers, which might be less than configured maximum 
> allocation=
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.throwInvalidResourceException(SchedulerUtils.java:492)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkResourceRequestAgainstAvailableResource(SchedulerUtils.java:388)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:315)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:293)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:301)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:250)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:240)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor.allocate(DisabledPlacementProcessor.java:75)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:424)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
> ...{code}
> After some rough debugging, the method below returns the wrong maximum capacity.
> DefaultAMSProcessor.java, line 234:
> {code:java}
> Resource maximumCapacity =
>  getScheduler().getMaximumResourceCapability(app.getQueue());{code}
> The above code should return the queue's maximum capability including the custom 
> resource type ("cmp.com/hdw"), but it returns one that contains only the mandatory resources.
> This incorrect value might be caused by the queue maximum allocation calculation 
> introduced in YARN-8720:
> AbstractCSQueue.java, line 364:
> {code:java}
> this.maximumAllocation =
>  configuration.getMaximumAllocationPerQueue(
>  getQueuePath());{code}
> This invokes CapacitySchedulerConfiguration.java, line 895:
> {code:java}
> Resource clusterMax = ResourceUtils.fetchMaximumAllocationFromConfig(this);
> {code}
> Passing "this", which is not a YarnConfiguration instance, makes the code below 
> return null for the resource names, so the resulting map contains only the mandatory 
> resources. This is likely the root cause.
> {code:java}
> private static Map<String, ResourceInformation> 
> getResourceInformationMapFromConfig(
> ...
> // NULL value here!
> String[] resourceNames = conf.getStrings(YarnConfiguration.RESOURCE_TYPES);
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional 

[jira] [Commented] (YARN-9205) When using custom resource type, application will fail to run due to the CapacityScheduler throws InvalidResourceRequestException(GREATER_THEN_MAX_ALLOCATION)

2019-01-21 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748090#comment-16748090
 ] 

Wangda Tan commented on YARN-9205:
--

Gotcha, 

Thanks [~tangzhankun], the ver.2 patch looks good then. Could you also provide tests 
to avoid future regressions? Since 3.1.2 is delayed, I want to include this in 
3.1.2. It would be great if we can get the patch committed by tomorrow.

> When using custom resource type, application will fail to run due to the 
> CapacityScheduler throws 
> InvalidResourceRequestException(GREATER_THEN_MAX_ALLOCATION) 
> ---
>
> Key: YARN-9205
> URL: https://issues.apache.org/jira/browse/YARN-9205
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
>Priority: Critical
> Attachments: YARN-9205-trunk.001.patch, YARN-9205-trunk.002.patch
>
>
> In a non-secure cluster, reproduce it as follows:
>  # Set the capacity scheduler in yarn-site.xml
>  # Use the default capacity-scheduler.xml
>  # Set the custom resource type "cmp.com/hdw" in resource-types.xml
>  # Set a value, say 10, in node-resources.xml
>  # Start the cluster
>  # Submit a distributed shell application that requests some "cmp.com/hdw"
> The AM will get an exception from the CapacityScheduler and then fail. This bug 
> doesn't exist in the FairScheduler.
> {code:java}
> 2019-01-17 22:12:11,286 INFO distributedshell.ApplicationMaster: Requested 
> container ask: Capability[ 2>]Priority[0]AllocationRequestId[0]ExecutionTypeRequest[{Execution Type: 
> GUARANTEED, Enforce Execution Type: false}]Resource Profile[]
> 2019-01-17 22:12:12,326 ERROR impl.AMRMClientAsyncImpl: Exception on heartbeat
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request! Cannot allocate containers as requested resource is greater 
> than maximum allowed allocation. Requested resource type=[cmp.com/hdw], 
> Requested resource=, maximum allowed 
> allocation=, please note that maximum allowed 
> allocation is calculated by scheduler based on maximum resource of registered 
> NodeManagers, which might be less than configured maximum 
> allocation=
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.throwInvalidResourceException(SchedulerUtils.java:492)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkResourceRequestAgainstAvailableResource(SchedulerUtils.java:388)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:315)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:293)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:301)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:250)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:240)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor.allocate(DisabledPlacementProcessor.java:75)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:424)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
> ...{code}
> After some rough debugging, the method below returns the wrong maximum capacity.
> DefaultAMSProcessor.java, line 234:
> {code:java}
> Resource maximumCapacity =
>  getScheduler().getMaximumResourceCapability(app.getQueue());{code}
> The above code should return the queue's maximum capability including the custom 
> resource type ("cmp.com/hdw"), but it returns one that contains only the mandatory resources.
> This incorrect value might be caused by the queue maximum allocation calculation 
> introduced in YARN-8720:
> AbstractCSQueue.java, line 364:
> {code:java}
> this.maximumAllocation =
>  configuration.getMaximumAllocationPerQueue(
>  getQueuePath());{code}
> This invokes CapacitySchedulerConfiguration.java, line 895:
> {code:java}
> Resource clusterMax = ResourceUtils.fetchMaximumAllocationFromConfig(this);
> {code}
> Passing "this", which is not a YarnConfiguration instance, makes the code below 
> return null for the resource names, so the resulting map contains only the mandatory 
> resources. This is likely the root cause.
> {code:java}
> private static Map<String, ResourceInformation> 
> getResourceInformationMapFromConfig(
> ...
> // NULL value here!
> String[] resourceNames = conf.getStrings(YarnConfiguration.RESOURCE_TYPES);
> {code}

[jira] [Updated] (YARN-9205) When using custom resource type, application will fail to run due to the CapacityScheduler throws InvalidResourceRequestException(GREATER_THEN_MAX_ALLOCATION)

2019-01-21 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-9205:
-
Target Version/s: 3.1.2

> When using custom resource type, application will fail to run due to the 
> CapacityScheduler throws 
> InvalidResourceRequestException(GREATER_THEN_MAX_ALLOCATION) 
> ---
>
> Key: YARN-9205
> URL: https://issues.apache.org/jira/browse/YARN-9205
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
>Priority: Critical
> Attachments: YARN-9205-trunk.001.patch, YARN-9205-trunk.002.patch
>
>
> In a non-secure cluster, reproduce it as follows:
>  # Set the capacity scheduler in yarn-site.xml
>  # Use the default capacity-scheduler.xml
>  # Set the custom resource type "cmp.com/hdw" in resource-types.xml
>  # Set a value, say 10, in node-resources.xml
>  # Start the cluster
>  # Submit a distributed shell application that requests some "cmp.com/hdw"
> The AM will get an exception from the CapacityScheduler and then fail. This bug 
> doesn't exist in the FairScheduler.
> {code:java}
> 2019-01-17 22:12:11,286 INFO distributedshell.ApplicationMaster: Requested 
> container ask: Capability[ 2>]Priority[0]AllocationRequestId[0]ExecutionTypeRequest[{Execution Type: 
> GUARANTEED, Enforce Execution Type: false}]Resource Profile[]
> 2019-01-17 22:12:12,326 ERROR impl.AMRMClientAsyncImpl: Exception on heartbeat
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request! Cannot allocate containers as requested resource is greater 
> than maximum allowed allocation. Requested resource type=[cmp.com/hdw], 
> Requested resource=, maximum allowed 
> allocation=, please note that maximum allowed 
> allocation is calculated by scheduler based on maximum resource of registered 
> NodeManagers, which might be less than configured maximum 
> allocation=
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.throwInvalidResourceException(SchedulerUtils.java:492)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkResourceRequestAgainstAvailableResource(SchedulerUtils.java:388)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:315)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:293)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:301)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:250)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:240)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor.allocate(DisabledPlacementProcessor.java:75)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:424)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
> ...{code}
> After some rough debugging, the method below returns the wrong maximum capacity.
> DefaultAMSProcessor.java, line 234:
> {code:java}
> Resource maximumCapacity =
>  getScheduler().getMaximumResourceCapability(app.getQueue());{code}
> The above code should return the queue's maximum capability including the custom 
> resource type ("cmp.com/hdw"), but it returns one that contains only the mandatory resources.
> This incorrect value might be caused by the queue maximum allocation calculation 
> introduced in YARN-8720:
> AbstractCSQueue.java, line 364:
> {code:java}
> this.maximumAllocation =
>  configuration.getMaximumAllocationPerQueue(
>  getQueuePath());{code}
> This invokes CapacitySchedulerConfiguration.java, line 895:
> {code:java}
> Resource clusterMax = ResourceUtils.fetchMaximumAllocationFromConfig(this);
> {code}
> Passing "this", which is not a YarnConfiguration instance, makes the code below 
> return null for the resource names, so the resulting map contains only the mandatory 
> resources. This is likely the root cause.
> {code:java}
> private static Map<String, ResourceInformation> 
> getResourceInformationMapFromConfig(
> ...
> // NULL value here!
> String[] resourceNames = conf.getStrings(YarnConfiguration.RESOURCE_TYPES);
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: 

[jira] [Commented] (YARN-9139) Simplify initializer code of GpuDiscoverer

2019-01-21 Thread Peter Bacsko (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748080#comment-16748080
 ] 

Peter Bacsko commented on YARN-9139:


[~snemeth] please update the logic with the following: after setting 
{{pathOfGpuBinary}}, validate that this file actually exists (and is executable). 
It can have a bogus value if this code path runs:
{noformat}
} else if (binaryPath.isDirectory()) {
  binaryPath = new File(binaryPath, DEFAULT_BINARY_NAME);
  LOG.warn("Specified path is a directory, use " + 
DEFAULT_BINARY_NAME
  + " under the directory, updated path-to-executable:"
  + binaryPath.getAbsolutePath());
}

pathOfGpuBinary = binaryPath.getAbsolutePath();
{noformat}

What if there is no {{nvidia-smi}} under {{binaryPath}}? We must check that and 
fail immediately rather than waiting until {{Shell.execCommand()}} fails later with 
a probably more cryptic error message.
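A minimal sketch of that check (using the names from the snippet above; not the committed 
change):

{code:java}
File binary = new File(pathOfGpuBinary);
if (!binary.isFile() || !binary.canExecute()) {
  throw new YarnException("GPU binary does not exist or is not executable: "
      + pathOfGpuBinary);
}
{code}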

> Simplify initializer code of GpuDiscoverer
> --
>
> Key: YARN-9139
> URL: https://issues.apache.org/jira/browse/YARN-9139
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-9139.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9118) Handle issues with parsing user defined GPU devices in GpuDiscoverer

2019-01-21 Thread Peter Bacsko (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748070#comment-16748070
 ] 

Peter Bacsko commented on YARN-9118:


There is this condition:
{noformat}
if (lastDiscoveredGpuInformation.getGpus() != null) {
{noformat}

Now, what if it's actually null? Could that happen? If so, is that an erroneous 
condition where we should fail? Or just log something?

> Handle issues with parsing user defined GPU devices in GpuDiscoverer
> 
>
> Key: YARN-9118
> URL: https://issues.apache.org/jira/browse/YARN-9118
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-9118.001.patch, YARN-9118.002.patch, 
> YARN-9118.003.patch, YARN-9118.004.patch
>
>
> getGpusUsableByYarn has the following issues: 
> - Duplicate GPU device definitions are not rejected: this seems to be the 
> biggest issue, as it could increase the number of devices on the node if a 
> device ID is defined two or more times.
> - An empty string is accepted and behaves as if the user did not want to use 
> auto-discovery and had not defined any GPU devices: this results in an 
> empty device list, but the empty-string check is never explicit in 
> the code, so this behavior is just coincidental.
> - Number validation does not happen on GPU device IDs (separated by commas); 
> see the sketch below.
> Many test cases are added, as the coverage was already very low.
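> A minimal sketch of the validation asked for above (method and variable names 
> illustrative, not the committed patch):
> {code:java}
> static Set<Integer> parseAllowedGpuDevices(String allowed) {
>   Set<Integer> deviceIds = new LinkedHashSet<>();
>   if (allowed == null || allowed.trim().isEmpty()) {
>     return deviceIds;                               // explicit empty-string handling
>   }
>   for (String token : allowed.trim().split(",")) {
>     int id = Integer.parseInt(token.trim());        // fails fast on non-numeric IDs
>     if (!deviceIds.add(id)) {
>       throw new IllegalArgumentException("Duplicate GPU device id: " + id);
>     }
>   }
>   return deviceIds;
> }
> {code}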



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9121) Users of GpuDiscoverer.getInstance() are not possible to test as instance is a static field

2019-01-21 Thread Peter Bacsko (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748066#comment-16748066
 ] 

Peter Bacsko commented on YARN-9121:


LGTM +1 (non-binding)

Consider using {{Preconditions.checkNotNull()}} in the constructors.
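A minimal sketch of that suggestion (constructor and field names illustrative):

{code:java}
public GpuResourcePlugin(GpuDiscoverer gpuDiscoverer) {
  this.gpuDiscoverer = Preconditions.checkNotNull(gpuDiscoverer,
      "gpuDiscoverer must not be null");
}
{code}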

> Users of GpuDiscoverer.getInstance() are not possible to test as instance is 
> a static field
> ---
>
> Key: YARN-9121
> URL: https://issues.apache.org/jira/browse/YARN-9121
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-9121.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9100) Add tests for GpuResourceAllocator and do minor code cleanup

2019-01-21 Thread Peter Bacsko (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748059#comment-16748059
 ] 

Peter Bacsko commented on YARN-9100:


Thanks for the improvements Szilard. Some thoughts:



1.
{noformat}
87  } catch (InterruptedException e) {
88  // On any interrupt, break the loop and continue execution.
89  break;{noformat}
At least log something in case of an {{InterruptedException}}. Also, in cases 
like this, restoring the interrupted flag with 
{{Thread.currentThread().interrupt()}} is desirable.
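Roughly, the suggested handling (the logger name is only illustrative):
{code:java}
} catch (InterruptedException e) {
  LOG.info("GPU assignment wait loop interrupted, giving up", e);
  // Restore the interrupted status so code further up the stack can react to it.
  Thread.currentThread().interrupt();
  break;
}
{code}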

2. In {{logStatement()}} you log twice if TRACE is enabled (I guess?)

3. Use SLF4J as an API, not Commons Logging.

4. You don't log anything in case of a timeout.

5. You can define both {{check}} and {{nonNullCheck}} at the same time. There 
are two problems with this. First, ordinary {{check}} is not used in the code. 
Second, if both are used, then the result of {{nonNullCheck}} is simply ignored.

 
In general I feel that having the retry logic in a separate class is a bit of 
overengineering. It would be justified if the patch modified classes other than 
{{GpuResourceAllocator}}, but for a single class it looks like overkill.
Also, check out this project, which might be good for us: 
https://github.com/rholder/guava-retrying
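Its usage looks roughly like this (the callable, result type and strategy choices below are placeholders, not a concrete proposal for GpuResourceAllocator):
{code:java}
Retryer<Boolean> retryer = RetryerBuilder.<Boolean>newBuilder()
    .retryIfResult(result -> result == null || !result)
    .withWaitStrategy(WaitStrategies.fixedWait(100, TimeUnit.MILLISECONDS))
    .withStopStrategy(StopStrategies.stopAfterAttempt(5))
    .build();
// call() throws ExecutionException/RetryException; handling omitted here.
boolean assigned = retryer.call(() -> tryAssignGpusOnce(container));
{code}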

 

> Add tests for GpuResourceAllocator and do minor code cleanup
> 
>
> Key: YARN-9100
> URL: https://issues.apache.org/jira/browse/YARN-9100
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-9100.001.patch, YARN-9100.002.patch, 
> YARN-9100.003.patch
>
>
> Add tests for GpuResourceAllocator and do minor code cleanup:
> - Improved log and exception messages
> - Added some new debug logs
> - Some methods are named like *Copy and return copies of internal data 
> structures. The word "copy" is just noise in their names, so they have been 
> renamed. Additionally, the copied data structures have been made immutable.
> - The waiting loop in the assignGpus method was decoupled into a new class, 
> RetryCommand.
> Some more words about the new class RetryCommand:
> There are similar waiting loops elsewhere in the code, in AMRMClient, 
> AMRMClientAsync and even in GenericTestUtils (see the waitFor method). 
> RetryCommand could eventually replace this duplicated code, as it solves the 
> waiting-loop problem in a generic way.
> The only downside of using RetryCommand in GpuResourceAllocator 
> (startGpuAssignmentLoop) is the ugly exception handling, but that is solely 
> because of how Java deals with checked exceptions vs. lambdas. If there is a 
> cleaner way to handle the exceptions, I'm open to any suggestions.
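For readers unfamiliar with the problem referred to above, the usual wrap-and-unwrap workaround looks roughly like this (names are invented for illustration; this is not the patch):
{code:java}
try {
  retryCommand.run(() -> {
    try {
      return tryAssignGpusOnce(container);
    } catch (ResourceHandlerException e) {
      // The lambda's functional interface does not declare the checked
      // exception, so it has to be wrapped...
      throw new RuntimeException(e);
    }
  });
} catch (RuntimeException e) {
  // ...and unwrapped again at the call site.
  if (e.getCause() instanceof ResourceHandlerException) {
    throw (ResourceHandlerException) e.getCause();
  }
  throw e;
}
{code}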






[jira] [Commented] (YARN-9204) RM fails to start if absolute resource is specified for partition capacity in CS queues

2019-01-21 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16747952#comment-16747952
 ] 

Hudson commented on YARN-9204:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #15794 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/15794/])
YARN-9204. RM fails to start if absolute resource is specified for (wwei: rev 
abde1e1f58d5b699e4b8e460cff68e154738169b)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java


>  RM fails to start if absolute resource is specified for partition capacity 
> in CS queues
> 
>
> Key: YARN-9204
> URL: https://issues.apache.org/jira/browse/YARN-9204
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.1.3
>Reporter: Jiandan Yang 
>Assignee: Jiandan Yang 
>Priority: Blocker
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: YARN-9204.001.patch, YARN-9204.002.patch, 
> YARN-9204.003.patch, YARN-9204.004.patch, YARN-9204.005.patch, 
> YARN-9204.006.patch
>
>
> When I set *yarn.scheduler.capacity..capacity* and 
> *yarn.scheduler.capacity..accessible-node-labels..capacity*
> to an absolute resource value, starting the RM fails and throws the following 
> exception. After diving into the related code, I found that the logic for 
> checking absolute resource values may be wrong.
> {code:java}
> 2019-01-17 20:25:45,716 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting 
> ResourceManager
> java.lang.NumberFormatException: For input string: "[memory=40960,vcore=48]"
> at 
> sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
> at sun.misc.FloatingDecimal.parseFloat(FloatingDecimal.java:122)
> at java.lang.Float.parseFloat(Float.java:451)
> at 
> org.apache.hadoop.conf.Configuration.getFloat(Configuration.java:1606)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration.internalGetLabeledQueue
> Capacity(CapacitySchedulerConfiguration.java:655)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration.getLabeledQueueCapacity
> (CapacitySchedulerConfiguration.java:670)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.loadCapacitiesByLabelsFromConf(CSQueueUti
> ls.java:135)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.loadUpdateAndCheckCapacities(CSQueueUtils
> .java:110)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setupConfigurableCapacities(AbstractCS
> Queue.java:179)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setupQueueConfigs(AbstractCSQueue.java
> :356)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setupQueueConfigs(AbstractCSQueue.java
> :323)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.setupQueueConfigs(ParentQueue.java:130)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.(ParentQueue.java:112)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.parseQueue(CapacitySched
> ulerQueueManager.java:275)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.initializeQueues(Capacit
> ySchedulerQueueManager.java:158)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.j
> ava:715)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java
> :360)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:4
> 25)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:817)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> 

[jira] [Commented] (YARN-9214) Add AbstractYarnScheduler#getValidQueues method to resolve duplicate code

2019-01-21 Thread Wanqiang Ji (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16747949#comment-16747949
 ] 

Wanqiang Ji commented on YARN-9214:
---

Hi [~cheersyang], could you help review this?

> Add AbstractYarnScheduler#getValidQueues method to resolve duplicate code 
> --
>
> Key: YARN-9214
> URL: https://issues.apache.org/jira/browse/YARN-9214
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.1.0, 3.2.0, 2.9.2, 3.0.3, 2.8.5
>Reporter: Wanqiang Ji
>Assignee: Wanqiang Ji
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9214.001.patch
>
>
> *AbstractYarnScheduler#moveAllApps* and 
> *AbstractYarnScheduler#killAllAppsInQueue* contain the same code segment, so I 
> think we need a method named *AbstractYarnScheduler#getValidQueues* to handle 
> it. Apart from this, we should add a doc comment to explain why the method 
> exists.
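An illustrative sketch of the proposed extraction (the exact signature may differ from the attached patch): both callers would first validate the queue the same way.
{code:java}
private List<ApplicationAttemptId> getValidQueues(String queueName)
    throws YarnException {
  List<ApplicationAttemptId> apps = getAppsInQueue(queueName);
  if (apps == null) {
    String errMsg = "The specified queue: " + queueName + " doesn't exist";
    LOG.warn(errMsg);
    throw new YarnException(errMsg);
  }
  return apps;
}
{code}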






[jira] [Updated] (YARN-9204) RM fails to start if absolute resource is specified for partition capacity in CS queues

2019-01-21 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-9204:
--
Fix Version/s: 3.1.3

>  RM fails to start if absolute resource is specified for partition capacity 
> in CS queues
> 
>
> Key: YARN-9204
> URL: https://issues.apache.org/jira/browse/YARN-9204
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.1.3
>Reporter: Jiandan Yang 
>Assignee: Jiandan Yang 
>Priority: Blocker
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: YARN-9204.001.patch, YARN-9204.002.patch, 
> YARN-9204.003.patch, YARN-9204.004.patch, YARN-9204.005.patch, 
> YARN-9204.006.patch
>
>
> When I set *yarn.scheduler.capacity..capacity* and 
> *yarn.scheduler.capacity..accessible-node-labels..capacity*
> to an absolute resource value, starting the RM fails and throws the following 
> exception. After diving into the related code, I found that the logic for 
> checking absolute resource values may be wrong.
> {code:java}
> 2019-01-17 20:25:45,716 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting 
> ResourceManager
> java.lang.NumberFormatException: For input string: "[memory=40960,vcore=48]"
> at 
> sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
> at sun.misc.FloatingDecimal.parseFloat(FloatingDecimal.java:122)
> at java.lang.Float.parseFloat(Float.java:451)
> at 
> org.apache.hadoop.conf.Configuration.getFloat(Configuration.java:1606)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration.internalGetLabeledQueue
> Capacity(CapacitySchedulerConfiguration.java:655)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration.getLabeledQueueCapacity
> (CapacitySchedulerConfiguration.java:670)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.loadCapacitiesByLabelsFromConf(CSQueueUti
> ls.java:135)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.loadUpdateAndCheckCapacities(CSQueueUtils
> .java:110)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setupConfigurableCapacities(AbstractCS
> Queue.java:179)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setupQueueConfigs(AbstractCSQueue.java
> :356)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setupQueueConfigs(AbstractCSQueue.java
> :323)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.setupQueueConfigs(ParentQueue.java:130)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.(ParentQueue.java:112)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.parseQueue(CapacitySched
> ulerQueueManager.java:275)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.initializeQueues(Capacit
> ySchedulerQueueManager.java:158)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.j
> ava:715)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java
> :360)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:4
> 25)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:817)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:1218)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:317)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1500)
> 2019-01-17 20:25:45,719 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG:
> {code}
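For context, the failure happens because the bracketed absolute-resource string is handed to {{Configuration#getFloat}}, which can only parse a percentage. A hedged illustration of the distinction ({{queuePrefix}} and {{parseAbsoluteResource}} are hypothetical; this is not the actual fix in the patch):
{code:java}
String capacity = conf.get(queuePrefix + "capacity"); // e.g. "[memory=40960,vcore=48]"
if (capacity != null && capacity.startsWith("[") && capacity.endsWith("]")) {
  // Absolute-resource syntax: must be parsed as a resource vector,
  // not passed to Configuration#getFloat.
  parseAbsoluteResource(capacity);                     // hypothetical helper
} else {
  float percentage = Float.parseFloat(capacity);       // what threw the NFE above
}
{code}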





[jira] [Updated] (YARN-9204) RM fails to start if absolute resource is specified for partition capacity in CS queues

2019-01-21 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-9204:
--
Fix Version/s: 3.2.1

>  RM fails to start if absolute resource is specified for partition capacity 
> in CS queues
> 
>
> Key: YARN-9204
> URL: https://issues.apache.org/jira/browse/YARN-9204
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.1.3
>Reporter: Jiandan Yang 
>Assignee: Jiandan Yang 
>Priority: Blocker
> Fix For: 3.3.0, 3.2.1
>
> Attachments: YARN-9204.001.patch, YARN-9204.002.patch, 
> YARN-9204.003.patch, YARN-9204.004.patch, YARN-9204.005.patch, 
> YARN-9204.006.patch
>
>
> When I set *yarn.scheduler.capacity..capacity* and 
> *yarn.scheduler.capacity..accessible-node-labels..capacity*
> to an absolute resource value, starting the RM fails and throws the following 
> exception. After diving into the related code, I found that the logic for 
> checking absolute resource values may be wrong.
> {code:java}
> 2019-01-17 20:25:45,716 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting 
> ResourceManager
> java.lang.NumberFormatException: For input string: "[memory=40960,vcore=48]"
> at 
> sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
> at sun.misc.FloatingDecimal.parseFloat(FloatingDecimal.java:122)
> at java.lang.Float.parseFloat(Float.java:451)
> at 
> org.apache.hadoop.conf.Configuration.getFloat(Configuration.java:1606)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration.internalGetLabeledQueue
> Capacity(CapacitySchedulerConfiguration.java:655)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration.getLabeledQueueCapacity
> (CapacitySchedulerConfiguration.java:670)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.loadCapacitiesByLabelsFromConf(CSQueueUti
> ls.java:135)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.loadUpdateAndCheckCapacities(CSQueueUtils
> .java:110)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setupConfigurableCapacities(AbstractCS
> Queue.java:179)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setupQueueConfigs(AbstractCSQueue.java
> :356)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setupQueueConfigs(AbstractCSQueue.java
> :323)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.setupQueueConfigs(ParentQueue.java:130)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.(ParentQueue.java:112)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.parseQueue(CapacitySched
> ulerQueueManager.java:275)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.initializeQueues(Capacit
> ySchedulerQueueManager.java:158)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.j
> ava:715)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java
> :360)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:4
> 25)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:817)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:1218)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:317)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1500)
> 2019-01-17 20:25:45,719 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG:
> {code}





[jira] [Updated] (YARN-9215) RM throws NPE and shutdown when trying to stop a service

2019-01-21 Thread Akhil PB (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akhil PB updated YARN-9215:
---
Summary: RM throws NPE and shutdown when trying to stop a service  (was: RM 
throws NPE when trying to stop a service)

> RM throws NPE and shutdown when trying to stop a service
> 
>
> Key: YARN-9215
> URL: https://issues.apache.org/jira/browse/YARN-9215
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, yarn-native-services
>Affects Versions: 3.3.0
>Reporter: Akhil PB
>Priority: Critical
>
> When trying to stop the service from UI2, the RM shuts down and throws an NPE.
> {code:java}
> 2019-01-21 16:22:13,548 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Type-specific 
> cleanup of application application_1548064352792_0002 of type yarn-service 
> succeeded
> 2019-01-21 16:22:13,549 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: 
> application_1548064352792_0002 State change from FINISHING to FINISHED on 
> event = ATTEMPT_FINISHED
> 2019-01-21 16:22:13,549 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Application Attempt appattempt_1548064352792_0002_01 is done. 
> finalState=FINISHED
> 2019-01-21 16:22:13,549 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dr.who   
> OPERATION=Application Finished - Succeeded  TARGET=RMAppManager 
> RESULT=SUCCESS  APPID=application_1548064352792_0002
> 2019-01-21 16:22:13,549 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> container_1548064352792_0002_01_03 Container Transitioned from RUNNING to 
> KILLED
> 2019-01-21 16:22:13,549 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Cleaning 
> master appattempt_1548064352792_0002_01
> 2019-01-21 16:22:13,549 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dr.who   
> OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS  
> APPID=application_1548064352792_0002
> CONTAINERID=container_1548064352792_0002_01_03  
> RESOURCE=QUEUENAME=default
> 2019-01-21 16:22:13,549 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Error in dispatcher thread
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher.putEntity(TimelineServiceV2Publisher.java:462)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher.access$100(TimelineServiceV2Publisher.java:73)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher$TimelineV2EventHandler.handle(TimelineServiceV2Publisher.java:497)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher$TimelineV2EventHandler.handle(TimelineServiceV2Publisher.java:486)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> cc [~sunilg] [~rohithsharma]






[jira] [Updated] (YARN-9215) RM throws NPE when trying to stop a service

2019-01-21 Thread Akhil PB (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akhil PB updated YARN-9215:
---
Component/s: yarn-native-services

> RM throws NPE when trying to stop a service
> ---
>
> Key: YARN-9215
> URL: https://issues.apache.org/jira/browse/YARN-9215
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, yarn-native-services, yarn-ui-v2
>Affects Versions: 3.3.0
>Reporter: Akhil PB
>Priority: Critical
>
> When trying to stop the service from UI2, the RM shuts down and throws an NPE.
> {code:java}
> 2019-01-21 16:22:13,548 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Type-specific 
> cleanup of application application_1548064352792_0002 of type yarn-service 
> succeeded
> 2019-01-21 16:22:13,549 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: 
> application_1548064352792_0002 State change from FINISHING to FINISHED on 
> event = ATTEMPT_FINISHED
> 2019-01-21 16:22:13,549 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Application Attempt appattempt_1548064352792_0002_01 is done. 
> finalState=FINISHED
> 2019-01-21 16:22:13,549 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dr.who   
> OPERATION=Application Finished - Succeeded  TARGET=RMAppManager 
> RESULT=SUCCESS  APPID=application_1548064352792_0002
> 2019-01-21 16:22:13,549 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> container_1548064352792_0002_01_03 Container Transitioned from RUNNING to 
> KILLED
> 2019-01-21 16:22:13,549 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Cleaning 
> master appattempt_1548064352792_0002_01
> 2019-01-21 16:22:13,549 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dr.who   
> OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS  
> APPID=application_1548064352792_0002
> CONTAINERID=container_1548064352792_0002_01_03  
> RESOURCE=QUEUENAME=default
> 2019-01-21 16:22:13,549 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Error in dispatcher thread
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher.putEntity(TimelineServiceV2Publisher.java:462)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher.access$100(TimelineServiceV2Publisher.java:73)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher$TimelineV2EventHandler.handle(TimelineServiceV2Publisher.java:497)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher$TimelineV2EventHandler.handle(TimelineServiceV2Publisher.java:486)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> cc [~sunilg] [~rohithsharma]






[jira] [Updated] (YARN-9215) RM throws NPE when trying to stop a service

2019-01-21 Thread Akhil PB (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akhil PB updated YARN-9215:
---
Component/s: (was: yarn-ui-v2)

> RM throws NPE when trying to stop a service
> ---
>
> Key: YARN-9215
> URL: https://issues.apache.org/jira/browse/YARN-9215
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, yarn-native-services
>Affects Versions: 3.3.0
>Reporter: Akhil PB
>Priority: Critical
>
> When trying to stop the service from UI2, the RM shuts down and throws an NPE.
> {code:java}
> 2019-01-21 16:22:13,548 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Type-specific 
> cleanup of application application_1548064352792_0002 of type yarn-service 
> succeeded
> 2019-01-21 16:22:13,549 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: 
> application_1548064352792_0002 State change from FINISHING to FINISHED on 
> event = ATTEMPT_FINISHED
> 2019-01-21 16:22:13,549 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Application Attempt appattempt_1548064352792_0002_01 is done. 
> finalState=FINISHED
> 2019-01-21 16:22:13,549 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dr.who   
> OPERATION=Application Finished - Succeeded  TARGET=RMAppManager 
> RESULT=SUCCESS  APPID=application_1548064352792_0002
> 2019-01-21 16:22:13,549 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> container_1548064352792_0002_01_03 Container Transitioned from RUNNING to 
> KILLED
> 2019-01-21 16:22:13,549 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Cleaning 
> master appattempt_1548064352792_0002_01
> 2019-01-21 16:22:13,549 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dr.who   
> OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS  
> APPID=application_1548064352792_0002
> CONTAINERID=container_1548064352792_0002_01_03  
> RESOURCE=QUEUENAME=default
> 2019-01-21 16:22:13,549 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Error in dispatcher thread
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher.putEntity(TimelineServiceV2Publisher.java:462)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher.access$100(TimelineServiceV2Publisher.java:73)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher$TimelineV2EventHandler.handle(TimelineServiceV2Publisher.java:497)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher$TimelineV2EventHandler.handle(TimelineServiceV2Publisher.java:486)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> cc [~sunilg] [~rohithsharma]






[jira] [Updated] (YARN-9215) RM throws NPE when trying to stop a service

2019-01-21 Thread Akhil PB (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akhil PB updated YARN-9215:
---
Description: 
When trying to stop the service from UI2, the RM shuts down and throws an NPE.

{code:java}
2019-01-21 16:22:13,548 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Type-specific 
cleanup of application application_1548064352792_0002 of type yarn-service 
succeeded
2019-01-21 16:22:13,549 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: 
application_1548064352792_0002 State change from FINISHING to FINISHED on event 
= ATTEMPT_FINISHED
2019-01-21 16:22:13,549 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
 Application Attempt appattempt_1548064352792_0002_01 is done. 
finalState=FINISHED
2019-01-21 16:22:13,549 INFO 
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dr.who   
OPERATION=Application Finished - Succeeded  TARGET=RMAppManager 
RESULT=SUCCESS  APPID=application_1548064352792_0002
2019-01-21 16:22:13,549 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
container_1548064352792_0002_01_03 Container Transitioned from RUNNING to 
KILLED
2019-01-21 16:22:13,549 INFO 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Cleaning 
master appattempt_1548064352792_0002_01
2019-01-21 16:22:13,549 INFO 
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dr.who   
OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS  
APPID=application_1548064352792_0002
CONTAINERID=container_1548064352792_0002_01_03  RESOURCE=QUEUENAME=default
2019-01-21 16:22:13,549 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: 
Error in dispatcher thread
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher.putEntity(TimelineServiceV2Publisher.java:462)
at 
org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher.access$100(TimelineServiceV2Publisher.java:73)
at 
org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher$TimelineV2EventHandler.handle(TimelineServiceV2Publisher.java:497)
at 
org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher$TimelineV2EventHandler.handle(TimelineServiceV2Publisher.java:486)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
at java.lang.Thread.run(Thread.java:748)
{code}

cc [~sunilg] [~rohithsharma]

  was:
When trying to stop the service from UI2, the RM shuts down and throws an NPE.

{code:java}
2019-01-21 16:22:13,548 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Type-specific 
cleanup of application application_1548064352792_0002 of type yarn-service 
succeeded
2019-01-21 16:22:13,549 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: 
application_1548064352792_0002 State change from FINISHING to FINISHED on event 
= ATTEMPT_FINISHED
2019-01-21 16:22:13,549 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
 Application Attempt appattempt_1548064352792_0002_01 is done. 
finalState=FINISHED
2019-01-21 16:22:13,549 INFO 
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dr.who   
OPERATION=Application Finished - Succeeded  TARGET=RMAppManager 
RESULT=SUCCESS  APPID=application_1548064352792_0002
2019-01-21 16:22:13,549 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
container_1548064352792_0002_01_03 Container Transitioned from RUNNING to 
KILLED
2019-01-21 16:22:13,549 INFO 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Cleaning 
master appattempt_1548064352792_0002_01
2019-01-21 16:22:13,549 INFO 
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dr.who   
OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS  
APPID=application_1548064352792_0002
CONTAINERID=container_1548064352792_0002_01_03  RESOURCE=QUEUENAME=default
2019-01-21 16:22:13,549 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: 
Error in dispatcher thread
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher.putEntity(TimelineServiceV2Publisher.java:462)
at 
org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher.access$100(TimelineServiceV2Publisher.java:73)
at 
org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher$TimelineV2EventHandler.handle(TimelineServiceV2Publisher.java:497)
at 

[jira] [Created] (YARN-9215) RM throws NPE when trying to stop a service

2019-01-21 Thread Akhil PB (JIRA)
Akhil PB created YARN-9215:
--

 Summary: RM throws NPE when trying to stop a service
 Key: YARN-9215
 URL: https://issues.apache.org/jira/browse/YARN-9215
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, yarn-ui-v2
Affects Versions: 3.3.0
Reporter: Akhil PB


When trying to stop the service from UI2, the RM shuts down and throws an NPE.

{code:java}
2019-01-21 16:22:13,548 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Type-specific 
cleanup of application application_1548064352792_0002 of type yarn-service 
succeeded
2019-01-21 16:22:13,549 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: 
application_1548064352792_0002 State change from FINISHING to FINISHED on event 
= ATTEMPT_FINISHED
2019-01-21 16:22:13,549 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
 Application Attempt appattempt_1548064352792_0002_01 is done. 
finalState=FINISHED
2019-01-21 16:22:13,549 INFO 
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dr.who   
OPERATION=Application Finished - Succeeded  TARGET=RMAppManager 
RESULT=SUCCESS  APPID=application_1548064352792_0002
2019-01-21 16:22:13,549 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
container_1548064352792_0002_01_03 Container Transitioned from RUNNING to 
KILLED
2019-01-21 16:22:13,549 INFO 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Cleaning 
master appattempt_1548064352792_0002_01
2019-01-21 16:22:13,549 INFO 
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dr.who   
OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS  
APPID=application_1548064352792_0002
CONTAINERID=container_1548064352792_0002_01_03  RESOURCE=QUEUENAME=default
2019-01-21 16:22:13,549 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: 
Error in dispatcher thread
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher.putEntity(TimelineServiceV2Publisher.java:462)
at 
org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher.access$100(TimelineServiceV2Publisher.java:73)
at 
org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher$TimelineV2EventHandler.handle(TimelineServiceV2Publisher.java:497)
at 
org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher$TimelineV2EventHandler.handle(TimelineServiceV2Publisher.java:486)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
at java.lang.Thread.run(Thread.java:748)
{code}






[jira] [Commented] (YARN-8711) YARN UI2 : Display component state in Component list and details page for a Service

2019-01-21 Thread Akhil PB (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16747833#comment-16747833
 ] 

Akhil PB commented on YARN-8711:


Hi [~suma.shivaprasad], could you please tell me which API gives the state of 
the components?

We are displaying the service state on the service info page, but the states 
of the service components are not available from the API itself.

For example: The component API
{{http://localhost:8198/ws/v2/timeline/clusters/yarn-cluster/apps/application_1548064352792_0002/entities/COMPONENT?fields=ALL=COMPONENT&_=1548065347971}}
 gives 
{code}
[{
"metrics": [{
"type": "SINGLE_VALUE",
"id": "ContainersDesired",
"aggregationOp": "NOP",
"values": {
"1548066418807": 2
}
}, {
"type": "SINGLE_VALUE",
"id": "ContainersFailed",
"aggregationOp": "NOP",
"values": {
"1548066418807": 0
}
}, {
"type": "SINGLE_VALUE",
"id": "ContainersRequested",
"aggregationOp": "NOP",
"values": {
"1548066418807": 0
}
}, {
"type": "SINGLE_VALUE",
"id": "ContainersRunning",
"aggregationOp": "NOP",
"values": {
"1548066418807": 2
}
}, {
"type": "SINGLE_VALUE",
"id": "ContainersDiskFailure",
"aggregationOp": "NOP",
"values": {
"1548066418807": 0
}
}, {
"type": "SINGLE_VALUE",
"id": "ContainersReady",
"aggregationOp": "NOP",
"values": {
"1548066418807": 0
}
}, {
"type": "SINGLE_VALUE",
"id": "SurplusContainers",
"aggregationOp": "NOP",
"values": {
"1548066418807": 1
}
}, {
"type": "SINGLE_VALUE",
"id": "ContainersSucceeded",
"aggregationOp": "NOP",
"values": {
"1548066418807": 0
}
}, {
"type": "SINGLE_VALUE",
"id": "ContainersPreempted",
"aggregationOp": "NOP",
"values": {
"1548066418807": 0
}
}],
"events": [],
"createdtime": 1548066389875,
"idprefix": 0,
"id": "sleeper",
"type": "COMPONENT",
"info": {
"LAUNCH_COMMAND": "sleep 90",
"UID": 
"yarn-cluster!application_1548064352792_0002!COMPONENT!0!sleeper",
"RESOURCE_MEMORY": "256",
"RUN_PRIVILEGED_CONTAINER": "false",
"RESOURCE_CPU": 1,
"FROM_ID": 
"yarn-cluster!dr.who!sleeper-service-4!1548066383571!application_1548064352792_0002!COMPONENT!0!sleeper"
},
"configs": {},
"isrelatedto": {},
"relatesto": {}
}]
{code}

There is no state info available in the info object.
And component instances are displayed with states in the table.

cc [~sunilg]

> YARN UI2 : Display component state in Component list and details page for a 
> Service
> ---
>
> Key: YARN-8711
> URL: https://issues.apache.org/jira/browse/YARN-8711
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Reporter: Suma Shivaprasad
>Assignee: Akhil PB
>Priority: Major
>
> YARN-8488 adds component states and service state = SUCCEEDED. Users could 
> then track overall component status on the UI for terminating jobs. 
> cc [~sunil.gov...@gmail.com]






[jira] [Commented] (YARN-6929) yarn.nodemanager.remote-app-log-dir structure is not scalable

2019-01-21 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16747815#comment-16747815
 ] 

Hadoop QA commented on YARN-6929:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
23s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
35s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m 47s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
13s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
33s{color} | {color:green} hadoop-yarn-project/hadoop-yarn: The patch generated 
0 new + 418 unchanged - 12 fixed = 418 total (was 430) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 19s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
51s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
47s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
25s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m 
33s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
38s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}118m 46s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-6929 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12955612/YARN-6929.6.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  xml  |
| uname | Linux 25895ab56f98 4.4.0-138-generic #164~14.04.1-Ubuntu SMP Fri Oct 
5 08:56:16 UTC 2018 x86_64 x86_64 

[jira] [Updated] (YARN-9204) RM fails to start if absolute resource is specified for partition capacity in CS queues

2019-01-21 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-9204:
--
Summary:  RM fails to start if absolute resource is specified for partition 
capacity in CS queues  (was:  
yarn.scheduler.capacity..accessible-node-labels..capacity
 can not support absolute resource value)

>  RM fails to start if absolute resource is specified for partition capacity 
> in CS queues
> 
>
> Key: YARN-9204
> URL: https://issues.apache.org/jira/browse/YARN-9204
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.1.3
>Reporter: Jiandan Yang 
>Assignee: Jiandan Yang 
>Priority: Blocker
> Attachments: YARN-9204.001.patch, YARN-9204.002.patch, 
> YARN-9204.003.patch, YARN-9204.004.patch, YARN-9204.005.patch, 
> YARN-9204.006.patch
>
>
> When I set *yarn.scheduler.capacity..capacity* and 
> *yarn.scheduler.capacity..accessible-node-labels..capacity*
> to an absolute resource value, starting the RM fails and throws the following 
> exception. After diving into the related code, I found that the logic for 
> checking absolute resource values may be wrong.
> {code:java}
> 2019-01-17 20:25:45,716 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting 
> ResourceManager
> java.lang.NumberFormatException: For input string: "[memory=40960,vcore=48]"
> at 
> sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
> at sun.misc.FloatingDecimal.parseFloat(FloatingDecimal.java:122)
> at java.lang.Float.parseFloat(Float.java:451)
> at 
> org.apache.hadoop.conf.Configuration.getFloat(Configuration.java:1606)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration.internalGetLabeledQueue
> Capacity(CapacitySchedulerConfiguration.java:655)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration.getLabeledQueueCapacity
> (CapacitySchedulerConfiguration.java:670)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.loadCapacitiesByLabelsFromConf(CSQueueUti
> ls.java:135)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.loadUpdateAndCheckCapacities(CSQueueUtils
> .java:110)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setupConfigurableCapacities(AbstractCS
> Queue.java:179)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setupQueueConfigs(AbstractCSQueue.java
> :356)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setupQueueConfigs(AbstractCSQueue.java
> :323)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.setupQueueConfigs(ParentQueue.java:130)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.(ParentQueue.java:112)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.parseQueue(CapacitySched
> ulerQueueManager.java:275)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.initializeQueues(Capacit
> ySchedulerQueueManager.java:158)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.j
> ava:715)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java
> :360)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:4
> 25)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:817)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:1218)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:317)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1500)
> 2019-01-17 20:25:45,719 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: 

[jira] [Commented] (YARN-9204) yarn.scheduler.capacity..accessible-node-labels..capacity can not support absolute resource value

2019-01-21 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16747756#comment-16747756
 ] 

Weiwei Yang commented on YARN-9204:
---

LGTM, +1, committing now.

>  
> yarn.scheduler.capacity..accessible-node-labels..capacity
>  can not support absolute resource value
> --
>
> Key: YARN-9204
> URL: https://issues.apache.org/jira/browse/YARN-9204
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.1.3
>Reporter: Jiandan Yang 
>Assignee: Jiandan Yang 
>Priority: Blocker
> Attachments: YARN-9204.001.patch, YARN-9204.002.patch, 
> YARN-9204.003.patch, YARN-9204.004.patch, YARN-9204.005.patch, 
> YARN-9204.006.patch
>
>
> When I set *yarn.scheduler.capacity..capacity* and 
> *yarn.scheduler.capacity..accessible-node-labels..capacity*
> to an absolute resource value, starting the RM fails and throws the following 
> exception. After diving into the related code, I found that the logic for 
> checking absolute resource values may be wrong.
> {code:java}
> 2019-01-17 20:25:45,716 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting 
> ResourceManager
> java.lang.NumberFormatException: For input string: "[memory=40960,vcore=48]"
> at 
> sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
> at sun.misc.FloatingDecimal.parseFloat(FloatingDecimal.java:122)
> at java.lang.Float.parseFloat(Float.java:451)
> at 
> org.apache.hadoop.conf.Configuration.getFloat(Configuration.java:1606)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration.internalGetLabeledQueue
> Capacity(CapacitySchedulerConfiguration.java:655)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration.getLabeledQueueCapacity
> (CapacitySchedulerConfiguration.java:670)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.loadCapacitiesByLabelsFromConf(CSQueueUti
> ls.java:135)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.loadUpdateAndCheckCapacities(CSQueueUtils
> .java:110)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setupConfigurableCapacities(AbstractCS
> Queue.java:179)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setupQueueConfigs(AbstractCSQueue.java
> :356)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setupQueueConfigs(AbstractCSQueue.java
> :323)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.setupQueueConfigs(ParentQueue.java:130)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.(ParentQueue.java:112)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.parseQueue(CapacitySched
> ulerQueueManager.java:275)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.initializeQueues(Capacit
> ySchedulerQueueManager.java:158)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.j
> ava:715)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java
> :360)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:4
> 25)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:817)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:1218)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:317)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1500)
> 2019-01-17 20:25:45,719 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG:
> {code}





[jira] [Commented] (YARN-9204) yarn.scheduler.capacity..accessible-node-labels..capacity can not support absolute resource value

2019-01-21 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16747755#comment-16747755
 ] 

Hadoop QA commented on YARN-9204:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
15s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 16s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
32s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 35s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 91m  8s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}145m 15s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-9204 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12955596/YARN-9204.006.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 43ac4890ba1c 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 27aa6e8 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/23118/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/23118/testReport/ |
| Max. process+thread count | 881 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 

[jira] [Commented] (YARN-9142) UI cluster nodes page is broken

2019-01-21 Thread Akhil PB (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16747748#comment-16747748
 ] 

Akhil PB commented on YARN-9142:


Looks like this issue is caused by -YARN-9036.-
 cc [~cheersyang] [~rohithsharma] [~sunilg]

> UI cluster nodes page is broken
> ---
>
> Key: YARN-9142
> URL: https://issues.apache.org/jira/browse/YARN-9142
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Rohith Sharma K S
>Assignee: Akhil PB
>Priority: Critical
> Attachments: ClusterNodePage.png, 
> cluster-nodes-page-hadoop-3.3.0-SNAPSHOT.png
>
>
> It is observed in the trunk build that the YARN cluster nodes page is broken even
> though data exists.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6929) yarn.nodemanager.remote-app-log-dir structure is not scalable

2019-01-21 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-6929:

Attachment: YARN-6929.6.patch

> yarn.nodemanager.remote-app-log-dir structure is not scalable
> -
>
> Key: YARN-6929
> URL: https://issues.apache.org/jira/browse/YARN-6929
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Affects Versions: 2.7.3
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-6929.1.patch, YARN-6929.2.patch, YARN-6929.2.patch, 
> YARN-6929.3.patch, YARN-6929.4.patch, YARN-6929.5.patch, YARN-6929.6.patch, 
> YARN-6929.patch
>
>
> The current directory structure for yarn.nodemanager.remote-app-log-dir is 
> not scalable. The maximum number of items per directory defaults to 1048576 (HDFS-6102). 
> With a log retention (yarn.log-aggregation.retain-seconds) of 7 days, it becomes 
> increasingly likely that LogAggregationService fails to create a new directory and hits 
> FSLimitException$MaxDirectoryItemsExceededException.
> The current structure is 
> <remote-app-log-dir>/<user>/logs/<applicationId>. This can be 
> improved by adding the date as a subdirectory, e.g. 
> <remote-app-log-dir>/<user>/logs/<date>/<applicationId>.
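As a minimal sketch of the two layouts (illustrative only; the helper names, class name, and yyyyMMdd date format below are assumptions and are not taken from the attached patches):

{code}
import java.text.SimpleDateFormat;
import java.util.Date;
import org.apache.hadoop.fs.Path;

public class RemoteAppLogDirLayout {

  // Current layout: <remote-app-log-dir>/<user>/logs/<applicationId>
  // ("logs" is the default yarn.nodemanager.remote-app-log-dir-suffix).
  static Path currentLayout(Path remoteRootLogDir, String user, String appId) {
    return new Path(new Path(new Path(remoteRootLogDir, user), "logs"), appId);
  }

  // Proposed layout: <remote-app-log-dir>/<user>/logs/<date>/<applicationId>.
  // Partitioning by date bounds the number of children in any one directory
  // to the applications aggregated on a single day.
  static Path proposedLayout(Path remoteRootLogDir, String user, String appId) {
    String date = new SimpleDateFormat("yyyyMMdd").format(new Date());
    return new Path(new Path(new Path(new Path(remoteRootLogDir, user), "logs"), date), appId);
  }

  public static void main(String[] args) {
    Path root = new Path("/app-logs");
    String appId = "application_1547000000000_0001";
    System.out.println(currentLayout(root, "yarn", appId));   // /app-logs/yarn/logs/application_...
    System.out.println(proposedLayout(root, "yarn", appId));  // /app-logs/yarn/logs/20190121/application_...
  }
}
{code}

With the default per-directory limit of 1048576 items and 7-day retention, the flat layout tops out at roughly 150,000 aggregated applications per day per user before directory creation starts failing, whereas the date-partitioned layout only needs a single day's applications to stay under the limit.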
> {code}
> WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService:
>  Application failed to init aggregation 
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException):
>  The directory item limit of /app-logs/yarn/logs is exceeded: limit=1048576 
> items=1048576 
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:2021)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:2072)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedMkdir(FSDirectory.java:1841)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsRecursively(FSNamesystem.java:4351)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4262)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4221)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4194)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:813)
>  
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:600)
>  
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>  
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
>  
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) 
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039) 
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035) 
> at java.security.AccessController.doPrivileged(Native Method) 
> at javax.security.auth.Subject.doAs(Subject.java:415) 
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>  
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033) 
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.createAppDir(LogAggregationService.java:308)
>  
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initAppAggregator(LogAggregationService.java:366)
>  
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initApp(LogAggregationService.java:320)
>  
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:443)
>  
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:67)
>  
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>  
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) 
> at java.lang.Thread.run(Thread.java:745) 
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException):
>  The directory item limit of /app-logs/yarn/logs is exceeded: limit=1048576 
> items=1048576 
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:2021)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:2072)
>  
> at 
>