[jira] [Commented] (YARN-10438) NPE while fetching container report for a node which is not there in active/decommissioned/lost/unhealthy nodes on RM
[ https://issues.apache.org/jira/browse/YARN-10438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17197387#comment-17197387 ]

Shubham Gupta commented on YARN-10438:
--------------------------------------

[~raghvendra.s], can you share the code at line 520 of ClientRMService.java in your build, along with the full getContainerReport() method from both ClientRMService.java and ApplicationClientProtocolPBServiceImpl.java?

> NPE while fetching container report for a node which is not there in
> active/decommissioned/lost/unhealthy nodes on RM
> ---------------------------------------------------------------------
>
>                 Key: YARN-10438
>                 URL: https://issues.apache.org/jira/browse/YARN-10438
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 3.2.1
>            Reporter: Raghvendra Singh
>            Priority: Major
>
> Here is the exception trace we are seeing. We suspect that, because of this
> exception, the RM reaches a state where it no longer allows any new job to
> run on the cluster.
> {noformat}
> 2020-09-15 07:08:15,496 WARN ipc.Server: IPC Server handler 18 on default port 8032, call Call#1463486 Retry#0 org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getContainerReport from 10.39.91.205:49564
> java.lang.NullPointerException
>     at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getContainerReport(ClientRMService.java:520)
>     at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getContainerReport(ApplicationClientProtocolPBServiceImpl.java:466)
>     at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:639)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:999)
>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:927)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2915)
> {noformat}
> We are seeing this issue with this one node only; the cluster runs at a
> scale of around 500 nodes.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
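The issue title suggests the NullPointerException arises when the RM builds a container report for a node it no longer tracks in any of its active, decommissioned, lost or unhealthy node lists. The code at ClientRMService.java:520 has not been shared in this thread, so the following is only a hypothetical Java sketch of that failure mode and a null-safe alternative. NodeInfo, NodeRegistry and the describeContainer* methods are illustrative names, not Hadoop classes or APIs.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model only: none of these types are Hadoop classes, and this is
// not the code at ClientRMService.java:520.
public class ContainerReportSketch {

  static final class NodeInfo {
    final String nodeId;
    final String httpAddress;

    NodeInfo(String nodeId, String httpAddress) {
      this.nodeId = nodeId;
      this.httpAddress = httpAddress;
    }
  }

  static final class NodeRegistry {
    // Nodes currently considered active.
    private final Map<String, NodeInfo> active = new ConcurrentHashMap<>();
    // Decommissioned, lost and unhealthy nodes, lumped together for brevity.
    private final Map<String, NodeInfo> inactive = new ConcurrentHashMap<>();

    void addActive(NodeInfo n) { active.put(n.nodeId, n); }
    void addInactive(NodeInfo n) { inactive.put(n.nodeId, n); }

    // Returns null when the node is in neither map, i.e. no longer tracked at all.
    NodeInfo lookupOrNull(String nodeId) {
      NodeInfo n = active.get(nodeId);
      return n != null ? n : inactive.get(nodeId);
    }

    Optional<NodeInfo> find(String nodeId) {
      return Optional.ofNullable(lookupOrNull(nodeId));
    }
  }

  // Mirrors the suspected bug: dereferences the lookup result without a null check.
  static String describeContainerUnsafe(NodeRegistry reg, String containerId, String nodeId) {
    NodeInfo node = reg.lookupOrNull(nodeId);
    return containerId + " on " + node.httpAddress; // NullPointerException if node is untracked
  }

  // Null-safe variant: an untracked node yields a placeholder instead of failing the call.
  static String describeContainerSafe(NodeRegistry reg, String containerId, String nodeId) {
    String address = reg.find(nodeId).map(n -> n.httpAddress).orElse("N/A");
    return containerId + " on " + address;
  }

  public static void main(String[] args) {
    NodeRegistry reg = new NodeRegistry();
    reg.addActive(new NodeInfo("host1:8041", "host1:8042"));
    reg.addInactive(new NodeInfo("host2:8041", "host2:8042"));

    System.out.println(describeContainerSafe(reg, "container_01", "host1:8041"));
    System.out.println(describeContainerSafe(reg, "container_02", "host3:8041"));
    try {
      describeContainerUnsafe(reg, "container_02", "host3:8041");
    } catch (NullPointerException e) {
      System.out.println("unsafe lookup threw NullPointerException");
    }
  }
}
```

Returning a placeholder is only one option; the server could instead reject the request with a descriptive exception. Which is appropriate depends on what line 520 actually dereferences in the reporter's build.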
[jira] [Commented] (YARN-10438) NPE while fetching container report for a node which is not there in active/decommissioned/lost/unhealthy nodes on RM
[ https://issues.apache.org/jira/browse/YARN-10438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17197176#comment-17197176 ]

Shubham Gupta commented on YARN-10438:
--------------------------------------

Hi [~raghvendra.s], can you please check the Hadoop version number again?
[jira] [Commented] (YARN-9810) Add queue capacity/maxcapacity percentage metrics
[ https://issues.apache.org/jira/browse/YARN-9810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16923776#comment-16923776 ]

Shubham Gupta commented on YARN-9810:
-------------------------------------

Thanks for committing, [~jhung].

> Add queue capacity/maxcapacity percentage metrics
> -------------------------------------------------
>
>                 Key: YARN-9810
>                 URL: https://issues.apache.org/jira/browse/YARN-9810
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Jonathan Hung
>            Assignee: Shubham Gupta
>            Priority: Major
>              Labels: release-blocker
>             Fix For: 2.10.0, 3.3.0, 3.2.1, 3.1.4
>
>         Attachments: YARN-9810.01.patch, YARN-9810.02.patch
>
>
> Similar to YARN-9085, it'd be good to have queue (absolute) capacity /
> (absolute) max capacity metrics in CSQueueMetrics.
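The issue asks for each queue's capacity and maximum capacity, in both relative and absolute form, to be exposed in CSQueueMetrics. As a minimal sketch, assuming percentage-based queue configuration, a queue's absolute capacity is its own capacity fraction multiplied by its parent's absolute capacity, and absolute max capacity is computed the same way from the max-capacity fractions. The Java below illustrates only that arithmetic; QueueNode is a hypothetical type, and the printed labels are not the metric names introduced by the patch.

```java
// Hypothetical sketch of percentage-based Capacity Scheduler arithmetic.
// QueueNode is illustrative, not a Hadoop class; labels are not real metric names.
public class QueueCapacitySketch {

  static final class QueueNode {
    final String path;
    final float capacity;    // fraction of the parent's capacity guaranteed to this queue
    final float maxCapacity; // fraction of the parent's capacity this queue may grow to
    final QueueNode parent;  // null for root

    QueueNode(String path, float capacity, float maxCapacity, QueueNode parent) {
      this.path = path;
      this.capacity = capacity;
      this.maxCapacity = maxCapacity;
      this.parent = parent;
    }

    // Absolute capacity: this queue's guaranteed share of the whole cluster.
    float absoluteCapacity() {
      return parent == null ? capacity : parent.absoluteCapacity() * capacity;
    }

    // Absolute max capacity: the cluster share the queue can reach, bounded by its ancestors.
    float absoluteMaxCapacity() {
      return parent == null ? maxCapacity : parent.absoluteMaxCapacity() * maxCapacity;
    }
  }

  public static void main(String[] args) {
    QueueNode root = new QueueNode("root", 1.0f, 1.0f, null);
    QueueNode eng = new QueueNode("root.eng", 0.6f, 0.8f, root);
    QueueNode etl = new QueueNode("root.eng.etl", 0.5f, 1.0f, eng);

    for (QueueNode q : new QueueNode[] {root, eng, etl}) {
      System.out.printf(
          "%s capacity=%.1f%% absoluteCapacity=%.1f%% maxCapacity=%.1f%% absoluteMaxCapacity=%.1f%%%n",
          q.path, q.capacity * 100, q.absoluteCapacity() * 100,
          q.maxCapacity * 100, q.absoluteMaxCapacity() * 100);
    }
  }
}
```

In this example root.eng.etl is configured with 50% capacity and 100% max capacity relative to its parent, yet its absolute values work out to 30% (0.6 × 0.5) and 80% (0.8 × 1.0) of the cluster, which is why exposing both the relative and absolute figures as metrics is useful.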
[jira] [Updated] (YARN-9810) Add queue capacity/maxcapacity percentage metrics
[ https://issues.apache.org/jira/browse/YARN-9810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shubham Gupta updated YARN-9810:
--------------------------------
    Attachment:     (was: YARN-9801.01.patch)
[jira] [Commented] (YARN-9810) Add queue capacity/maxcapacity percentage metrics
[ https://issues.apache.org/jira/browse/YARN-9810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16923684#comment-16923684 ]

Shubham Gupta commented on YARN-9810:
-------------------------------------

[~jhung] thanks for the review.
[jira] [Commented] (YARN-9810) Add queue capacity/maxcapacity percentage metrics
[ https://issues.apache.org/jira/browse/YARN-9810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922633#comment-16922633 ]

Shubham Gupta commented on YARN-9810:
-------------------------------------

+1