[jira] [Updated] (YARN-10438) Handle null containerId in ClientRMService#getContainerReport()

2022-02-24 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-10438:
--
Description: 
Here is the exception trace we are seeing. We suspect that, because of this 
exception, the RM reaches a state in which it no longer allows any new job to 
run on the cluster.


{code:java}
2020-09-15 07:08:15,496 WARN ipc.Server: IPC Server handler 18 on default port 8032, call Call#1463486 Retry#0 org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getContainerReport from 10.39.91.205:49564
java.lang.NullPointerException
    at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getContainerReport(ClientRMService.java:520)
    at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getContainerReport(ApplicationClientProtocolPBServiceImpl.java:466)
    at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:639)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:999)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:927)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2915)
{code}


We are seeing this issue with this specific node only; we run this cluster at 
a scale of around 500 nodes.

  was:
Here is the exception trace we are seeing. We suspect that, because of this 
exception, the RM reaches a state in which it no longer allows any new job to 
run on the cluster.

{noformat}
2020-09-15 07:08:15,496 WARN ipc.Server: IPC Server handler 18 on default port 8032, call Call#1463486 Retry#0 org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getContainerReport from 10.39.91.205:49564
java.lang.NullPointerException
    at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getContainerReport(ClientRMService.java:520)
    at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getContainerReport(ApplicationClientProtocolPBServiceImpl.java:466)
    at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:639)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:999)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:927)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2915)
{noformat}

We are seeing this issue with this specific node only; we run this cluster at 
a scale of around 500 nodes.


> Handle null containerId in ClientRMService#getContainerReport()
> ---
>
> Key: YARN-10438
> URL: https://issues.apache.org/jira/browse/YARN-10438
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 3.2.1
> Reporter: Raghvendra Singh
> Assignee: Shubham Gupta
> Priority: Major
> Fix For: 3.4.0, 2.10.2, 3.2.3, 3.3.2
>
>

[jira] [Updated] (YARN-10438) Handle null containerId in ClientRMService#getContainerReport()

2021-11-28 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated YARN-10438:
-
Fix Version/s: 2.10.2

Cherry-picked to branch-2.10.

> Handle null containerId in ClientRMService#getContainerReport()
> ---
>
> Key: YARN-10438
> URL: https://issues.apache.org/jira/browse/YARN-10438
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 3.2.1
> Reporter: Raghvendra Singh
> Assignee: Shubham Gupta
> Priority: Major
> Fix For: 3.4.0, 2.10.2, 3.2.3, 3.3.2
>
>



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10438) Handle null containerId in ClientRMService#getContainerReport()

2021-11-18 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated YARN-10438:
-
Fix Version/s: 3.2.3, 3.3.2

> Handle null containerId in ClientRMService#getContainerReport()
> ---
>
> Key: YARN-10438
> URL: https://issues.apache.org/jira/browse/YARN-10438
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 3.2.1
> Reporter: Raghvendra Singh
> Assignee: Shubham Gupta
> Priority: Major
> Fix For: 3.4.0, 3.2.3, 3.3.2
>
>






[jira] [Updated] (YARN-10438) Handle null containerId in ClientRMService#getContainerReport()

2020-09-24 Thread Surendra Singh Lilhore (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Surendra Singh Lilhore updated YARN-10438:
--
Summary: Handle null containerId in ClientRMService#getContainerReport()  
(was: NPE while fetching container report for a node which is not there in 
active/decommissioned/lost/unhealthy nodes on RM)
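
The new summary points at a missing null check rather than at node state. As a rough illustration only, a guard of that kind could look like the Java sketch below. This is an assumption about the general shape of such a fix, not the committed patch: `ContainerReportGuard`, its `Request` type, `InvalidContainerRequestException`, and the error messages are hypothetical stand-ins for `ClientRMService`, `GetContainerReportRequest`, and YARN's own exception types.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch only: a hypothetical stand-in for the RM-side lookup, not Hadoop code. */
final class ContainerReportGuard {

    /** Hypothetical request type; a null id models a malformed client call. */
    static final class Request {
        private final String containerId;
        Request(String containerId) { this.containerId = containerId; }
        String getContainerId() { return containerId; }
    }

    /** Hypothetical client-facing error, raised instead of a raw NullPointerException. */
    static final class InvalidContainerRequestException extends RuntimeException {
        InvalidContainerRequestException(String message) { super(message); }
    }

    // Container id -> report text; a stand-in for the RM's internal state.
    private final Map<String, String> reports = new HashMap<>();

    void register(String containerId, String report) {
        reports.put(containerId, report);
    }

    String getContainerReport(Request request) {
        String containerId = request.getContainerId();
        // Validate before any dereference so a bad request fails with a clear message.
        if (containerId == null) {
            throw new InvalidContainerRequestException("Invalid container id: null");
        }
        String report = reports.get(containerId);
        if (report == null) {
            throw new InvalidContainerRequestException(
                "Container " + containerId + " not found");
        }
        return report;
    }

    public static void main(String[] args) {
        ContainerReportGuard rm = new ContainerReportGuard();
        rm.register("container_1", "RUNNING");
        System.out.println(rm.getContainerReport(new Request("container_1")));
        try {
            rm.getContainerReport(new Request(null));
        } catch (InvalidContainerRequestException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The point of the sketch is that the caller receives a descriptive error for a null id, whereas the trace in the description shows a bare NullPointerException surfacing from the IPC handler.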

> Handle null containerId in ClientRMService#getContainerReport()
> ---
>
> Key: YARN-10438
> URL: https://issues.apache.org/jira/browse/YARN-10438
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 3.2.1
> Reporter: Raghvendra Singh
> Assignee: Shubham Gupta
> Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
