[jira] [Resolved] (YARN-10438) Handle null containerId in ClientRMService#getContainerReport()
[ https://issues.apache.org/jira/browse/YARN-10438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Surendra Singh Lilhore resolved YARN-10438.
---
Fix Version/s: 3.4.0
Resolution: Fixed

Thanks [~shubhamod] for the contribution. Committed to trunk!

> Handle null containerId in ClientRMService#getContainerReport()
> ---
>
> Key: YARN-10438
> URL: https://issues.apache.org/jira/browse/YARN-10438
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 3.2.1
> Reporter: Raghvendra Singh
> Assignee: Shubham Gupta
> Priority: Major
> Fix For: 3.4.0
>
>
> Here is the exception trace we are seeing. We suspect that because of this
> exception the RM reaches a state where it no longer allows any new job to
> run on the cluster.
> {noformat}
> 2020-09-15 07:08:15,496 WARN ipc.Server: IPC Server handler 18 on default port 8032, call Call#1463486 Retry#0 org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getContainerReport from 10.39.91.205:49564
> java.lang.NullPointerException
>   at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getContainerReport(ClientRMService.java:520)
>   at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getContainerReport(ApplicationClientProtocolPBServiceImpl.java:466)
>   at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:639)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:999)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:927)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2915)
> {noformat}
> We see this issue with this specific node only; we run this cluster at a
> scale of around 500 nodes.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
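The kind of guard the issue title asks for can be sketched as follows. This is a minimal illustration only, not the committed patch: the `Request` class, the `InvalidContainerException` type, and the error message are hypothetical stand-ins, and the real method works with YARN's own request and exception types.

```java
// Sketch under stated assumptions: all types here are hypothetical stand-ins,
// not the real YARN classes used by ClientRMService.
final class ContainerReportGuard {

    /** Hypothetical request carrying a container id that may be null. */
    static final class Request {
        private final String containerId;
        Request(String containerId) { this.containerId = containerId; }
        String getContainerId() { return containerId; }
    }

    /** Hypothetical exception that signals a bad request to the caller. */
    static final class InvalidContainerException extends RuntimeException {
        InvalidContainerException(String message) { super(message); }
    }

    /**
     * Rejects a null container id up front with a descriptive error,
     * instead of letting a later dereference fail with a NullPointerException.
     */
    static String getContainerReport(Request request) {
        String containerId = request.getContainerId();
        if (containerId == null) {
            throw new InvalidContainerException("Invalid container id: null");
        }
        // Placeholder for the real lookup, which is out of scope for this sketch.
        return "report for " + containerId;
    }

    public static void main(String[] args) {
        System.out.println(getContainerReport(new Request("container_1")));
        try {
            getContainerReport(new Request(null));
        } catch (InvalidContainerException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With a guard of this shape the client receives an explicit error for the bad request, and the RPC handler does not log a bare NullPointerException.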
[jira] [Updated] (YARN-10438) Handle null containerId in ClientRMService#getContainerReport()
[ https://issues.apache.org/jira/browse/YARN-10438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Surendra Singh Lilhore updated YARN-10438:
--
Summary: Handle null containerId in ClientRMService#getContainerReport() (was: NPE while fetching container report for a node which is not there in active/decommissioned/lost/unhealthy nodes on RM)

> Handle null containerId in ClientRMService#getContainerReport()
> ---
>
> Key: YARN-10438
> URL: https://issues.apache.org/jira/browse/YARN-10438
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 3.2.1
> Reporter: Raghvendra Singh
> Assignee: Shubham Gupta
> Priority: Major
>
> Here is the exception trace we are seeing. We suspect that because of this
> exception the RM reaches a state where it no longer allows any new job to
> run on the cluster.
> {noformat}
> 2020-09-15 07:08:15,496 WARN ipc.Server: IPC Server handler 18 on default port 8032, call Call#1463486 Retry#0 org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getContainerReport from 10.39.91.205:49564
> java.lang.NullPointerException
>   at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getContainerReport(ClientRMService.java:520)
>   at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getContainerReport(ApplicationClientProtocolPBServiceImpl.java:466)
>   at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:639)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:999)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:927)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2915)
> {noformat}
> We see this issue with this specific node only; we run this cluster at a
> scale of around 500 nodes.
[jira] [Commented] (YARN-9809) NMs should supply a health status when registering with RM
[ https://issues.apache.org/jira/browse/YARN-9809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201900#comment-17201900 ]

Hadoop QA commented on YARN-9809:
-

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 12m 34s | | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 1s | | No case conflicting files found. |
| 0 | buf | 0m 0s | | buf was not available. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 | | 0m 0s | test4tests | The patch appears to include 18 new or modified test files. |
|| || || || branch-3.2 Compile Tests || ||
| 0 | mvndep | 3m 26s | | Maven dependency ordering for branch |
| +1 | mvninstall | 29m 27s | | branch-3.2 passed |
| +1 | compile | 20m 26s | | branch-3.2 passed |
| +1 | checkstyle | 3m 54s | | branch-3.2 passed |
| +1 | mvnsite | 5m 15s | | branch-3.2 passed |
| +1 | shadedclient | 27m 6s | | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 4m 13s | | branch-3.2 passed |
| 0 | spotbugs | 2m 16s | | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 10m 21s | | branch-3.2 passed |
|| || || || Patch Compile Tests || ||
| 0 | mvndep | 0m 28s | | Maven dependency ordering for patch |
| +1 | mvninstall | 4m 30s | | the patch passed |
| +1 | compile | 20m 9s | | the patch passed |
| +1 | cc | 20m 9s | | the patch passed |
| +1 | javac | 20m 9s | | the patch passed |
| -0 | checkstyle | 3m 50s | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/194/artifact/out/diff-checkstyle-root.txt | root: The patch generated 3 new + 1258 unchanged - 1 fixed = 1261 total (was 1259) |
| +1 | mvnsite | 5m 22s | | the patch passed |
| +1 | whitespace | 0m 0s | | The patch has no whitespace issues. |
| +1 | shadedclient | 16m 48s | | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 3m 57s | | the patch passed |
| +1 | findbugs | 11m 20s | | the patch passed |
|| || || || Other Tests || ||
| +1 | unit |
[jira] [Commented] (YARN-9809) NMs should supply a health status when registering with RM
[ https://issues.apache.org/jira/browse/YARN-9809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201814#comment-17201814 ]

Eric Badger commented on YARN-9809:
---

I've attached branch-3.2 patch 008 to address your comments, [~Jim_Brennan]. I think I got all of the unit tests to pass. However, TestCombinedSystemMetricsPublisher, TestSystemMetricsPublisherForV2, TestFSSchedulerConfigurationStore, and TestZKConfigurationStore failed for me locally even on unpatched branch-3.2.

> NMs should supply a health status when registering with RM
> --
>
> Key: YARN-9809
> URL: https://issues.apache.org/jira/browse/YARN-9809
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Eric Badger
> Assignee: Eric Badger
> Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-9809-branch-3.2.007.patch,
> YARN-9809-branch-3.2.008.patch, YARN-9809.001.patch, YARN-9809.002.patch,
> YARN-9809.003.patch, YARN-9809.004.patch, YARN-9809.005.patch,
> YARN-9809.006.patch, YARN-9809.007.patch
>
>
> Currently if the NM registers with the RM and it is unhealthy, it can be
> scheduled many containers before the first heartbeat. After the first
> heartbeat, the RM will mark the NM as unhealthy and kill all of the
> containers.
[jira] [Updated] (YARN-9809) NMs should supply a health status when registering with RM
[ https://issues.apache.org/jira/browse/YARN-9809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Badger updated YARN-9809:
--
Attachment: YARN-9809-branch-3.2.008.patch

> NMs should supply a health status when registering with RM
> --
>
> Key: YARN-9809
> URL: https://issues.apache.org/jira/browse/YARN-9809
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Eric Badger
> Assignee: Eric Badger
> Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-9809-branch-3.2.007.patch,
> YARN-9809-branch-3.2.008.patch, YARN-9809.001.patch, YARN-9809.002.patch,
> YARN-9809.003.patch, YARN-9809.004.patch, YARN-9809.005.patch,
> YARN-9809.006.patch, YARN-9809.007.patch
>
>
> Currently if the NM registers with the RM and it is unhealthy, it can be
> scheduled many containers before the first heartbeat. After the first
> heartbeat, the RM will mark the NM as unhealthy and kill all of the
> containers.
[jira] [Commented] (YARN-9809) NMs should supply a health status when registering with RM
[ https://issues.apache.org/jira/browse/YARN-9809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201782#comment-17201782 ]

Eric Badger commented on YARN-9809:
---

{noformat}
RMNodeImpl#AddNodeTransition#transition
RMNodeStatusEvent rmNodeStatusEvent = new RMNodeStatusEvent(nodeId, nodeStatus);
NodeHealthStatus nodeHealthStatus = updateRMNodeFromStatusEvents(rmNode, rmNodeStatusEvent);
if (nodeHealthStatus.getIsNodeHealthy()) {
{noformat}
bq. Do we run the risk of nodeHealthStatus being null?

[~epayne], nope, we should be fine here. {{nodeHealthStatus}} comes from the return value of {{updateRMNodeFromStatusEvents}}. The return value of that method comes from {{statusEvent.getNodeHealthStatus()}}, and {{statusEvent}} is passed into that method as an argument. On the caller side that argument is named {{rmNodeStatusEvent}}, and it is created a few lines up via the RMNodeStatusEvent constructor. The {{nodeStatus}} is set there via the constructor, and we know it won't be null because we are in the "else" of the "if" statement that checked for {{nodeStatus}} being null.

> NMs should supply a health status when registering with RM
> --
>
> Key: YARN-9809
> URL: https://issues.apache.org/jira/browse/YARN-9809
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Eric Badger
> Assignee: Eric Badger
> Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-9809-branch-3.2.007.patch, YARN-9809.001.patch,
> YARN-9809.002.patch, YARN-9809.003.patch, YARN-9809.004.patch,
> YARN-9809.005.patch, YARN-9809.006.patch, YARN-9809.007.patch
>
>
> Currently if the NM registers with the RM and it is unhealthy, it can be
> scheduled many containers before the first heartbeat. After the first
> heartbeat, the RM will mark the NM as unhealthy and kill all of the
> containers.
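The null-safety argument in the comment above can be traced in a small sketch. Every class below is a simplified, hypothetical stand-in for the RM types named in the comment; the outcome of the "no status supplied" branch is an assumption, and the sketch also assumes that a non-null node status always carries a health status.

```java
// Sketch under stated assumptions: simplified stand-ins, not the real RMNodeImpl code.
final class AddNodeTransitionSketch {

    enum Outcome { NO_STATUS_SUPPLIED, HEALTHY, UNHEALTHY }

    static final class NodeHealthStatus {
        private final boolean healthy;
        NodeHealthStatus(boolean healthy) { this.healthy = healthy; }
        boolean getIsNodeHealthy() { return healthy; }
    }

    static final class NodeStatus {
        private final NodeHealthStatus health;
        NodeStatus(NodeHealthStatus health) { this.health = health; }
        NodeHealthStatus getNodeHealthStatus() { return health; }
    }

    /** Stand-in for the status event: it only wraps the node status it was built with. */
    static final class StatusEvent {
        private final NodeStatus nodeStatus;
        StatusEvent(NodeStatus nodeStatus) { this.nodeStatus = nodeStatus; }
        NodeHealthStatus getNodeHealthStatus() { return nodeStatus.getNodeHealthStatus(); }
    }

    /** Stand-in for the update method: its return value comes straight from the event. */
    static NodeHealthStatus updateFromStatusEvent(StatusEvent statusEvent) {
        return statusEvent.getNodeHealthStatus();
    }

    static Outcome transition(NodeStatus nodeStatus) {
        if (nodeStatus == null) {
            // Assumption: what the real transition does here is not shown in the thread.
            return Outcome.NO_STATUS_SUPPLIED;
        } else {
            // nodeStatus is known to be non-null in this branch, so the event built
            // from it can be dereferenced safely inside updateFromStatusEvent.
            StatusEvent event = new StatusEvent(nodeStatus);
            NodeHealthStatus health = updateFromStatusEvent(event);
            return health.getIsNodeHealthy() ? Outcome.HEALTHY : Outcome.UNHEALTHY;
        }
    }
}
```

The point of the shape is that the dereference only happens on the branch where the null check has already failed, which is the reasoning the comment walks through.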
[jira] [Commented] (YARN-10447) TestLeafQueue: ActivitiesManager thread might interfere with ongoing stubbing
[ https://issues.apache.org/jira/browse/YARN-10447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201777#comment-17201777 ]

Hadoop QA commented on YARN-10447:
--

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 2m 15s | | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 0s | | No case conflicting files found. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 | | 0m 0s | test4tests | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests || ||
| +1 | mvninstall | 40m 31s | | trunk passed |
| +1 | compile | 1m 16s | | trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 | compile | 1m 5s | | trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| +1 | checkstyle | 1m 0s | | trunk passed |
| +1 | mvnsite | 1m 12s | | trunk passed |
| +1 | shadedclient | 23m 36s | | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 53s | | trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 | javadoc | 0m 40s | | trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| 0 | spotbugs | 2m 33s | | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 2m 31s | | trunk passed |
|| || || || Patch Compile Tests || ||
| +1 | mvninstall | 1m 7s | | the patch passed |
| +1 | compile | 1m 11s | | the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 | javac | 1m 11s | | the patch passed |
| +1 | compile | 0m 59s | | the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| +1 | javac | 0m 59s | | the patch passed |
| +1 | checkstyle | 0m 52s | | the patch passed |
| +1 | mvnsite | 1m 5s | | the patch passed |
| +1 | whitespace | 0m 0s | | The patch has no whitespace issues. |
| +1 | shadedclient | 20m 17s | | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 44s | | the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 | javadoc | 0m 41s |
[jira] [Updated] (YARN-10447) TestLeafQueue: ActivitiesManager thread might interfere with ongoing stubbing
[ https://issues.apache.org/jira/browse/YARN-10447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Bacsko updated YARN-10447:

Attachment: YARN-10447-002.patch

> TestLeafQueue: ActivitiesManager thread might interfere with ongoing stubbing
> -
>
> Key: YARN-10447
> URL: https://issues.apache.org/jira/browse/YARN-10447
> Project: Hadoop YARN
> Issue Type: Bug
> Components: test
> Reporter: Peter Bacsko
> Assignee: Peter Bacsko
> Priority: Major
> Attachments: YARN-10447-001.patch, YARN-10447-002.patch
>
>
> YARN-9784 fixed some concurrency related issues in {{TestLeafQueue}}, but not
> all of them. Occasionally it's still possible to receive an exception from
> Mockito and the two following stack traces can be observed in the console:
> {noformat}
>
> org.mockito.exceptions.misusing.WrongTypeOfReturnValue:
> Integer cannot be returned by isMultiNodePlacementEnabled()
> isMultiNodePlacementEnabled() should return boolean
> ***
> If you're unsure why you're getting above error read on.
> Due to the nature of the syntax above problem might occur because:
> 1. This exception *might* occur in wrongly written multi-threaded tests.
>    Please refer to Mockito FAQ on limitations of concurrency testing.
> 2. A spy is stubbed using when(spy.foo()).then() syntax. It is safer to stub spies -
>    - with doReturn|Throw() family of methods. More in javadocs for Mockito.spy() method.
> {noformat}
> or
> {noformat}
> 2020-09-22 14:44:52,584 INFO [main] capacity.TestUtils (TestUtils.java:getMockNode(227)) - node = 127.0.0.3 avail= vCores:1>
> 2020-09-22 14:44:52,585 INFO [main] capacity.TestUtils (TestUtils.java:getMockNode(227)) - node = 127.0.0.4 avail= vCores:1>
> Exception in thread "ActivitiesManager thread." java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.Boolean
>   at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$$EnhancerByMockitoWithCGLIB$$272c72c5.isMultiNodePlacementEnabled()
>   at org.apache.hadoop.yarn.server.resourcemanager.scheduler.activities.ActivitiesManager.dynamicallyUpdateAppActivitiesMaxQueueLengthIfNeeded(ActivitiesManager.java:266)
>   at org.apache.hadoop.yarn.server.resourcemanager.scheduler.activities.ActivitiesManager.access$500(ActivitiesManager.java:63)
>   at org.apache.hadoop.yarn.server.resourcemanager.scheduler.activities.ActivitiesManager$1.run(ActivitiesManager.java:347)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}
> It's probably best to disable ActivitiesManager thread entirely in this test
> class, there is no need for it.
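The idea of turning the background thread off for a test can be illustrated with a small, generic sketch. The `BackgroundWorkerSketch` class and its `threadEnabled` flag are hypothetical and are not ActivitiesManager's API; how the attached patches actually disable the thread is not shown in this message.

```java
// Sketch under stated assumptions: a hypothetical component, not ActivitiesManager.
final class BackgroundWorkerSketch {

    private final boolean threadEnabled;
    private Thread worker; // stays null when the thread is disabled

    BackgroundWorkerSketch(boolean threadEnabled) {
        this.threadEnabled = threadEnabled;
    }

    /** Starts the periodic task on a daemon thread unless the thread is disabled. */
    synchronized void start(Runnable periodicTask) {
        if (!threadEnabled) {
            // Tests that stub collaborators on the main thread can opt out, so no
            // second thread calls into a mock while it is being stubbed.
            return;
        }
        worker = new Thread(periodicTask, "background-worker");
        worker.setDaemon(true);
        worker.start();
    }

    /** Reports whether start() created a worker thread. */
    synchronized boolean hasWorker() {
        return worker != null;
    }
}
```

A test would construct the component with the flag set to false, so that only the test thread ever touches the mocked scheduler.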
[jira] [Assigned] (YARN-10438) NPE while fetching container report for a node which is not there in active/decommissioned/lost/unhealthy nodes on RM
[ https://issues.apache.org/jira/browse/YARN-10438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Surendra Singh Lilhore reassigned YARN-10438:
-
Assignee: Shubham Gupta

> NPE while fetching container report for a node which is not there in active/decommissioned/lost/unhealthy nodes on RM
> -
>
> Key: YARN-10438
> URL: https://issues.apache.org/jira/browse/YARN-10438
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 3.2.1
> Reporter: Raghvendra Singh
> Assignee: Shubham Gupta
> Priority: Major
>
> Here is the exception trace we are seeing. We suspect that because of this
> exception the RM reaches a state where it no longer allows any new job to
> run on the cluster.
> {noformat}
> 2020-09-15 07:08:15,496 WARN ipc.Server: IPC Server handler 18 on default port 8032, call Call#1463486 Retry#0 org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getContainerReport from 10.39.91.205:49564
> java.lang.NullPointerException
>   at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getContainerReport(ClientRMService.java:520)
>   at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getContainerReport(ApplicationClientProtocolPBServiceImpl.java:466)
>   at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:639)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:999)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:927)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2915)
> {noformat}
> We see this issue with this specific node only; we run this cluster at a
> scale of around 500 nodes.
[jira] [Commented] (YARN-10447) TestLeafQueue: ActivitiesManager thread might interfere with ongoing stubbing
[ https://issues.apache.org/jira/browse/YARN-10447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201489#comment-17201489 ]

Hadoop QA commented on YARN-10447:
--

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 1m 22s | | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 0s | | No case conflicting files found. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 | | 0m 0s | test4tests | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests || ||
| +1 | mvninstall | 23m 33s | | trunk passed |
| +1 | compile | 0m 58s | | trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 | compile | 0m 49s | | trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| +1 | checkstyle | 0m 45s | | trunk passed |
| +1 | mvnsite | 0m 51s | | trunk passed |
| +1 | shadedclient | 17m 41s | | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 38s | | trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 | javadoc | 0m 32s | | trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| 0 | spotbugs | 1m 52s | | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 1m 50s | | trunk passed |
|| || || || Patch Compile Tests || ||
| +1 | mvninstall | 0m 50s | | the patch passed |
| +1 | compile | 0m 55s | | the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 | javac | 0m 55s | | the patch passed |
| +1 | compile | 0m 44s | | the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| +1 | javac | 0m 44s | | the patch passed |
| -0 | checkstyle | 0m 38s | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/191/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 504 unchanged - 0 fixed = 505 total (was 504) |
| +1 | mvnsite | 0m 49s | | the patch passed |
| +1 | whitespace | 0m 0s | | The patch has no whitespace issues. |
| +1 | shadedclient | 15m 26s | | patch has no errors when building and testing our client
[jira] [Commented] (YARN-10448) When use the sls (SYNTH JSON input file format) example the user be null will cause failed.
[ https://issues.apache.org/jira/browse/YARN-10448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201457#comment-17201457 ]

zhuqi commented on YARN-10448:
--

No new test is needed.

> When use the sls (SYNTH JSON input file format) example the user be null will cause failed.
> ---
>
> Key: YARN-10448
> URL: https://issues.apache.org/jira/browse/YARN-10448
> Project: Hadoop YARN
> Issue Type: Bug
> Components: scheduler-load-simulator
> Affects Versions: 3.2.1, 3.4.0
> Reporter: zhuqi
> Assignee: zhuqi
> Priority: Major
> Attachments: YARN-10448.001.patch
>
>
> java.lang.IllegalArgumentException: Null user
>   at org.apache.hadoop.security.UserGroupInformation.createRemoteUser(UserGroupInformation.java:1269)
>   at org.apache.hadoop.security.UserGroupInformation.createRemoteUser(UserGroupInformation.java:1256)
>   at org.apache.hadoop.yarn.sls.appmaster.AMSimulator.submitReservationWhenSpecified(AMSimulator.java:191)
>   at org.apache.hadoop.yarn.sls.appmaster.AMSimulator.firstStep(AMSimulator.java:161)
>   at org.apache.hadoop.yarn.sls.scheduler.TaskRunner$Task.run(TaskRunner.java:88)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
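One way to avoid the "Null user" failure in the trace above is to resolve the user name before it reaches the user-creation call. The sketch below is an assumption about the general approach, not the content of YARN-10448.001.patch: `resolveUser`, `createUser`, and the fallback name are hypothetical, and `createUser` only mimics a call that rejects a missing user.

```java
// Sketch under stated assumptions: hypothetical helpers, not the SLS or UGI code.
final class SlsUserSketch {

    /** Returns the job's user, or the fallback when the input file supplied none. */
    static String resolveUser(String userFromJob, String fallbackUser) {
        if (userFromJob == null || userFromJob.isEmpty()) {
            return fallbackUser;
        }
        return userFromJob;
    }

    /** Stand-in for a user-creation call that rejects a missing user name. */
    static String createUser(String user) {
        if (user == null || user.isEmpty()) {
            throw new IllegalArgumentException("Null user");
        }
        return "user:" + user;
    }

    public static void main(String[] args) {
        // A job entry without a user falls back instead of throwing.
        System.out.println(createUser(resolveUser(null, "default")));
        System.out.println(createUser(resolveUser("alice", "default")));
    }
}
```

Whether a fallback user or an explicit validation error is the better behaviour depends on how the simulator is expected to treat incomplete input files; the sketch only shows the fallback variant.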
[jira] [Commented] (YARN-10448) When use the sls (SYNTH JSON input file format) example the user be null will cause failed.
[ https://issues.apache.org/jira/browse/YARN-10448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201443#comment-17201443 ]

Hadoop QA commented on YARN-10448:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 48s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red}{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 44s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 25s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 20s{color} | {color:green}{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 0m 46s{color} | {color:blue}{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 44s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 24s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 21s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s{color} | {color:green}{color} | {color:green} the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 17s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 21s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 23s{color} | {color:green}{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.8+10-post-Ubu
[jira] [Created] (YARN-10448) When use the sls (SYNTH JSON input file format) example the user be null will cause failed.
zhuqi created YARN-10448:

Summary: When use the sls (SYNTH JSON input file format) example the user be null will cause failed.
Key: YARN-10448
URL: https://issues.apache.org/jira/browse/YARN-10448
Project: Hadoop YARN
Issue Type: Bug
Components: scheduler-load-simulator
Reporter: zhuqi
Assignee: zhuqi

java.lang.IllegalArgumentException: Null user
    at org.apache.hadoop.security.UserGroupInformation.createRemoteUser(UserGroupInformation.java:1269)
    at org.apache.hadoop.security.UserGroupInformation.createRemoteUser(UserGroupInformation.java:1256)
    at org.apache.hadoop.yarn.sls.appmaster.AMSimulator.submitReservationWhenSpecified(AMSimulator.java:191)
    at org.apache.hadoop.yarn.sls.appmaster.AMSimulator.firstStep(AMSimulator.java:161)
    at org.apache.hadoop.yarn.sls.scheduler.TaskRunner$Task.run(TaskRunner.java:88)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
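For readers following the trace above: the exception is raised when a null user name reaches UserGroupInformation.createRemoteUser. The following is only a minimal sketch of one possible guard, not the content of YARN-10448.001.patch. The class SlsUserGuard, the method resolveUser and the fallback value "default" are all hypothetical names chosen for illustration; the real simulator may handle a missing user differently.

```java
/**
 * Illustrative sketch only; this is NOT the YARN-10448 patch.
 * SlsUserGuard, resolveUser and DEFAULT_USER are hypothetical names.
 */
final class SlsUserGuard {
  // Hypothetical fallback; the actual simulator may choose another value.
  static final String DEFAULT_USER = "default";

  private SlsUserGuard() {
  }

  /**
   * Returns the user name read from the input trace, or the fallback
   * when the trace supplied no user (null or blank).
   */
  static String resolveUser(String userFromTrace) {
    if (userFromTrace == null || userFromTrace.trim().isEmpty()) {
      return DEFAULT_USER;
    }
    return userFromTrace;
  }

  public static void main(String[] args) {
    // A missing user falls back instead of propagating null downstream.
    System.out.println(resolveUser(null));
    System.out.println(resolveUser("alice"));
  }
}
```

Under these assumptions, a caller would pass the result of resolveUser(...) wherever the user name is handed on, so that a job definition without a user no longer reaches the remote-user creation with null.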
[jira] [Updated] (YARN-10447) TestLeafQueue: ActivitiesManager thread might interfere with ongoing stubbing
[ https://issues.apache.org/jira/browse/YARN-10447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Bacsko updated YARN-10447:

Attachment: YARN-10447-001.patch

> TestLeafQueue: ActivitiesManager thread might interfere with ongoing stubbing
> -
>
> Key: YARN-10447
> URL: https://issues.apache.org/jira/browse/YARN-10447
> Project: Hadoop YARN
> Issue Type: Bug
> Components: test
> Reporter: Peter Bacsko
> Assignee: Peter Bacsko
> Priority: Major
> Attachments: YARN-10447-001.patch
>
>
> YARN-9784 fixed some concurrency related issues in {{TestLeafQueue}}, but not
> all of them. Occasionally it's still possible to receive an exception from
> Mockito and the two following stack traces can be observed in the console:
> {noformat}
>
> org.mockito.exceptions.misusing.WrongTypeOfReturnValue:
> Integer cannot be returned by isMultiNodePlacementEnabled()
> isMultiNodePlacementEnabled() should return boolean
> ***
> If you're unsure why you're getting above error read on.
> Due to the nature of the syntax above problem might occur because:
> 1. This exception *might* occur in wrongly written multi-threaded tests.
>    Please refer to Mockito FAQ on limitations of concurrency testing.
> 2. A spy is stubbed using when(spy.foo()).then() syntax. It is safer to stub spies -
>    - with doReturn|Throw() family of methods. More in javadocs for Mockito.spy() method.
> {noformat}
> or
> {noformat}
> 2020-09-22 14:44:52,584 INFO [main] capacity.TestUtils (TestUtils.java:getMockNode(227)) - node = 127.0.0.3 avail= vCores:1>
> 2020-09-22 14:44:52,585 INFO [main] capacity.TestUtils (TestUtils.java:getMockNode(227)) - node = 127.0.0.4 avail= vCores:1>
> Exception in thread "ActivitiesManager thread." java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.Boolean
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$$EnhancerByMockitoWithCGLIB$$272c72c5.isMultiNodePlacementEnabled()
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.activities.ActivitiesManager.dynamicallyUpdateAppActivitiesMaxQueueLengthIfNeeded(ActivitiesManager.java:266)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.activities.ActivitiesManager.access$500(ActivitiesManager.java:63)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.activities.ActivitiesManager$1.run(ActivitiesManager.java:347)
>     at java.lang.Thread.run(Thread.java:748)
> {noformat}
> It's probably best to disable ActivitiesManager thread entirely in this test
> class, there is no need for it.