[jira] [Comment Edited] (YARN-11501) ResourceManager deadlock due to StatusUpdateWhenHealthyTransition.hasScheduledAMContainers
[ https://issues.apache.org/jira/browse/YARN-11501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17737005#comment-17737005 ]

Prabhu Joseph edited comment on YARN-11501 at 6/26/23 6:19 AM:
---

>> I am not able to trace ClusterNodeTracker#updateMaxResources ->
>> RMNodeImpl.getState .. in trunk code . Any private change ??

Thanks, Bibin Chundatt. Yes, you are right; this part is a private change. During the initial analysis, we tried to fix the locking at StatusUpdateWhenHealthyTransition.hasScheduledAMContainers (which locks RMNode first and then SchedulerNode). However, we found it easier to fix our private change (ClusterNodeTracker.updateMaxResources -> RMNodeImpl.getState, which locks SchedulerNode first and then RMNode). This deadlock won't happen without the private change, so I will mark this issue as invalid.

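The two acquisition orders described in the comment above (RMNode then SchedulerNode on one path, SchedulerNode then RMNode on the other) are the usual recipe for a lock-ordering deadlock. The following is a minimal sketch, not YARN code: `rmNodeLock`, `schedulerNodeLock`, `statusUpdate`, `removeNode`, and `demo` are hypothetical names standing in for the objects discussed in this issue. It shows both paths taking the locks in one agreed order, which rules out the cycle.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockOrderSketch {
  // Hypothetical stand-ins for the RMNode and SchedulerNode locks.
  private final ReentrantReadWriteLock rmNodeLock = new ReentrantReadWriteLock();
  private final ReentrantReadWriteLock schedulerNodeLock = new ReentrantReadWriteLock();
  private final AtomicInteger updates = new AtomicInteger();

  // Path A: RMNode lock first, then SchedulerNode lock.
  public void statusUpdate() {
    rmNodeLock.writeLock().lock();
    try {
      schedulerNodeLock.readLock().lock();
      try {
        updates.incrementAndGet();
      } finally {
        schedulerNodeLock.readLock().unlock();
      }
    } finally {
      rmNodeLock.writeLock().unlock();
    }
  }

  // Path B: same order as path A (RMNode first), so no cycle can form.
  // Taking schedulerNodeLock first here would recreate the reported pattern.
  public void removeNode() {
    rmNodeLock.readLock().lock();
    try {
      schedulerNodeLock.writeLock().lock();
      try {
        updates.incrementAndGet();
      } finally {
        schedulerNodeLock.writeLock().unlock();
      }
    } finally {
      rmNodeLock.readLock().unlock();
    }
  }

  // Runs both paths concurrently and returns the number of completed updates.
  public static int demo(int iterations) {
    LockOrderSketch s = new LockOrderSketch();
    Thread a = new Thread(() -> {
      for (int i = 0; i < iterations; i++) {
        s.statusUpdate();
      }
    });
    Thread b = new Thread(() -> {
      for (int i = 0; i < iterations; i++) {
        s.removeNode();
      }
    });
    a.start();
    b.start();
    try {
      a.join();
      b.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return s.updates.get();
  }

  public static void main(String[] args) {
    System.out.println("completed updates: " + demo(10_000));
  }
}
```

Because both threads finish, `demo` returns twice the iteration count; with opposite orders on the two paths the same run could hang.
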
> ResourceManager deadlock due to StatusUpdateWhenHealthyTransition.hasScheduledAMContainers
> --
>
>                 Key: YARN-11501
>                 URL: https://issues.apache.org/jira/browse/YARN-11501
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.4.0
>            Reporter: Prabhu Joseph
>            Assignee: Prabhu Joseph
>            Priority: Critical
>
> We have seen a deadlock in ResourceManager due to
> StatusUpdateWhenHealthyTransition.hasScheduledAMContainers holding the lock
> on RMNode and waiting to lock SchedulerNode, whereas
> CapacityScheduler#removeNode has taken the lock on SchedulerNode and is
> waiting to lock RMNode.
> cc *Vishal Vyas*
>
> {code:java}
> Found one Java-level deadlock:
> =
> "qtp1401737458-850":
>   waiting for ownable synchronizer 0x000717e6ff60, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
>   which is held by "RM Event dispatcher"
> "RM Event dispatcher":
>   waiting for ownable synchronizer 0x0007168a7a38, (a java.util.concurrent.locks.ReentrantReadWriteLock$FairSync),
>   which is held by "SchedulerEventDispatcher:Event Processor"
> "SchedulerEventDispatcher:Event Processor":
>   waiting for ownable synchronizer 0x000717e6ff60, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
>   which is held by "RM Event dispatcher"
> Java stack information for the threads listed above:
> ===
> "qtp1401737458-850":
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for <0x000717e6ff60> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
>         at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl.getState(RMNodeImpl.java:619)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.queryRMNodes(RMServerUtils.java:128)
>         at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.getNodes(RMWebServices.java:464)
>         at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
>         at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
>         at com.sun.jersey.server.impl.model.method.dispatch
[jira] [Commented] (YARN-11501) ResourceManager deadlock due to StatusUpdateWhenHealthyTransition.hasScheduledAMContainers
[ https://issues.apache.org/jira/browse/YARN-11501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17737005#comment-17737005 ]

Prabhu Joseph commented on YARN-11501:
--

>> I am not able to trace ClusterNodeTracker#updateMaxResources ->
>> RMNodeImpl.getState .. in trunk code . Any private change ??

Thanks, Bibin Chundatt. Yes, you are right; this part is a private change. During the initial analysis, we tried to fix the locking at StatusUpdateWhenHealthyTransition.hasScheduledAMContainers (which locks RMNode first and then SchedulerNode). However, we found it easier to fix our private change (ClusterNodeTracker.updateMaxResources -> RMNodeImpl.getState, which locks SchedulerNode first and then RMNode). This deadlock won't happen without the private change, so I will mark this issue as invalid.

> ResourceManager deadlock due to StatusUpdateWhenHealthyTransition.hasScheduledAMContainers
> --
>
>                 Key: YARN-11501
>                 URL: https://issues.apache.org/jira/browse/YARN-11501
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.4.0
>            Reporter: Prabhu Joseph
>            Assignee: Prabhu Joseph
>            Priority: Critical
>
> We have seen a deadlock in ResourceManager due to
> StatusUpdateWhenHealthyTransition.hasScheduledAMContainers holding the lock
> on RMNode and waiting to lock SchedulerNode, whereas
> CapacityScheduler#removeNode has taken the lock on SchedulerNode and is
> waiting to lock RMNode.
[jira] [Resolved] (YARN-11501) ResourceManager deadlock due to StatusUpdateWhenHealthyTransition.hasScheduledAMContainers
[ https://issues.apache.org/jira/browse/YARN-11501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prabhu Joseph resolved YARN-11501.
--
    Resolution: Invalid

> ResourceManager deadlock due to StatusUpdateWhenHealthyTransition.hasScheduledAMContainers
> --
>
>                 Key: YARN-11501
>                 URL: https://issues.apache.org/jira/browse/YARN-11501
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.4.0
>            Reporter: Prabhu Joseph
>            Assignee: Prabhu Joseph
>            Priority: Critical
>
> We have seen a deadlock in ResourceManager due to
> StatusUpdateWhenHealthyTransition.hasScheduledAMContainers holding the lock
> on RMNode and waiting to lock SchedulerNode, whereas
> CapacityScheduler#removeNode has taken the lock on SchedulerNode and is
> waiting to lock RMNode.
> cc *Vishal Vyas*
>
> {code:java}
> Found one Java-level deadlock:
> =
> "qtp1401737458-850":
>   waiting for ownable synchronizer 0x000717e6ff60, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
>   which is held by "RM Event dispatcher"
> "RM Event dispatcher":
>   waiting for ownable synchronizer 0x0007168a7a38, (a java.util.concurrent.locks.ReentrantReadWriteLock$FairSync),
>   which is held by "SchedulerEventDispatcher:Event Processor"
> "SchedulerEventDispatcher:Event Processor":
>   waiting for ownable synchronizer 0x000717e6ff60, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
>   which is held by "RM Event dispatcher"
> Java stack information for the threads listed above:
> ===
> "qtp1401737458-850":
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for <0x000717e6ff60> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
>         at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl.getState(RMNodeImpl.java:619)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.queryRMNodes(RMServerUtils.java:128)
>         at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.getNodes(RMWebServices.java:464)
>         at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
>         at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
>         at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
>         at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302)
>         at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
>         at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
>         at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
>         at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
>         at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542)
>         at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1473)
>         at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1419)
>         at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1409)
>         at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:409)
>         at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:558)
>         at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:927)
>         at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:875)
>         at org.apache.hadoop.y
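
A report like the one quoted above, listing threads waiting for ownable synchronizers, can also be produced in-process through the standard `java.lang.management` API. The following is a minimal sketch under stated assumptions: `DeadlockProbe` and `describeDeadlocks` are hypothetical names, not part of Hadoop, and the JVM is assumed to support synchronizer usage monitoring (HotSpot does).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

public class DeadlockProbe {
  // Returns one line per deadlocked thread; the list is empty when none is found.
  public static List<String> describeDeadlocks() {
    ThreadMXBean mx = ManagementFactory.getThreadMXBean();
    // Covers object monitors and ownable synchronizers such as
    // ReentrantReadWriteLock; returns null when no deadlock is detected.
    long[] ids = mx.findDeadlockedThreads();
    List<String> lines = new ArrayList<>();
    if (ids == null) {
      return lines;
    }
    // lockedMonitors=false, lockedSynchronizers=true
    for (ThreadInfo info : mx.getThreadInfo(ids, false, true)) {
      if (info == null) {
        continue; // the thread exited between the two calls
      }
      lines.add("\"" + info.getThreadName() + "\" waiting for "
          + info.getLockName() + " held by \"" + info.getLockOwnerName() + "\"");
    }
    return lines;
  }

  public static void main(String[] args) {
    List<String> report = describeDeadlocks();
    if (report.isEmpty()) {
      System.out.println("No deadlock detected");
    } else {
      report.forEach(System.out::println);
    }
  }
}
```

Calling `describeDeadlocks()` periodically from a monitoring thread is one way to surface such a cycle without attaching jstack.
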
[jira] [Comment Edited] (YARN-11501) ResourceManager deadlock due to StatusUpdateWhenHealthyTransition.hasScheduledAMContainers
[ https://issues.apache.org/jira/browse/YARN-11501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17736986#comment-17736986 ]

Bibin Chundatt edited comment on YARN-11501 at 6/26/23 5:10 AM:

[~prabhujoseph]

I did a quick scan of the call stack:

at org.apache.hadoop.yarn.server.resourcemanager.rmnode.*RMNodeImpl.getState(RMNodeImpl.java:619)*
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.ClusterNodeTracker.updateMaxResources(ClusterNodeTracker.java:307)

I am not able to trace ClusterNodeTracker#updateMaxResources -> RMNodeImpl.getState in trunk code. Is there a private change?

was (Author: bibinchundatt):
[~prabhujoseph]

I did a quick scan of the call stack. The stack trace doesn't match the one from OSS:

at org.apache.hadoop.yarn.server.resourcemanager.rmnode.*RMNodeImpl.getState(RMNodeImpl.java:619)*
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.ClusterNodeTracker.updateMaxResources(ClusterNodeTracker.java:307)

I am not able to trace ClusterNodeTracker#updateMaxResources -> RMNodeImpl.getState in trunk code. Is there a private change?

> ResourceManager deadlock due to StatusUpdateWhenHealthyTransition.hasScheduledAMContainers
> --
>
>                 Key: YARN-11501
>                 URL: https://issues.apache.org/jira/browse/YARN-11501
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.4.0
>            Reporter: Prabhu Joseph
>            Assignee: Prabhu Joseph
>            Priority: Critical
>
> We have seen a deadlock in ResourceManager due to
> StatusUpdateWhenHealthyTransition.hasScheduledAMContainers holding the lock
> on RMNode and waiting to lock SchedulerNode, whereas
> CapacityScheduler#removeNode has taken the lock on SchedulerNode and is
> waiting to lock RMNode.
[jira] [Commented] (YARN-11501) ResourceManager deadlock due to StatusUpdateWhenHealthyTransition.hasScheduledAMContainers
[ https://issues.apache.org/jira/browse/YARN-11501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17736986#comment-17736986 ]

Bibin Chundatt commented on YARN-11501:
---

[~prabhujoseph]

I did a quick scan of the call stack. The stack trace doesn't match the one from OSS:

at org.apache.hadoop.yarn.server.resourcemanager.rmnode.*RMNodeImpl.getState(RMNodeImpl.java:619)*
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.ClusterNodeTracker.updateMaxResources(ClusterNodeTracker.java:307)

I am not able to trace ClusterNodeTracker#updateMaxResources -> RMNodeImpl.getState in trunk code. Is there a private change?

> ResourceManager deadlock due to StatusUpdateWhenHealthyTransition.hasScheduledAMContainers
> --
>
>                 Key: YARN-11501
>                 URL: https://issues.apache.org/jira/browse/YARN-11501
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.4.0
>            Reporter: Prabhu Joseph
>            Assignee: Prabhu Joseph
>            Priority: Critical
>
> We have seen a deadlock in ResourceManager due to
> StatusUpdateWhenHealthyTransition.hasScheduledAMContainers holding the lock
> on RMNode and waiting to lock SchedulerNode, whereas
> CapacityScheduler#removeNode has taken the lock on SchedulerNode and is
> waiting to lock RMNode.
[jira] [Commented] (YARN-11517) Improve Federation#RouterCLI deregisterSubCluster Code
[ https://issues.apache.org/jira/browse/YARN-11517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17736892#comment-17736892 ]

ASF GitHub Bot commented on YARN-11517:
---

hadoop-yetus commented on PR #5766:
URL: https://github.com/apache/hadoop/pull/5766#issuecomment-1606151656

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 1m 3s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 1s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +0 :ok: | mvndep | 17m 47s | | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 23m 53s | | trunk passed |
| +1 :green_heart: | compile | 8m 35s | | trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 |
| +1 :green_heart: | compile | 8m 15s | | trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
| +1 :green_heart: | checkstyle | 2m 14s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 41s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 43s | | trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 |
| +1 :green_heart: | javadoc | 1m 30s | | trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
| +1 :green_heart: | spotbugs | 2m 33s | | trunk passed |
| +1 :green_heart: | shadedclient | 26m 5s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +0 :ok: | mvndep | 0m 31s | | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 0m 51s | | the patch passed |
| +1 :green_heart: | compile | 8m 6s | | the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 |
| +1 :green_heart: | javac | 8m 6s | | the patch passed |
| +1 :green_heart: | compile | 7m 29s | | the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
| +1 :green_heart: | javac | 7m 29s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 1m 52s | | the patch passed |
| +1 :green_heart: | mvnsite | 1m 36s | | the patch passed |
| +1 :green_heart: | javadoc | 1m 31s | | the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 |
| +1 :green_heart: | javadoc | 1m 27s | | the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
| +1 :green_heart: | spotbugs | 2m 31s | | the patch passed |
| +1 :green_heart: | shadedclient | 22m 34s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 28m 50s | | hadoop-yarn-client in the patch passed. |
| +1 :green_heart: | unit | 0m 56s | | hadoop-yarn-server-router in the patch passed. |
| +1 :green_heart: | asflicense | 1m 13s | | The patch does not generate ASF License warnings. |
| | | 181m 23s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5766/4/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/5766 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux fbb7510bd25b 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 6423c5342da06ccbe9f21d0cb158834d3fed06d8 |
| Default Java | Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5766/4/testReport/ |
| Max. process+thread count | 712 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoo
[jira] [Commented] (YARN-11517) Improve Federation#RouterCLI deregisterSubCluster Code
[ https://issues.apache.org/jira/browse/YARN-11517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17736868#comment-17736868 ]

ASF GitHub Bot commented on YARN-11517:
---

slfan1989 commented on code in PR #5766:
URL: https://github.com/apache/hadoop/pull/5766#discussion_r1241152687

##
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router/src/main/java/org/apache/hadoop/yarn/server/router/rmadmin/FederationRMAdminInterceptor.java:
##
@@ -879,9 +879,11 @@ private DeregisterSubClusters deregisterSubCluster(String reqSubClusterId) {
       SubClusterState subClusterState = subClusterInfo.getState();
       long lastHeartBeat = subClusterInfo.getLastHeartBeat();
       Date lastHeartBeatDate = new Date(lastHeartBeat);
-
+      String heartBeatTimeOut =

Review Comment:
   I will modify the code.

> Improve Federation#RouterCLI deregisterSubCluster Code
> --
>
>                 Key: YARN-11517
>                 URL: https://issues.apache.org/jira/browse/YARN-11517
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: federation, router
>            Reporter: Shilun Fan
>            Assignee: Shilun Fan
>            Priority: Major
>              Labels: pull-request-available
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9610) HeartbeatCallBack int FederationInterceptor clear AMRMToken in response from UAM should before add to aysncResponseSink
[ https://issues.apache.org/jira/browse/YARN-9610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17736840#comment-17736840 ]

Morty Zhong commented on YARN-9610:
---

[~walhl] Yes, I compared these two patches. This issue can be closed.

> HeartbeatCallBack int FederationInterceptor clear AMRMToken in response from UAM should before add to aysncResponseSink
>
>
>                 Key: YARN-9610
>                 URL: https://issues.apache.org/jira/browse/YARN-9610
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: amrmproxy, federation
>    Affects Versions: 3.2.0
>            Reporter: Morty Zhong
>            Assignee: Morty Zhong
>            Priority: Major
>         Attachments: YARN-9610.patch.1, YARN-9610.patch.2
>
>
> In federation, `allocate` is asynchronous: the response from each RM is
> cached in `asyncResponseSink`, and the final allocate response is merged
> from the allocate responses of all RMs. The merge throws an exception when
> the AMRMToken in a UAM response is not null.
> However, the AMRMToken of the UAM response is set to null outside the scope
> of the lock, so there is a chance that the merge sees a UAM response whose
> AMRMToken is not null.
> We should therefore clear the token before adding the response to
> asyncResponseSink.
>
> {code:java}
> synchronized (asyncResponseSink) {
>   List<AllocateResponse> responses = null;
>   if (asyncResponseSink.containsKey(subClusterId)) {
>     responses = asyncResponseSink.get(subClusterId);
>   } else {
>     responses = new ArrayList<>();
>     asyncResponseSink.put(subClusterId, responses);
>   }
>   responses.add(response);
>   // Notify main thread about the response arrival
>   asyncResponseSink.notifyAll();
> }
> ...
> if (this.isUAM && response.getAMRMToken() != null) {
>   Token<AMRMTokenIdentifier> newToken = ConverterUtils
>       .convertFromYarn(response.getAMRMToken(), (Text) null);
>   // Do not further propagate the new amrmToken for UAM
>   response.setAMRMToken(null);
> ...{code}
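
The ordering proposed in the issue description above can be sketched in isolation. This is a minimal illustration under stated assumptions, not the FederationInterceptor code: `ResponseSink`, `Response`, `publish`, and `drain` are hypothetical names, and the token is modeled as a plain string. The point it shows is that the token is consumed and cleared before the response becomes visible to the merging thread.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ResponseSink {
  /** Hypothetical stand-in for an allocate response carrying an optional token. */
  public static final class Response {
    private String token;

    public Response(String token) {
      this.token = token;
    }

    public String getToken() {
      return token;
    }

    public void setToken(String token) {
      this.token = token;
    }
  }

  private final Map<String, List<Response>> sink = new HashMap<>();

  /** Callback side: clear the token first, then publish under the lock. */
  public String publish(String subClusterId, Response response, boolean isUam) {
    String newToken = null;
    if (isUam && response.getToken() != null) {
      newToken = response.getToken();
      // Cleared before the response is shared, so a reader never sees it.
      response.setToken(null);
    }
    synchronized (sink) {
      sink.computeIfAbsent(subClusterId, k -> new ArrayList<>()).add(response);
      sink.notifyAll(); // wake a thread waiting for responses
    }
    return newToken;
  }

  /** Merge side: takes everything published so far and rejects leftover tokens. */
  public List<Response> drain() {
    List<Response> merged = new ArrayList<>();
    synchronized (sink) {
      for (List<Response> responses : sink.values()) {
        for (Response r : responses) {
          if (r.getToken() != null) {
            throw new IllegalStateException("token must be cleared before merge");
          }
          merged.add(r);
        }
      }
      sink.clear();
    }
    return merged;
  }

  public static void main(String[] args) {
    ResponseSink s = new ResponseSink();
    String token = s.publish("subcluster-1", new Response("new-token"), true);
    System.out.println("consumed token: " + token);
    System.out.println("merged responses: " + s.drain().size());
  }
}
```

Because the write to the token happens before the `synchronized` block that publishes the response, any thread that later enters `drain` on the same monitor observes the cleared value.
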