[ https://issues.apache.org/jira/browse/YARN-8014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16391438#comment-16391438 ]
Evan Tepsic edited comment on YARN-8014 at 3/8/18 3:59 PM:
-----------------------------------------------------------
This could be caused by buildNodeId(), as the port number it generates appears to be random when yarn.nodemanager.address is not defined in the NodeManager's yarn-site.xml.

was (Author: tepsic): This could be caused by buildNodeId(), as the port number it generates appears to be random.

> YARN ResourceManager Lists A NodeManager As RUNNING & SHUTDOWN Simultaneously
> -----------------------------------------------------------------------------
>
>                 Key: YARN-8014
>                 URL: https://issues.apache.org/jira/browse/YARN-8014
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.8.2
>            Reporter: Evan Tepsic
>            Priority: Minor
>
> A graceful shutdown and then startup of a NodeManager process using YARN/HDFS v2.8.2 seems to successfully place the Node back into the RUNNING state. However, the ResourceManager appears to also keep the Node in the SHUTDOWN state.
>
> *Steps To Reproduce:*
> 1. SSH to the host running the NodeManager.
> 2. Switch to the user ID that the NodeManager is running as (hadoop).
> 3. Execute: /opt/hadoop/sbin/yarn-daemon.sh stop nodemanager
> 4. Wait for the NodeManager process to terminate gracefully.
> 5. Confirm the Node is in the SHUTDOWN state via: http://rb01rm01.local:8088/cluster/nodes
> 6. Execute: /opt/hadoop/sbin/yarn-daemon.sh start nodemanager
> 7. Confirm the Node is in the RUNNING state via: http://rb01rm01.local:8088/cluster/nodes
>
> *Investigation:*
> 1. Review the contents of the ResourceManager and NodeManager log-files:
> +ResourceManager log-file:+
> 2018-03-08 08:15:44,085 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Node with node id : rb0101.local:43892 has shutdown, hence unregistering the node.
> 2018-03-08 08:15:44,092 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating Node rb0101.local:43892 as it is now SHUTDOWN
> 2018-03-08 08:15:44,092 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: rb0101.local:43892 Node Transitioned from RUNNING to SHUTDOWN
> 2018-03-08 08:15:44,093 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Removed node rb0101.local:43892 cluster capacity: <memory:110592, vCores:54>
> 2018-03-08 08:16:08,915 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: NodeManager from node rb0101.local(cmPort: 42627 httpPort: 8042) registered with capability: <memory:12288, vCores:6>, assigned nodeId rb0101.local:42627
> 2018-03-08 08:16:08,916 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: rb0101.local:42627 Node Transitioned from NEW to RUNNING
> 2018-03-08 08:16:08,916 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Added node rb0101.local:42627 cluster capacity: <memory:122880, vCores:60>
> 2018-03-08 08:16:34,826 WARN org.apache.hadoop.ipc.Server: Large response size 2976014 for call Call#428958 Retry#0 org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplications from 192.168.1.100:44034
>
> +NodeManager log-file:+
> 2018-03-08 08:00:14,500 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Cache Size Before Clean: 10720046250, Total Deleted: 0, Public Deleted: 0, Private Deleted: 0
> 2018-03-08 08:10:14,498 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Cache Size Before Clean: 10720046250, Total Deleted: 0, Public Deleted: 0, Private Deleted: 0
> 2018-03-08 08:15:44,048 ERROR org.apache.hadoop.yarn.server.nodemanager.NodeManager: RECEIVED SIGNAL 15: SIGTERM
> 2018-03-08 08:15:44,101 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Successfully Unregistered the Node rb0101.local:43892 with ResourceManager.
> 2018-03-08 08:15:44,114 INFO org.mortbay.log: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:8042
> 2018-03-08 08:15:44,226 INFO org.apache.hadoop.ipc.Server: Stopping server on 43892
> 2018-03-08 08:15:44,232 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 43892
> 2018-03-08 08:15:44,237 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
> 2018-03-08 08:15:44,239 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService: org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService waiting for pending aggregation during exit
> 2018-03-08 08:15:44,242 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is interrupted. Exiting.
> 2018-03-08 08:15:44,284 INFO org.apache.hadoop.ipc.Server: Stopping server on 8040
> 2018-03-08 08:15:44,285 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 8040
> 2018-03-08 08:15:44,285 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
> 2018-03-08 08:15:44,287 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Public cache exiting
> 2018-03-08 08:15:44,289 WARN org.apache.hadoop.yarn.server.nodemanager.NodeResourceMonitorImpl: org.apache.hadoop.yarn.server.nodemanager.NodeResourceMonitorImpl is interrupted. Exiting.
> 2018-03-08 08:15:44,294 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NodeManager metrics system...
> 2018-03-08 08:15:44,295 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NodeManager metrics system stopped.
> 2018-03-08 08:15:44,296 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NodeManager metrics system shutdown complete.
> 2018-03-08 08:15:44,297 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down NodeManager at rb0101.local/192.168.1.101
> ************************************************************/
> 2018-03-08 08:16:01,905 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: STARTUP_MSG:
> /************************************************************
> STARTUP_MSG: Starting NodeManager
> STARTUP_MSG: user = hadoop
> STARTUP_MSG: host = rb0101.local/192.168.1.101
> STARTUP_MSG: args = []
> STARTUP_MSG: version = 2.8.2
> STARTUP_MSG: classpath = blahblahblah (truncated for size-purposes)
> STARTUP_MSG: build = Unknown -r Unknown; compiled by 'root' on 2017-09-14T18:22Z
> STARTUP_MSG: java = 1.8.0_144
> ************************************************************/
> 2018-03-08 08:16:01,918 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: registered UNIX signal handlers for [TERM, HUP, INT]
> 2018-03-08 08:16:03,202 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: Node Manager health check script is not available or doesn't have execute permission, so not starting the node health script runner.
> 2018-03-08 08:16:03,321 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher
> 2018-03-08 08:16:03,322 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher
> 2018-03-08 08:16:03,323 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.event.LocalizationEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService
> 2018-03-08 08:16:03,323 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServicesEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices
> 2018-03-08 08:16:03,324 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl
> 2018-03-08 08:16:03,324 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncherEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher
> 2018-03-08 08:16:03,347 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.ContainerManagerEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl
> 2018-03-08 08:16:03,348 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.NodeManagerEventType for class org.apache.hadoop.yarn.server.nodemanager.NodeManager
> 2018-03-08 08:16:03,402 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
> 2018-03-08 08:16:03,484 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
> 2018-03-08 08:16:03,484 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NodeManager metrics system started
> 2018-03-08 08:16:03,561 INFO org.apache.hadoop.yarn.server.nodemanager.NodeResourceMonitorImpl: Using ResourceCalculatorPlugin : org.apache.hadoop.yarn.util.ResourceCalculatorPlugin@4b8729ff
> 2018-03-08 08:16:03,564 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.event.LogHandlerEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService
> 2018-03-08 08:16:03,565 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploadEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploadService
> 2018-03-08 08:16:03,565 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: AMRMProxyService is disabled
> 2018-03-08 08:16:03,566 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: per directory file limit = 8192
> 2018-03-08 08:16:03,621 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: usercache path : file:/space/hadoop/tmp/nm-local-dir/usercache_DEL_1520518563569
> 2018-03-08 08:16:03,667 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting path : file:/space/hadoop/tmp/nm-local-dir/usercache_DEL_1520518563569/user1
> 2018-03-08 08:16:03,667 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting path : file:/space/hadoop/tmp/nm-local-dir/usercache_DEL_1520518563569/user2
> 2018-03-08 08:16:03,668 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting path : file:/space/hadoop/tmp/nm-local-dir/usercache_DEL_1520518563569/user3
> 2018-03-08 08:16:03,681 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting path : file:/space/hadoop/tmp/nm-local-dir/usercache_DEL_1520518563569/user4
> 2018-03-08 08:16:03,739 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.event.LocalizerEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker
> 2018-03-08 08:16:03,793 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Adding auxiliary service mapreduce_shuffle, "mapreduce_shuffle"
> 2018-03-08 08:16:03,826 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Using ResourceCalculatorPlugin : org.apache.hadoop.yarn.util.ResourceCalculatorPlugin@1187c9e8
> 2018-03-08 08:16:03,826 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Using ResourceCalculatorProcessTree : null
> 2018-03-08 08:16:03,827 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Physical memory check enabled: true
> 2018-03-08 08:16:03,827 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Virtual memory check enabled: true
> 2018-03-08 08:16:03,832 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: ContainersMonitor enabled: true
> 2018-03-08 08:16:03,841 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Nodemanager resources: memory set to 12288MB.
> 2018-03-08 08:16:03,841 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Nodemanager resources: vcores set to 6.
> 2018-03-08 08:16:03,846 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Initialized nodemanager with : physical-memory=12288 virtual-memory=25805 virtual-cores=6
> 2018-03-08 08:16:03,850 INFO org.apache.hadoop.util.JvmPauseMonitor: Starting JVM pause monitor
> 2018-03-08 08:16:03,908 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue: class java.util.concurrent.LinkedBlockingQueue queueCapacity: 2000 scheduler: class org.apache.hadoop.ipc.DefaultRpcScheduler
> 2018-03-08 08:16:03,932 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 42627
> 2018-03-08 08:16:04,153 INFO org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.yarn.api.ContainerManagementProtocolPB to the server
> 2018-03-08 08:16:04,153 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Blocking new container-requests as container manager rpc server is still starting.
> 2018-03-08 08:16:04,154 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
> 2018-03-08 08:16:04,154 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 42627: starting
> 2018-03-08 08:16:04,166 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager: Updating node address : rb0101.local:42627
> 2018-03-08 08:16:04,183 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue: class java.util.concurrent.LinkedBlockingQueue queueCapacity: 500 scheduler: class org.apache.hadoop.ipc.DefaultRpcScheduler
> 2018-03-08 08:16:04,184 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 8040
> 2018-03-08 08:16:04,191 INFO org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.yarn.server.nodemanager.api.LocalizationProtocolPB to the server
> 2018-03-08 08:16:04,191 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
> 2018-03-08 08:16:04,191 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8040: starting
> 2018-03-08 08:16:04,192 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Localizer started on port 8040
> 2018-03-08 08:16:04,312 INFO org.apache.hadoop.mapred.IndexCache: IndexCache created with max memory = 10485760
> 2018-03-08 08:16:04,330 INFO org.apache.hadoop.mapred.ShuffleHandler: mapreduce_shuffle listening on port 13562
> 2018-03-08 08:16:04,337 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: ContainerManager started at rb0101.local/192.168.1.101:42627
> 2018-03-08 08:16:04,337 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: ContainerManager bound to 0.0.0.0/0.0.0.0:0
> 2018-03-08 08:16:04,340 INFO org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer: Instantiating NMWebApp at 0.0.0.0:8042
> 2018-03-08 08:16:04,427 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
> 2018-03-08 08:16:04,436 INFO org.apache.hadoop.security.authentication.server.AuthenticationFilter: Unable to initialize FileSignerSecretProvider, falling back to use random secrets.
> 2018-03-08 08:16:04,442 INFO org.apache.hadoop.http.HttpRequestLog: Http request log for http.requests.nodemanager is not defined
> 2018-03-08 08:16:04,450 INFO org.apache.hadoop.http.HttpServer2: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
> 2018-03-08 08:16:04,461 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context node
> 2018-03-08 08:16:04,462 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
> 2018-03-08 08:16:04,462 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
> 2018-03-08 08:16:04,462 INFO org.apache.hadoop.security.HttpCrossOriginFilterInitializer: CORS filter not enabled. Please set hadoop.http.cross-origin.enabled to 'true' to enable it
> 2018-03-08 08:16:04,465 INFO org.apache.hadoop.http.HttpServer2: adding path spec: /node/*
> 2018-03-08 08:16:04,465 INFO org.apache.hadoop.http.HttpServer2: adding path spec: /ws/*
> 2018-03-08 08:16:04,843 INFO org.apache.hadoop.yarn.webapp.WebApps: Registered webapp guice modules
> 2018-03-08 08:16:04,846 INFO org.apache.hadoop.http.HttpServer2: Jetty bound to port 8042
> 2018-03-08 08:16:04,846 INFO org.mortbay.log: jetty-6.1.26
> 2018-03-08 08:16:04,877 INFO org.mortbay.log: Extract jar:file:/opt/hadoop-2.8.2/share/hadoop/yarn/hadoop-yarn-common-2.8.2.jar!/webapps/node to /tmp/Jetty_0_0_0_0_8042_node____19tj0x/webapp
> 2018-03-08 08:16:08,355 INFO org.mortbay.log: Started HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:8042
> 2018-03-08 08:16:08,356 INFO org.apache.hadoop.yarn.webapp.WebApps: Web app node started at 8042
> 2018-03-08 08:16:08,473 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Node ID assigned is : rb0101.local:42627
> 2018-03-08 08:16:08,498 INFO org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at rb01rm01.local/192.168.1.100:8031
> 2018-03-08 08:16:08,613 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out 0 NM container statuses: []
> 2018-03-08 08:16:08,621 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registering with RM using containers :[]
> 2018-03-08 08:16:08,934 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager: Rolling master-key for container-tokens, got key with id -2086472604
> 2018-03-08 08:16:08,938 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM: Rolling master-key for container-tokens, got key with id -426187560
> 2018-03-08 08:16:08,939 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered with ResourceManager as rb0101.local:42627 with total resource of <memory:12288, vCores:6>
> 2018-03-08 08:16:08,939 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Notifying ContainerManager to unblock new container-requests
> 2018-03-08 08:26:04,174 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Cache Size Before Clean: 0, Total Deleted: 0, Public Deleted: 0, Private Deleted: 0
> 2018-03-08 08:36:04,170 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Cache Size Before Clean: 0, Total Deleted: 0, Public Deleted: 0, Private Deleted: 0
> 2018-03-08 08:46:04,170 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Cache Size Before Clean: 0, Total Deleted: 0, Public Deleted: 0, Private Deleted: 0
>
> 2. Listing all of YARN's Nodes, we can see the Node was returned to the RUNNING state.
However, when listing all nodes, the Node appears in two states, RUNNING and SHUTDOWN:
> [hadoop@rb01rm01 logs]$ /opt/hadoop/bin/yarn node -list -all
> 18/03/08 09:20:33 INFO client.RMProxy: Connecting to ResourceManager at rb01rm01.local/192.168.1.100:8032
> 18/03/08 09:20:34 INFO client.AHSProxy: Connecting to Application History server at rb01rm01.local/192.168.1.100:10200
> Total Nodes:11
>          Node-Id   Node-State  Node-Http-Address  Number-of-Running-Containers
> rb0106.local:44160     RUNNING  rb0106.local:8042  0
> rb0105.local:32832     RUNNING  rb0105.local:8042  0
> rb0101.local:42627     RUNNING  rb0101.local:8042  0
> rb0108.local:38209     RUNNING  rb0108.local:8042  0
> rb0107.local:34306     RUNNING  rb0107.local:8042  0
> rb0102.local:43063     RUNNING  rb0102.local:8042  0
> rb0103.local:42374     RUNNING  rb0103.local:8042  0
> rb0109.local:37455     RUNNING  rb0109.local:8042  0
> rb0110.local:36690     RUNNING  rb0110.local:8042  0
> rb0104.local:33268     RUNNING  rb0104.local:8042  0
> rb0101.local:43892    SHUTDOWN  rb0101.local:8042  0
> [hadoop@rb01rm01 logs]$
> [hadoop@rb01rm01 logs]$ /opt/hadoop/bin/yarn node -list -states RUNNING
> 18/03/08 09:20:55 INFO client.RMProxy: Connecting to ResourceManager at rb01rm01.local/192.168.1.100:8032
> 18/03/08 09:20:56 INFO client.AHSProxy: Connecting to Application History server at rb01rm01.local/192.168.1.100:10200
> Total Nodes:10
>          Node-Id   Node-State  Node-Http-Address  Number-of-Running-Containers
> rb0106.local:44160     RUNNING  rb0106.local:8042  0
> rb0105.local:32832     RUNNING  rb0105.local:8042  0
> rb0101.local:42627     RUNNING  rb0101.local:8042  0
> rb0108.local:38209     RUNNING  rb0108.local:8042  0
> rb0107.local:34306     RUNNING  rb0107.local:8042  0
> rb0102.local:43063     RUNNING  rb0102.local:8042  0
> rb0103.local:42374     RUNNING  rb0103.local:8042  0
> rb0109.local:37455     RUNNING  rb0109.local:8042  0
> rb0110.local:36690     RUNNING  rb0110.local:8042  0
> rb0104.local:33268     RUNNING  rb0104.local:8042  0
> [hadoop@rb01rm01 logs]$ /opt/hadoop/bin/yarn node -list -states SHUTDOWN
> 18/03/08 09:21:01 INFO client.RMProxy: Connecting to ResourceManager at rb01rm01.local/192.168.1.100:8032
> 18/03/08 09:21:01 INFO client.AHSProxy: Connecting to Application History server at rb01rm01.local/192.168.1.100:10200
> Total Nodes:0
>          Node-Id   Node-State  Node-Http-Address  Number-of-Running-Containers
> [hadoop@rb01rm01 logs]$
>
> 3. The ResourceManager, however, does not list Node rb0101.local as SHUTDOWN when specifically requesting the list of Nodes in the SHUTDOWN state:
> [hadoop@rb01rm01 bin]$ /opt/hadoop/bin/yarn node -list -states SHUTDOWN
> 18/03/08 08:28:23 INFO client.RMProxy: Connecting to ResourceManager at rb01rm01.local/v.x.y.z:8032
> 18/03/08 08:28:24 INFO client.AHSProxy: Connecting to Application History server at rb01rm01.local/v.x.y.z:10200
> Total Nodes:0
>          Node-Id   Node-State  Node-Http-Address  Number-of-Running-Containers
> [hadoop@rb01rm01 bin]$

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
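Assuming that diagnosis is correct, a likely workaround is to pin the NodeManager's RPC address in each NM's yarn-site.xml so the NodeId keeps the same port across restarts. The port 45454 below is only an example value, and whether this fully prevents the stale SHUTDOWN entry would need to be confirmed:

```xml
<!-- yarn-site.xml on each NodeManager host: pin the NM RPC port so the
     NodeId (host:port) does not change on restart. 45454 is an example;
     use any free port that stays consistent across restarts. -->
<property>
  <name>yarn.nodemanager.address</name>
  <value>0.0.0.0:45454</value>
</property>
```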
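If the comment's suspicion about buildNodeId() holds, the mechanism is easy to demonstrate outside YARN: when yarn.nodemanager.address is unset, the NodeManager's RPC server binds to an OS-assigned ephemeral port, the NodeId is host:port, and so every restart yields a new NodeId (here rb0101.local:43892 before, rb0101.local:42627 after) while the old one lingers in SHUTDOWN. A minimal sketch of that behaviour using plain sockets (this is NOT YARN's actual buildNodeId() code; the function name and shape are only illustrative):

```python
import socket

def build_node_id(host, configured_port=0):
    """Illustrative only (not YARN's buildNodeId()): bind a server socket
    the way a daemon does when no port is configured, i.e. with port 0,
    so the OS assigns an ephemeral port."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind((host, configured_port))  # port 0 => OS picks an ephemeral port
    port = s.getsockname()[1]
    s.close()
    return f"{host}:{port}"

# Two "restarts" with no configured port almost always yield different ports,
# so the RM would see the restarted NodeManager as a brand-new NodeId while
# the old host:port entry is left behind in the SHUTDOWN list.
print(build_node_id("127.0.0.1"))
print(build_node_id("127.0.0.1"))
```

Pinning the port makes build_node_id() return the same NodeId on every call, which is the stable-identity behaviour the RM's node tracking appears to assume.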