Re: Restarting a resource manager kills the other in HA

daemeon reiydelle Tue, 24 Feb 2015 14:16:28 -0800

Only one RM is active at a time; the other is in standby. When you
started the new RM, the configuration files directed it to come up and
take over, and the old primary goes to standby (or should!). That is
working as designed, except that you will see a slowdown in scheduling.
I suspect what you want is for the new RM to come up in standby rather
than take over, no?
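
One way to confirm which RM currently holds the active role is to ask each one with `yarn rmadmin -getServiceState`. The sketch below wraps that command in Python. The RM ids `rm1` and `rm2` are assumptions (use whatever `yarn.resourcemanager.ha.rm-ids` lists in your yarn-site.xml), and the helper names are made up for illustration.

```python
import subprocess


def rm_states(rm_ids, run=None):
    """Return {rm_id: "active" | "standby" | "unknown"} for each RM id."""
    if run is None:
        def run(rm_id):
            # Assumes the `yarn` CLI is on PATH and RM HA is configured.
            result = subprocess.run(
                ["yarn", "rmadmin", "-getServiceState", rm_id],
                capture_output=True, text=True, check=False,
            )
            return result.stdout

    states = {}
    for rm_id in rm_ids:
        # Use the last non-empty line in case anything else precedes the state.
        lines = [ln.strip() for ln in run(rm_id).splitlines() if ln.strip()]
        last = lines[-1].lower() if lines else ""
        states[rm_id] = last if last in ("active", "standby") else "unknown"
    return states


def ha_looks_healthy(states):
    """True when exactly one RM is active and every other RM is standby."""
    values = list(states.values())
    return (values.count("active") == 1
            and values.count("standby") == len(values) - 1)
```

Calling `rm_states(["rm1", "rm2"])` before and after restarting one RM shows whether the other one really moved to standby or stopped answering altogether (reported here as "unknown").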


So ... what I see are normal messages for a switchover. However, you
should still see the standby RM receiving status from the new active RM
if HA is configured.
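
To pick the switchover messages out of a long RM log, a small filter helps. This is a rough sketch that assumes the log file has one event per line in the same layout as the quoted log below (timestamp, level, logger, message). The marker strings are taken from that log; other Hadoop releases may word them differently.

```python
import re

# Layout assumed from the quoted log: "<date> <time>,<ms> LEVEL logger: message"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) (?P<logger>\S+): (?P<msg>.*)$"
)

# Substrings that mark an HA state change in the quoted log.
HA_MARKERS = ("Yielding from election", "Transitioning to")


def ha_events(lines, markers=HA_MARKERS):
    """Return (timestamp, short logger name, message) for HA-related lines."""
    events = []
    for line in lines:
        m = LOG_LINE.match(line.rstrip("\n"))
        if m and any(k in m.group("msg") for k in markers):
            short_logger = m.group("logger").rsplit(".", 1)[-1]
            events.append((m.group("ts"), short_logger, m.group("msg")))
    return events
```

Running it over the logs of both RMs and comparing timestamps shows whether one side moved to standby at the moment the other was restarted.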

Daemeon C.M. Reiydelle
On Feb 24, 2015 1:56 PM, "Nikhil" <mnik...@gmail.com> wrote:

> Hi,
>
> In YARN HA for the Resource Manager, HA worked fine initially after setup,
> but after some time I noticed that restarting one resource manager gets
> the other resource manager stopped/killed. Below is what I see in the logs
> on the killed resource manager instance. I am using Hadoop version 2.5.1,
> if that helps.
>
> Has anyone seen this before? Any ideas on how I should go about this one?
>
> thanks,
> Nikhil
>
> -----
>
> 2015-02-24 16:47:37,555 INFO org.apache.hadoop.ha.ActiveStandbyElector:
> Yielding from election
> 2015-02-24 16:47:37,555 INFO org.apache.hadoop.ha.ActiveStandbyElector:
> Deleting bread-crumb of active node...
> 2015-02-24 16:47:37,555 INFO org.apache.hadoop.ipc.Server: Stopping IPC
> Server Responder
> 2015-02-24 16:47:37,580 INFO org.apache.zookeeper.ZooKeeper: Session:
> 0x14b997543fd001e closed
> 2015-02-24 16:47:37,580 WARN org.apache.hadoop.ha.ActiveStandbyElector:
> Ignoring stale result from old client with sessionId 0x14b997543fd001e
> 2015-02-24 16:47:37,580 INFO org.apache.zookeeper.ClientCnxn: EventThread
> shut down
> 2015-02-24 16:47:37,580 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager:
> Transitioning to standby state
> 2015-02-24 16:47:37,581 INFO
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping ResourceManager
> metrics system...
> 2015-02-24 16:47:37,587 INFO
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics
> system stopped.
> 2015-02-24 16:47:37,588 INFO
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics
> system shutdown complete.
> 2015-02-24 16:47:37,588 INFO
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore:
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$VerifyActiveStatusThread
> thread interrupted! Exiting!
> 2015-02-24 16:47:37,616 INFO org.apache.zookeeper.ZooKeeper: Session:
> 0x24b13ab5b4c069a closed
> 2015-02-24 16:47:37,616 INFO org.apache.zookeeper.ClientCnxn: EventThread
> shut down
> 2015-02-24 16:47:37,616 INFO org.apache.hadoop.yarn.event.AsyncDispatcher:
> AsyncDispatcher is draining to stop, igonring any new events.
> 2015-02-24 16:47:37,617 WARN
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher:
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher$LauncherThread
> interrupted. Returning.
> 2015-02-24 16:47:37,618 INFO org.apache.hadoop.ipc.Server: Stopping server
> on 8032
> 2015-02-24 16:47:37,622 INFO org.apache.hadoop.ipc.Server: Stopping IPC
> Server listener on 8032
> 2015-02-24 16:47:37,622 INFO org.apache.hadoop.ipc.Server: Stopping server
> on 8030
> 2015-02-24 16:47:37,623 INFO org.apache.hadoop.ipc.Server: Stopping IPC
> Server Responder
> 2015-02-24 16:47:37,627 INFO org.apache.hadoop.ipc.Server: Stopping IPC
> Server listener on 8030
> 2015-02-24 16:47:37,627 INFO org.apache.hadoop.ipc.Server: Stopping IPC
> Server Responder
> 2015-02-24 16:47:37,629 INFO org.apache.hadoop.ipc.Server: Stopping server
> on 8031
> 2015-02-24 16:47:37,633 INFO org.apache.hadoop.ipc.Server: Stopping IPC
> Server listener on 8031
> 2015-02-24 16:47:37,633 INFO org.apache.hadoop.ipc.Server: Stopping IPC
> Server Responder
> 2015-02-24 16:47:37,634 INFO
> org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: NMLivelinessMonitor
> thread interrupted
>
> -----
>
