Hi Alejandro,

Many thanks for getting back to me. I'm currently trying to configure HA for the NameNode and the YARN ResourceManager, which should help if we lose a node at some point. I'm using a blueprint to bootstrap my cluster.

If I use the blueprint without HA enabled, start the cluster, and then enable HA for both components by following the setup process, everything works fine. At that point I exported the cluster blueprint.
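For reference, the HA-related portion of the export looks roughly like the snippet below. I've trimmed it down, and the nameservice ID "mycluster", the host group names, and the ports are just what my environment happens to use, so please treat this as a sketch rather than a known-good reference:

    {
      "configurations" : [
        {
          "hdfs-site" : {
            "dfs.nameservices" : "mycluster",
            "dfs.ha.namenodes.mycluster" : "nn1,nn2",
            "dfs.namenode.rpc-address.mycluster.nn1" : "%HOSTGROUP::master_1%:8020",
            "dfs.namenode.rpc-address.mycluster.nn2" : "%HOSTGROUP::master_2%:8020",
            "dfs.namenode.http-address.mycluster.nn1" : "%HOSTGROUP::master_1%:50070",
            "dfs.namenode.http-address.mycluster.nn2" : "%HOSTGROUP::master_2%:50070",
            "dfs.namenode.shared.edits.dir" : "qjournal://%HOSTGROUP::master_1%:8485;%HOSTGROUP::master_2%:8485;%HOSTGROUP::master_3%:8485/mycluster",
            "dfs.client.failover.proxy.provider.mycluster" : "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
            "dfs.ha.fencing.methods" : "shell(/bin/true)",
            "dfs.ha.automatic-failover.enabled" : "true"
          }
        },
        {
          "core-site" : {
            "fs.defaultFS" : "hdfs://mycluster",
            "ha.zookeeper.quorum" : "%HOSTGROUP::master_1%:2181,%HOSTGROUP::master_2%:2181,%HOSTGROUP::master_3%:2181"
          }
        }
      ]
    }

The %HOSTGROUP::...% tokens are how the export refers to host groups rather than concrete hostnames.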
I then attempted to re-create the cluster with HA configured from the start, but the install appears to fail. Should this be possible? I noticed that when I enabled HA on the running cluster, I had to follow a number of manual steps. Is it possible to have HA configured from the start with a blueprint?

Cheers!

On Thu, Mar 3, 2016 at 7:00 PM, Alejandro Fernandez <[email protected]> wrote:

> The situation is the same regardless of master/slave/client.
>
> Basically, you have to:
> bring up a host with the same FQDN
> install ambari-agent on it
>
> At this point, any components that used to be on that host will report
> heartbeat lost, and the cluster may not be fully operational if it contained
> masters (especially the NameNode). You may then have to restart services on
> that host, which will actually end up installing the bits again and
> generating configs. The hard part is that you may have to run additional
> commands depending on the type of master; think of the NameNode, or even
> hosts that contain databases for Hive, Oozie, etc.
>
> Attempting to move masters may be complicated, because it may require the
> original host to be heartbeating and to have the bits installed in order to
> be able to stop the services Ambari knows about.
>
> Thanks,
> Alejandro
>
> From: cs user <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Thursday, March 3, 2016 at 5:00 AM
> To: "[email protected]" <[email protected]>
> Subject: Recovering from a dead master namenode server
>
> Hi All,
>
> I'm trying to understand how to recover from certain failures within
> Ambari. When launching within a cloud environment, it's possible that a
> host may be completely deleted, and you won't have the chance to
> decommission the node.
>
> For example, in the event that the server hosting the master HDFS NameNode
> was lost, would it be possible to spin up another server in its place,
> built completely from scratch, and have it replace the old NameNode master?
>
> Currently, when I attempt to delete a failed host, it warns me that the
> following components need to be moved:
>
> NameNode, Spark History Server
>
> It also then tries to talk me through the process of copying data from the
> old NameNode to the new NameNode. If the server has been deleted, this
> would not be possible. Would it be possible to copy this data from the
> secondary NameNode instead?
>
> Many thanks in advance.
>
> Cheers!
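P.S. In case it helps anyone reading this in the archives, my understanding of the "same FQDN + ambari-agent" recovery described above is roughly the following on the replacement box. The hostnames are placeholders from my own environment, and I'm assuming the Ambari yum repo is already configured:

    # Give the new box the dead host's FQDN so Ambari matches it to the existing host entry
    hostnamectl set-hostname lostnode01.example.com

    # Install the agent (assumes the ambari repo is already set up on the box)
    yum install -y ambari-agent

    # Point the agent at the Ambari server, then start it so it registers and heartbeats
    sed -i 's/^hostname=.*/hostname=ambari01.example.com/' /etc/ambari-agent/conf/ambari-agent.ini
    ambari-agent start

Once it's heartbeating, restarting the services on that host from Ambari should lay the bits and configs back down, as you describe.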
