Hi Alejandro,

Many thanks for getting back to me. I'm currently trying to configure HA for the NameNode and the YARN ResourceManager, which should help if we lose a node at some point. I'm using a blueprint to bootstrap my cluster.

If I use the blueprint without HA enabled, start the cluster, and then enable HA for both components by following the setup process, everything works fine. At that point I exported the cluster blueprint.
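For reference, the HA-related portion of the export looks roughly like the snippet below. I've trimmed it down, and the nameservice ID "mycluster", the host group names, and the ports are just what my environment happens to use, so please treat this as a sketch rather than a known-good reference:

    {
      "configurations" : [
        {
          "hdfs-site" : {
            "dfs.nameservices" : "mycluster",
            "dfs.ha.namenodes.mycluster" : "nn1,nn2",
            "dfs.namenode.rpc-address.mycluster.nn1" : "%HOSTGROUP::master_1%:8020",
            "dfs.namenode.rpc-address.mycluster.nn2" : "%HOSTGROUP::master_2%:8020",
            "dfs.namenode.http-address.mycluster.nn1" : "%HOSTGROUP::master_1%:50070",
            "dfs.namenode.http-address.mycluster.nn2" : "%HOSTGROUP::master_2%:50070",
            "dfs.namenode.shared.edits.dir" : "qjournal://%HOSTGROUP::master_1%:8485;%HOSTGROUP::master_2%:8485;%HOSTGROUP::master_3%:8485/mycluster",
            "dfs.client.failover.proxy.provider.mycluster" : "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
            "dfs.ha.fencing.methods" : "shell(/bin/true)",
            "dfs.ha.automatic-failover.enabled" : "true"
          }
        },
        {
          "core-site" : {
            "fs.defaultFS" : "hdfs://mycluster",
            "ha.zookeeper.quorum" : "%HOSTGROUP::master_1%:2181,%HOSTGROUP::master_2%:2181,%HOSTGROUP::master_3%:2181"
          }
        }
      ]
    }

The %HOSTGROUP::...% tokens are how the export refers to host groups rather than concrete hostnames.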
I then attempted to re-create the cluster with HA configured from the start, but the install appears to fail. Should this be possible? I noticed that when I enabled HA on the running cluster, I had to follow a number of manual steps. Is it possible to have HA configured from the start with a blueprint?

Cheers!

On Thu, Mar 3, 2016 at 7:00 PM, Alejandro Fernandez <[email protected]> wrote:

> The situation is the same regardless of master/slave/client.
>
> Basically, you have to:
> bring up a host with the same FQDN
> install ambari-agent on it
>
> At this point, any components that used to be on that host will report
> heartbeat lost, and the cluster may not be fully operational if it contained
> masters (especially the NameNode). You may then have to restart services on
> that host, which will actually end up installing the bits again and
> generating configs. The hard part is that you may have to run additional
> commands depending on the type of master; think of the NameNode, or even
> hosts that contain databases for Hive, Oozie, etc.
>
> Attempting to move masters may be complicated, because it may require the
> original host to be heartbeating and to have the bits installed in order to
> be able to stop the services Ambari knows about.
>
> Thanks,
> Alejandro
>
> From: cs user <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Thursday, March 3, 2016 at 5:00 AM
> To: "[email protected]" <[email protected]>
> Subject: Recovering from a dead master namenode server
>
> Hi All,
>
> I'm trying to understand how to recover from certain failures within
> Ambari. When launching within a cloud environment, it's possible that a
> host may be completely deleted, and you won't have the chance to
> decommission the node.
>
> For example, in the event that the server hosting the master HDFS NameNode
> was lost, would it be possible to spin up another server in its place,
> built completely from scratch, and have it replace the old NameNode master?
>
> Currently, when I attempt to delete a failed host, it warns me that the
> following components need to be moved:
>
> NameNode, Spark History Server
>
> It also then tries to talk me through the process of copying data from the
> old NameNode to the new NameNode. If the server has been deleted, this
> would not be possible. Would it be possible to copy this data from the
> secondary NameNode instead?
>
> Many thanks in advance.
>
> Cheers!
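P.S. In case it helps anyone reading this in the archives, my understanding of the "same FQDN + ambari-agent" recovery described above is roughly the following on the replacement box. The hostnames are placeholders from my own environment, and I'm assuming the Ambari yum repo is already configured:

    # Give the new box the dead host's FQDN so Ambari matches it to the existing host entry
    hostnamectl set-hostname lostnode01.example.com

    # Install the agent (assumes the ambari repo is already set up on the box)
    yum install -y ambari-agent

    # Point the agent at the Ambari server, then start it so it registers and heartbeats
    sed -i 's/^hostname=.*/hostname=ambari01.example.com/' /etc/ambari-agent/conf/ambari-agent.ini
    ambari-agent start

Once it's heartbeating, restarting the services on that host from Ambari should lay the bits and configs back down, as you describe.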
