Hi Jing,

Thanks for the response. I will try this out and file an Apache JIRA.

Best,
Colin Williams

On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao <j...@hortonworks.com> wrote:

> Hi Colin,
>
> I guess currently we may have to restart almost all the
> daemons/services in order to swap out a standby NameNode (SBN):
>
> 1. The current active NameNode (ANN) needs to know the new SBN, since in
>    the current implementation the SBN periodically sends a rollEditLog
>    RPC request to the ANN (thus if a NN failover happens later, the
>    original ANN needs to send this RPC to the correct NN).
> 2. It looks like the DataNode currently cannot really refresh its list
>    of NNs. Look at the code in BPOfferService:
>
>      void refreshNNList(ArrayList<InetSocketAddress> addrs)
>          throws IOException {
>        Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>        for (BPServiceActor actor : bpServices) {
>          oldAddrs.add(actor.getNNSocketAddress());
>        }
>        Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>
>        if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>          // Keep things simple for now -- we can implement this at a
>          // later date.
>          throw new IOException(
>              "HA does not currently support adding a new standby to a running DN. " +
>              "Please do a rolling restart of DNs to reconfigure the list of NNs.");
>        }
>      }
>
> 3. If you're using automatic failover, you also need to update the
>    configuration of the ZKFC on the current ANN machine, since the ZKFC
>    does graceful fencing by sending an RPC to the other NN.
> 4. It looks like we do not need to restart the JournalNodes for the new
>    SBN, but I have not tried this before.
>
> Thus in general we may still have to restart all the services (except
> the JNs) and update their configurations. But this can probably be a
> rolling restart process:
>
> 1. Shut down the old SBN, bootstrap the new SBN, and start the new SBN.
> 2. Keep the ANN and its corresponding ZKFC running, and do a rolling
>    restart of all the DNs to update their configurations.
> 3. After restarting all the DNs, stop the ANN and the ZKFC, and update
>    their configuration. The new SBN should become active.
>
> I have not tried the steps above, so please let me know whether this
> works or not. And I think we should also document the correct steps in
> Apache. Could you please file an Apache JIRA?
>
> Thanks,
> -Jing
>
> On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <disc...@uw.edu>
> wrote:
>
>> Hello,
>>
>> I'm trying to swap out a standby NameNode in a QJM / HA configuration. I
>> believe the steps to achieve this would be something similar to:
>>
>> Use the bootstrap standby command to prep the replacement standby, or
>> rsync if the command fails.
>>
>> Somehow update the datanodes, so they push the heartbeat / journal to the
>> new standby.
>>
>> Update the XML configuration on all nodes to reflect the replacement
>> standby.
>>
>> Start the replacement standby.
>>
>> Use some hadoop command to refresh the datanodes to the new NameNode
>> configuration.
>>
>> I am not sure how to deal with the Journal switch, or if I am going about
>> this the right way. Can anybody give me some suggestions here?
>>
>> Regards,
>>
>> Colin Williams
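
For anyone following the thread, the check in the refreshNNList snippet quoted above can be reproduced in isolation. The following is only a minimal sketch, not the Hadoop code: it replaces Guava's Sets.symmetricDifference with plain java.util sets, and the class name NnListCheck, the method sameNameNodes, and the host names are hypothetical. It shows why a DataNode refresh is rejected as soon as the configured NameNode list differs from the running one, which is the case when a standby is swapped out.

```java
import java.net.InetSocketAddress;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-alone illustration; not part of Hadoop.
class NnListCheck {

    // Addresses that appear in exactly one of the two sets.
    static Set<InetSocketAddress> symmetricDifference(
            Set<InetSocketAddress> a, Set<InetSocketAddress> b) {
        Set<InetSocketAddress> onlyInA = new HashSet<>(a);
        onlyInA.removeAll(b);
        Set<InetSocketAddress> onlyInB = new HashSet<>(b);
        onlyInB.removeAll(a);
        onlyInA.addAll(onlyInB);
        return onlyInA;
    }

    // True when both lists name the same NameNodes, regardless of order.
    static boolean sameNameNodes(List<InetSocketAddress> current,
                                 List<InetSocketAddress> proposed) {
        return symmetricDifference(
                new HashSet<>(current), new HashSet<>(proposed)).isEmpty();
    }

    public static void main(String[] args) {
        // Made-up host names; unresolved addresses avoid any DNS lookup.
        InetSocketAddress ann = InetSocketAddress.createUnresolved("nn1.example.com", 8020);
        InetSocketAddress oldSbn = InetSocketAddress.createUnresolved("nn2.example.com", 8020);
        InetSocketAddress newSbn = InetSocketAddress.createUnresolved("nn3.example.com", 8020);

        // Same NameNodes in a different order: the refresh would be accepted.
        System.out.println(sameNameNodes(
                Arrays.asList(ann, oldSbn), Arrays.asList(oldSbn, ann)));   // true

        // Standby replaced: the lists differ, so the refresh would be rejected.
        System.out.println(sameNameNodes(
                Arrays.asList(ann, oldSbn), Arrays.asList(ann, newSbn)));   // false
    }
}
```

In the quoted BPOfferService code the second case is where the IOException is thrown, which is why the rolling restart of the DNs appears in the steps above.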