Hi everyone! I just read rolling-restart.sh in $HBASE_HOME/bin and found
that the script stops all master services (including the backup ones)
at the same time, and then restarts them all:

# The content of rolling-restart.sh
...
# stop all masters before re-start to avoid races for master znode
"$bin"/hbase-daemon.sh --config "${HBASE_CONF_DIR}" stop master
"$bin"/hbase-daemons.sh --config "${HBASE_CONF_DIR}" \
  --hosts "${HBASE_BACKUP_MASTERS}" stop master-backup

# make sure the master znode has been deleted before continuing
zmaster=`$bin/hbase org.apache.hadoop.hbase.util.HBaseConfTool zookeeper.znode.master`
...

# all masters are down, now restart
"$bin"/hbase-daemon.sh --config "${HBASE_CONF_DIR}" ${START_CMD_DIST_MODE} master
"$bin"/hbase-daemons.sh --config "${HBASE_CONF_DIR}" \
  --hosts "${HBASE_BACKUP_MASTERS}" ${START_CMD_DIST_MODE} master-backup

As a result, no HMaster is available between the stop and the restart.
Why is it designed this way? Could it be done more gracefully, for example
like this:

   - Stop the backup master, then restart it
   - Stop the active master, so that the backup master becomes active
   - Start the original active master, which now becomes the backup
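
The steps above could be sketched roughly as follows. This is only an
illustration, not a tested patch: it reuses the hbase-daemon.sh and
hbase-daemons.sh invocations quoted from rolling-restart.sh, and it assumes
that bin, HBASE_CONF_DIR and HBASE_BACKUP_MASTERS are set as in that script
and that it runs on the active master's host. The function name, the RUN
dry-run hook and the fallback to "start" when START_CMD_DIST_MODE is unset
are my own assumptions.

```shell
#!/usr/bin/env bash
# Sketch only: restart the masters one at a time so one master stays up.
# graceful_master_restart and RUN are hypothetical names, not part of HBase.
# Set RUN=echo to print the commands instead of executing them.

graceful_master_restart() {
  local run="${RUN:-}"
  # Assumption: fall back to "start" if rolling-restart.sh has not set this.
  local start_cmd="${START_CMD_DIST_MODE:-start}"

  # 1. Restart the backup masters first; the active master keeps serving.
  $run "$bin"/hbase-daemons.sh --config "${HBASE_CONF_DIR}" \
    --hosts "${HBASE_BACKUP_MASTERS}" stop master-backup
  $run "$bin"/hbase-daemons.sh --config "${HBASE_CONF_DIR}" \
    --hosts "${HBASE_BACKUP_MASTERS}" $start_cmd master-backup

  # NOTE: a real script would probably need to wait here until the backup
  # masters are up again; this sketch does not do that.

  # 2. Stop the active master; a backup master is expected to take over.
  $run "$bin"/hbase-daemon.sh --config "${HBASE_CONF_DIR}" stop master

  # 3. Start the former active master again; it should now act as a backup.
  $run "$bin"/hbase-daemon.sh --config "${HBASE_CONF_DIR}" $start_cmd master
}
```

Calling it with RUN=echo prints the four commands in order, which makes it
easy to check the sequence before trying it on a real cluster.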

I have tested it on my own cluster and it seems to work fine. Is this more
graceful? Or am I missing something?
