Thanks for your reply! I've been working on this recently, so let me create an issue and update the script.
On Fri, Mar 10, 2023 at 10:41, Bryan Beaudreault <[email protected]> wrote:

> Taking a look at the git blame for the script, some of the parts you
> reference are over 13 years old. So it may just be that they deserve some
> updating. Anyway, you are not missing anything and your approach is both
> safe and more graceful.
>
> On Thu, Mar 9, 2023 at 8:47 PM Bryan Beaudreault <[email protected]>
> wrote:
>
> > I can’t speak to why the script is the way it is. But I will say that my
> > company has been running hbase at massive scale with high reliability
> > standards for years. We’ve never used any of the built in shell scripts.
> > We have our own automation, and our HMaster rolling restart is more like
> > what you describe. So I would say the shell script here is overly
> > conservative and not prioritizing availability. There’s no concern for
> > racing for master node, since it uses ZK for leader election, which is
> > designed for this case. I’d recommend you do what you describe instead
> > if you value availability (who doesn’t :)?)
> >
> > On Thu, Mar 9, 2023 at 2:46 AM 杨光 <[email protected]> wrote:
> >
> >> Hi everyone! I just read the rolling-restart.sh in $HBASE_HOME/bin,
> >> found that the script would stop all master service (including the
> >> backup ones) at the same time, and then restart them both:
> >>
> >> # The content of rolling-restart.sh
> >> ...
> >> # stop all masters before re-start to avoid races for master znode
> >> "$bin"/hbase-daemon.sh --config "${HBASE_CONF_DIR}" stop master
> >> "$bin"/hbase-daemons.sh --config "${HBASE_CONF_DIR}" \
> >>   --hosts "${HBASE_BACKUP_MASTERS}" stop master-backup
> >>
> >> # make sure the master znode has been deleted before continuing
> >> zmaster=`$bin/hbase org.apache.hadoop.hbase.util.HBaseConfTool zookeeper.znode.master`
> >> ...
> >>
> >> # all masters are down, now restart
> >> "$bin"/hbase-daemon.sh --config "${HBASE_CONF_DIR}" ${START_CMD_DIST_MODE} master
> >> "$bin"/hbase-daemons.sh --config "${HBASE_CONF_DIR}" \
> >>   --hosts "${HBASE_BACKUP_MASTERS}" ${START_CMD_DIST_MODE} master-backup
> >>
> >> In this way the HMaster service would be unavailable during this period.
> >> Why is it designed in this way? Can it be done in a more graceful way?
> >> Like this:
> >>
> >> - Stop the backup master, and then restart it
> >> - Stop the active master, then the backup master would become active
> >> - start the original active one of master, now it's the backup one
> >>
> >> I have tested it on my own cluster and it seems to work fine. Is this
> >> more graceful? Or am I missing something?
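
For reference, the three steps proposed in the original message could be sketched roughly as below. This is only an illustration, not the actual patch: it reuses the hbase-daemon.sh and hbase-daemons.sh invocations quoted from rolling-restart.sh, and assumes HBASE_HOME, HBASE_CONF_DIR and HBASE_BACKUP_MASTERS are set the way that script sets them. The names DRY_RUN, FAILOVER_WAIT_SECS, run and rolling_restart_masters are hypothetical and introduced only for this sketch, and the fixed sleep is a crude stand-in for a real check that a new master has become active.

```shell
#!/usr/bin/env bash
# Sketch only: restart HMasters one at a time instead of stopping all of them.
# Like rolling-restart.sh, this assumes it runs on the active master's host.
# DRY_RUN, FAILOVER_WAIT_SECS, run and rolling_restart_masters are
# hypothetical names, not part of HBase.

# Print the command when DRY_RUN=1 (the default here), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

rolling_restart_masters() {
  bin="${HBASE_HOME}/bin"

  # 1. Restart the backup masters first; the active master keeps serving.
  run "$bin"/hbase-daemons.sh --config "${HBASE_CONF_DIR}" \
    --hosts "${HBASE_BACKUP_MASTERS}" stop master-backup
  run "$bin"/hbase-daemons.sh --config "${HBASE_CONF_DIR}" \
    --hosts "${HBASE_BACKUP_MASTERS}" start master-backup

  # 2. Stop the active master; a backup should win the ZooKeeper election.
  #    A real script should first confirm the backups are up again.
  run "$bin"/hbase-daemon.sh --config "${HBASE_CONF_DIR}" stop master

  # Crude placeholder: poll for a new active master here instead of sleeping.
  run sleep "${FAILOVER_WAIT_SECS:-10}"

  # 3. Start the former active master again; it should now come up as a backup.
  run "$bin"/hbase-daemon.sh --config "${HBASE_CONF_DIR}" start master
}
```

Calling rolling_restart_masters with the default DRY_RUN=1 only prints the five commands in order, which makes it easy to review the sequence before running it for real with DRY_RUN=0.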
