Thanks for your reply! I have been working on this recently, so let me create
an issue and update the script.
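
For reference, the one-at-a-time sequence from my original message (quoted
below) could be sketched roughly as follows. This is only a dry-run
illustration that prints the commands, not the actual change to
rolling-restart.sh. The host names, the use of ssh, the /opt/hbase default
path, and the helper name plan_master_rolling_restart are all assumptions
made for the example; only hbase-daemon.sh with stop/start master comes from
the existing scripts.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the commands for restarting masters one at a time.
# ASSUMPTIONS (not taken from rolling-restart.sh): the host names, the use of
# ssh, the default paths, and the helper name are hypothetical.

HBASE_HOME="${HBASE_HOME:-/opt/hbase}"               # assumed install path
HBASE_CONF_DIR="${HBASE_CONF_DIR:-$HBASE_HOME/conf}" # assumed conf dir

# Usage: plan_master_rolling_restart ACTIVE_HOST BACKUP_HOST...
plan_master_rolling_restart() {
  local active="$1"; shift
  local daemon="$HBASE_HOME/bin/hbase-daemon.sh --config $HBASE_CONF_DIR"
  local host

  # 1. Restart each backup master first; the active master keeps serving.
  for host in "$@"; do
    echo "ssh $host $daemon stop master"
    echo "ssh $host $daemon start master"
  done

  # 2. Stop the active master; a backup should take over through ZooKeeper.
  echo "ssh $active $daemon stop master"

  # 3. Start the former active master again; it should come back as a backup.
  echo "ssh $active $daemon start master"
}

# Example: print the plan for one active and one backup master.
plan_master_rolling_restart master1.example.com master2.example.com
```

A real version would presumably also wait until a backup has actually become
active before step 3, and would need some way to find out which host is
currently active; both are left out of this sketch.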

On Fri, Mar 10, 2023 at 10:41, Bryan Beaudreault <[email protected]> wrote:

> Taking a look at the git blame for the script, some of the parts you
> reference are over 13 years old. So it may just be that they deserve some
> updating. Anyway, you are not missing anything and your approach is both
> safe and more graceful.
>
> On Thu, Mar 9, 2023 at 8:47 PM Bryan Beaudreault <[email protected]>
> wrote:
>
> > I can’t speak to why the script is the way it is. But I will say that my
> > company has been running hbase at massive scale with high reliability
> > standards for years. We’ve never used any of the built in shell scripts. We
> > have our own automation, and our HMaster rolling restart is more like what
> > you describe. So I would say the shell script here is overly conservative
> > and not prioritizing availability. There’s no concern for racing for master
> > node, since it uses ZK for leader election, which is designed for this
> > case. I’d recommend you do what you describe instead if you value
> > availability (who doesn’t :)?)
> >
> > On Thu, Mar 9, 2023 at 2:46 AM 杨光 <[email protected]> wrote:
> >
> >> Hi everyone! I just read rolling-restart.sh in $HBASE_HOME/bin and found
> >> that the script stops all master services (including the backup ones)
> >> at the same time, and then restarts them all:
> >>
> >> # The content of rolling-restart.sh
> >> ...
> >> # stop all masters before re-start to avoid races for master znode
> >> "$bin"/hbase-daemon.sh --config "${HBASE_CONF_DIR}" stop master
> >> "$bin"/hbase-daemons.sh --config "${HBASE_CONF_DIR}" \
> >>   --hosts "${HBASE_BACKUP_MASTERS}" stop master-backup
> >>
> >> # make sure the master znode has been deleted before continuing
> >> zmaster=`$bin/hbase org.apache.hadoop.hbase.util.HBaseConfTool zookeeper.znode.master`
> >> ...
> >>
> >> # all masters are down, now restart
> >> "$bin"/hbase-daemon.sh --config "${HBASE_CONF_DIR}" ${START_CMD_DIST_MODE} master
> >> "$bin"/hbase-daemons.sh --config "${HBASE_CONF_DIR}" \
> >>   --hosts "${HBASE_BACKUP_MASTERS}" ${START_CMD_DIST_MODE} master-backup
> >>
> >> With this approach the HMaster service is unavailable while the masters
> >> are being restarted. Why is it designed this way? Could it be done more
> >> gracefully, like this:
> >>
> >>    - Stop the backup master, then start it again
> >>    - Stop the active master, so that the backup master becomes active
> >>    - Start the original active master, which now becomes the backup
> >>
> >> I have tested it on my own cluster and it seems to work fine. Is this more
> >> graceful? Or am I missing something?
> >>
> >
>
