Even that is bad. The problem is that the cost of incorrectly stopping regionservers is much higher than the cost of not stopping them. Stopping a regionserver scrambles data locality until all of its regions are compacted. At the margins, this could degrade performance enough to kill a cluster that is already on the edge. Not stopping a regionserver just means that you have to say what you mean, which isn't a big penalty.
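For example, someone who only meant to stop the master they started locally can say exactly that with hbase-daemon.sh rather than the cluster-wide script (illustrative invocation; --config just points it at the conf dir in question):

    # Stops just the master JVM on this host (via its pid file);
    # it does not initiate a cluster-wide shutdown:
    bin/hbase-daemon.sh --config "$HBASE_CONF_DIR" stop master

stop-hbase.sh, by contrast, asks the active master to shut down the whole cluster, regionservers included.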
On Fri, Mar 4, 2011 at 5:51 PM, Bill Graham <billgra...@gmail.com> wrote:
> What if we just executed the shutdown only if a running master is
> found on the host running the stop script and it's the only master in
> the cluster?
>
>
> On Thu, Mar 3, 2011 at 4:48 PM, Igor Ranitovic <irani...@gmail.com> wrote:
> > What about adding a simple message box using bash's whiptail?
> >
> > For example:
> >
> > rs=$(cat ${HBASE_CONF_DIR}/regionservers | xargs)
> >
> > if whiptail --yesno "Do you want to shut down the cluster with the
> > following regionservers: $rs" 10 40
> > then
> >   : # proceed with the shutdown
> > else
> >   exit 1
> > fi
> >
> >
> > On 03/02/2011 05:23 PM, Bill Graham wrote:
> >> Hi,
> >>
> >> We had a troubling experience today that I wanted to share. Our dev
> >> cluster got completely shut down by a developer by mistake, without
> >> said developer even realizing it. Here's how...
> >>
> >> We have multiple sets of HBase configs checked into SVN that
> >> developers can check out and point their HBASE_CONF_DIR to, to easily
> >> switch between developing in local mode and testing against our
> >> distributed dev cluster.
> >>
> >> In local mode someone might do something like this:
> >>
> >> bin/start-hbase.sh
> >> bin/hbase shell
> >>
> >> ... do some work ...
> >>
> >> bin/stop-hbase.sh
> >>
> >> The problem arose when a developer accidentally tried to do this with
> >> their HBASE_CONF_DIR pointing to our dev cluster configs. When this
> >> happens, the first command will add another master to the cluster and
> >> the last command will shut down the entire cluster. I assume this
> >> happens via ZooKeeper somehow, since we don't have ssh keys to
> >> remotely start/stop as the user running the processes.
> >>
> >> So the question is, is this a bug or a feature? If it's a feature, it
> >> seems like an incredibly dangerous one. Once our live cluster is
> >> running, those configs will also be needed on the client, so really
> >> bad things could happen by mistake.
> >>
> >> thanks,
> >> Bill
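Sketching the guard Bill proposes (rough and untested; the "only master in the cluster" half would need a look at ZooKeeper, e.g. the master and backup-master znodes, so it's left as a comment here):

    # Hypothetical check at the top of stop-hbase.sh: refuse to stop
    # the cluster unless an HMaster is actually running on this host.
    # jps ships with the JDK and lists local JVMs by main class name.
    if ! jps | grep -q HMaster; then
      echo "No HMaster running on $(hostname); not stopping the cluster." >&2
      echo "Use bin/hbase-daemon.sh stop <daemon> for a single daemon." >&2
      exit 1
    fi
    # To also verify this is the only master, inspect ZooKeeper for
    # registered backup masters before falling through to the normal
    # shutdown logic.

That errs on the side of not stopping, which fits the cost asymmetry above.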