Depending on the consistency level you query at (ONE or QUORUM), you might be able to restart one rack at a time (or one AZ, or whatever grouping you've got), assuming your snitch is set up correctly.
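A rough sketch of the rack-at-a-time idea, in case it helps. It only computes the restart batches; it does not restart anything. It assumes replicas are spread across racks so that taking one whole rack down still leaves enough replicas for your consistency level. The `restart_batches` helper and the `topology` mapping are made up for illustration; in practice the node-to-rack mapping would come from whatever your snitch reports.

```python
from collections import defaultdict


def restart_batches(node_racks):
    """Group nodes by rack. Each batch can be restarted together;
    the batches themselves are meant to be processed one after another."""
    by_rack = defaultdict(list)
    for node, rack in node_racks.items():
        by_rack[rack].append(node)
    # Sort for a stable, predictable order of racks and of nodes within a rack.
    return [sorted(by_rack[rack]) for rack in sorted(by_rack)]


# Hypothetical topology: node address -> rack name (assumed, not from a real cluster).
topology = {
    "10.0.0.1": "rack1",
    "10.0.0.2": "rack2",
    "10.0.0.3": "rack1",
    "10.0.0.4": "rack3",
}

for batch in restart_batches(topology):
    print("restart together:", batch)
```

You would still want to wait for every node in a batch to come back up before moving on to the next rack.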
> On Sep 19, 2014, at 11:30 AM, Kevin Burton <bur...@spinn3r.com> wrote:
>
> This is great feedback…
>
> I think it could actually be even easier than this…
>
> You could have an ansible (or whatever cluster management system you’re
> using) role for just seeds.
>
> Then you would serially restart all seeds one at a time. You would need to
> run ‘nodetool status’ and make sure the node is ‘U’ (up) I think... but you
> might want to make sure the majority of other nodes have agreed that this
> node is up and available.
>
> I think you can ONLY do this serially... for a LARGE number of hosts, this
> might take a while unless you can compute nodes which have mutually exclusive
> key ranges.
>
> The serial approach would take a LONG time for large clusters. If you have
> sixty nodes, it could take an hour to do a rolling restart.
>
> Kevin
>
>> On Tue, Sep 16, 2014 at 12:21 PM, James Briggs <james.bri...@yahoo.com>
>> wrote:
>> FYI: OpsCenter has a default of sleep 60 seconds after each node restart,
>> and an option of "drain before stopping."
>>
>> I haven't noticed if they do anything special with seeds.
>> (At least one seed needs to be running before you restart other nodes.)
>>
>> I wondered the same thing as Kevin and came to these conclusions.
>>
>> Fixing the startup script is non-trivial as far as startup scripts go.
>>
>> For start, it would have to:
>>
>> - parse cassandra.yaml for seeds
>> - if itself is not a seed, wait for a seed to start first. (could take
>>   minutes or never.)
>> - continue start.
>>
>> For a no-downtime cluster restart script, it would have to:
>>
>> - verify cluster health (ie. quorum/CL is met or you lose writes)
>> - parse cassandra.yaml for seeds and see if a seed is up
>> - stop gossip and thrift
>> - maybe do compaction before drain
>> - drain node
>> - stop/start or restart cassandra process.
>>
>> http://comments.gmane.org/gmane.comp.db.cassandra.user/20144
>>
>> Both of those scripts would be nice to have. :)
>>
>> OpsCenter is flaky at doing rolling restart in my test cluster,
>> so an alternative is needed.
>>
>> Also, the free OpsCenter doesn't have rolling repair option enabled.
>>
>> ccm has the options to do drain, stop and start, but a bash
>> script would be needed to make it rolling.
>>
>> https://github.com/pcmanus/ccm
>>
>> Thanks, James.
>> --
>> Cassandra/MySQL DBA. Available in San Jose area or remote.
>>
>> From: Duncan Sands <duncan.sa...@gmail.com>
>> To: user@cassandra.apache.org
>> Sent: Tuesday, September 16, 2014 11:09 AM
>> Subject: Re: Blocking while a node finishes joining the cluster after
>> restart.
>>
>> Hi Kevin, if you are using the latest version of opscenter, then even the
>> community (= free) edition can do a rolling restart of your cluster. It's
>> pretty convenient.
>>
>> Ciao, Duncan.
>>
>> On 16/09/14 19:44, Kevin Burton wrote:
>> > Say I want to do a rolling restart of Cassandra…
>> >
>> > I can’t just restart all of them because they need some time to gossip and
>> > for that gossip to get to all nodes.
>> >
>> > What is the best strategy for this.
>> >
>> > It would be something like:
>> >
>> > /etc/init.d/cassandra restart && wait-for-cassandra.sh
>> >
>> > … or something along those lines.
>> >
>> > --
>> >
>> > Founder/CEO Spinn3r.com <http://Spinn3r.com>
>> > Location: San Francisco, CA
>> > blog: http://burtonator.wordpress.com
>> > … or check out my Google+ profile
>> > <https://plus.google.com/102718274791889610666/posts>
>> > <http://spinn3r.com>
>
> --
> Founder/CEO Spinn3r.com
> Location: San Francisco, CA
> blog: http://burtonator.wordpress.com
> … or check out my Google+ profile
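
For the "restart, then wait until the node is up" step discussed in the thread, here is a minimal sketch of what the waiting part could look like. It parses the text of ‘nodetool status’ and polls until the node shows as UN (Up/Normal). The function names, the `SAMPLE` text, and the addresses are all made up for illustration; `fetch_status` is an assumed callable that returns the status text, so the sketch runs without a cluster.

```python
import re
import time

# Node rows start with a two-letter code: status (U/D) then state (N/L/J/M),
# followed by the node address, e.g. "UN  10.0.0.1  ...".
_ROW = re.compile(r"^([UD][NLJM])\s+(\S+)")


def node_state(status_text, address):
    """Return the two-letter code (e.g. 'UN') for address, or None if not listed."""
    for line in status_text.splitlines():
        match = _ROW.match(line.strip())
        if match and match.group(2) == address:
            return match.group(1)
    return None


def wait_until_up(fetch_status, address, attempts=30, delay=10.0, sleep=time.sleep):
    """Poll fetch_status() until address reports 'UN'. Return True on success,
    False if the node never showed as Up/Normal within the given attempts."""
    for _ in range(attempts):
        if node_state(fetch_status(), address) == "UN":
            return True
        sleep(delay)
    return False


# Illustrative status text only (not captured from a real cluster).
SAMPLE = """\
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load      Tokens  Owns   Host ID                               Rack
UN  10.0.0.1  47.66 KB  256     33.3%  aaa1b7c1-6049-4a08-ad3e-3697a0e30e10  rack1
DN  10.0.0.2  47.67 KB  256     33.3%  1848c369-4306-4874-afdf-5c1e95b8732e  rack1
"""

print(node_state(SAMPLE, "10.0.0.1"))
print(node_state(SAMPLE, "10.0.0.2"))
```

In real use, `fetch_status` would run ‘nodetool status’ and return its output, ideally from a different node than the one being restarted, so you are checking a peer's view of the cluster rather than the node's view of itself.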