Re: Blocking while a node finishes joining the cluster after restart.

Jonathan Haddad Fri, 19 Sep 2014 13:06:21 -0700

Depending on the consistency level you query at (ONE or QUORUM), you might be 
able to do one rack at a time (or AZ, or whatever you've got), assuming your 
snitch is set up right.
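
A minimal sketch of the rack-at-a-time idea, assuming a hypothetical mapping of node address to rack (for example, copied from the Rack column of `nodetool status`). It only groups nodes into per-rack batches. Whether a whole rack can safely go down at once still depends on replicas being spread across racks and on the consistency level in use. The addresses and rack names are made up.

```python
from collections import defaultdict

def restart_batches(node_racks):
    """Group nodes into per-rack batches, one batch per restart round.

    node_racks: dict of node address -> rack name (hypothetical input).
    Returns a list of (rack, [nodes]) tuples sorted by rack name.
    """
    by_rack = defaultdict(list)
    for node, rack in node_racks.items():
        by_rack[rack].append(node)
    return [(rack, sorted(nodes)) for rack, nodes in sorted(by_rack.items())]

# Example topology (made-up addresses and rack names).
topology = {
    "10.0.0.1": "rack1", "10.0.0.2": "rack2", "10.0.0.3": "rack3",
    "10.0.0.4": "rack1", "10.0.0.5": "rack2", "10.0.0.6": "rack3",
}
for rack, nodes in restart_batches(topology):
    print(rack, nodes)
```

Each batch would be restarted and confirmed healthy before moving on to the next rack.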



> On Sep 19, 2014, at 11:30 AM, Kevin Burton <bur...@spinn3r.com> wrote:
> 
> This is great feedback…
> 
> I think it could actually be even easier than this…
> 
> You could have an Ansible (or whatever cluster management system you’re 
> using) role for just seeds.
> 
> Then you would serially restart all seeds one at a time.  You would need to 
> run ‘nodetool status’ and make sure the node is ‘U’ (up), I think, but you 
> might also want to make sure the majority of other nodes have agreed that 
> this node is up and available.
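
A rough sketch of that check, assuming you have already collected the text of `nodetool status` from several peers (how you collect it, e.g. over SSH, is left out). `parse_status` and `majority_sees_up` are hypothetical helper names; the only thing taken from Cassandra is the usual row format, where each node line starts with a two-letter code such as `UN` (Up/Normal).

```python
import re

# Matches node rows such as "UN  10.0.0.1  47.66 KB  256  ..." in `nodetool status` output.
NODE_ROW = re.compile(r"^([UD])([NLJM])\s+(\S+)")

def parse_status(output):
    """Return {address: two-letter state} parsed from `nodetool status` text."""
    states = {}
    for line in output.splitlines():
        m = NODE_ROW.match(line.strip())
        if m:
            states[m.group(3)] = m.group(1) + m.group(2)
    return states

def majority_sees_up(node, views):
    """views: list of `nodetool status` outputs, one per peer queried.

    True if more than half of the peers report `node` as UN (Up/Normal).
    """
    votes = sum(1 for v in views if parse_status(v).get(node) == "UN")
    return votes * 2 > len(views)

# Made-up sample output for illustration.
SAMPLE = """Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load      Tokens  Owns   Host ID  Rack
UN  10.0.0.1  47.66 KB  256     33.3%  aaa      rack1
DN  10.0.0.2  51.20 KB  256     33.3%  bbb      rack1
"""
print(parse_status(SAMPLE))
```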
> 
> I think you can ONLY do this serially. For a LARGE number of hosts, this 
> might take a while unless you can work out which nodes have mutually 
> exclusive key ranges.
> 
> The serial approach would take a LONG time for large clusters.  If you have 
> sixty nodes, it could take an hour to do a rolling restart.
> 
> Kevin
> 
>> On Tue, Sep 16, 2014 at 12:21 PM, James Briggs <james.bri...@yahoo.com> 
>> wrote:
>> FYI: OpsCenter has a default of sleep 60 seconds after each node restart,
>> and an option of "drain before stopping."
>> 
>> I haven't noticed if they do anything special with seeds.
>> (At least one seed needs to be running before you restart other nodes.)
>> 
>> I wondered the same thing as Kevin and came to these conclusions.
>> 
>> Fixing the startup script is non-trivial as far as startup scripts go.
>> 
>> For startup, it would have to:
>> 
>> - parse cassandra.yaml for seeds
>> - if the node itself is not a seed, wait for a seed to start first (this 
>> could take minutes, or never happen)
>> - continue startup.
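
The startup steps above could be sketched roughly as follows. This assumes the stock `cassandra.yaml` layout (a `- seeds: "..."` line under `seed_provider`) and treats "a seed is up" as "its inter-node port accepts TCP connections", using 7000, the default `storage_port`. Both are simplifications: an open port does not prove the seed has finished starting. The function names are hypothetical.

```python
import re
import socket
import time

def parse_seeds(yaml_text):
    """Extract the seed list from cassandra.yaml text with a regex.

    Assumes the stock layout, i.e. a line like:  - seeds: "10.0.0.1,10.0.0.2"
    """
    m = re.search(r'^\s*-\s*seeds:\s*"?([^"\n#]*)"?', yaml_text, re.MULTILINE)
    if not m:
        return []
    return [s.strip() for s in m.group(1).split(",") if s.strip()]

def port_open(host, port, timeout=2.0):
    """True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def wait_for_seed(seeds, me, port=7000, deadline_s=300, interval_s=5, probe=port_open):
    """Block until some seed accepts connections, or time out.

    Returns True immediately if `me` is itself a seed.
    """
    if me in seeds:
        return True
    end = time.monotonic() + deadline_s
    while time.monotonic() < end:
        if any(probe(seed, port) for seed in seeds):
            return True
        time.sleep(interval_s)
    return False
```

The deadline covers the "or never" case: the caller decides whether to give up or start anyway when `wait_for_seed` returns False.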
>> 
>> For a no-downtime cluster restart script, it would have to:
>> 
>> - verify cluster health (i.e. quorum/CL is met, or you lose writes)
>> - parse cassandra.yaml for seeds and see if a seed is up
>> - stop gossip and thrift
>> - maybe do compaction before drain
>> - drain node
>> - stop/start or restart cassandra process.
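
A sketch of the per-node part of that list, as an ordered set of commands. It covers only the gossip/thrift, drain and restart steps; the health check, seed check and optional compaction are left out. The `nodetool` subcommands are the ones available on the Thrift-era versions discussed here, and the restart command is the init script mentioned elsewhere in this thread, so adjust both for your install. `restart_steps` and `run_steps` are hypothetical names.

```python
import subprocess

def restart_steps(service_cmd=("/etc/init.d/cassandra", "restart")):
    """Ordered commands for restarting one node, following the list above."""
    return [
        ["nodetool", "disablegossip"],   # stop gossiping with other nodes
        ["nodetool", "disablethrift"],   # stop accepting Thrift clients
        ["nodetool", "drain"],           # flush memtables and stop accepting writes
        list(service_cmd),               # assumed init/service command
    ]

def run_steps(steps, run=subprocess.check_call):
    """Run each command in order; check_call raises on the first failure."""
    for cmd in steps:
        run(cmd)
```

Clusters that serve native-protocol (CQL) clients might also add `nodetool disablebinary` before the drain.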
>> 
>> http://comments.gmane.org/gmane.comp.db.cassandra.user/20144
>> 
>> Both of those scripts would be nice to have. :)
>> 
>> OpsCenter is flaky at doing rolling restart in my test cluster,
>> so an alternative is needed.
>> 
>> Also, the free OpsCenter doesn't have the rolling repair option enabled.
>> 
>> ccm has the options to do drain, stop and start, but a bash
>> script would be needed to make it rolling.
>> 
>> https://github.com/pcmanus/ccm
>> 
>> Thanks, James. 
>> -- 
>> Cassandra/MySQL DBA. Available in San Jose area or remote.
>> 
>> From: Duncan Sands <duncan.sa...@gmail.com>
>> To: user@cassandra.apache.org 
>> Sent: Tuesday, September 16, 2014 11:09 AM
>> Subject: Re: Blocking while a node finishes joining the cluster after 
>> restart.
>> 
>> Hi Kevin, if you are using the latest version of OpsCenter, then even the 
>> community (= free) edition can do a rolling restart of your cluster.  It's 
>> pretty convenient.
>> 
>> Ciao, Duncan.
>> 
>> On 16/09/14 19:44, Kevin Burton wrote:
>> > Say I want to do a rolling restart of Cassandra…
>> >
>> > I can’t just restart all of them because they need some time to gossip and
>> > for that gossip to get to all nodes.
>> >
>> > What is the best strategy for this?
>> >
>> > It would be something like:
>> >
>> > /etc/init.d/cassandra restart && wait-for-cassandra.sh
>> >
>> > … or something along those lines.
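
One possible shape for such a wait script, sketched in Python rather than shell. It polls `nodetool status` until the given address shows up as `UN` (Up/Normal) or a timeout expires. The function names, the timeout and the polling interval are all assumptions, and it only reflects the local node's view of the ring.

```python
import re
import subprocess
import time

def is_up_normal(address, status_output):
    """True if `nodetool status` text has a row 'UN <address> ...'."""
    pattern = r"^UN\s+" + re.escape(address) + r"\s"
    return re.search(pattern, status_output, re.MULTILINE) is not None

def nodetool_status():
    """Run `nodetool status`; return its stdout, or '' if it fails (e.g. JMX not up yet)."""
    try:
        return subprocess.check_output(["nodetool", "status"], text=True)
    except (OSError, subprocess.CalledProcessError):
        return ""

def wait_for_cassandra(address, timeout_s=600, interval_s=5, get_status=nodetool_status):
    """Poll until `address` shows as UN or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        if is_up_normal(address, get_status()):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

A wrapper script could call `wait_for_cassandra` with the node's own address and exit non-zero on False, so that it chains after the restart with `&&` as in the line above.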
>> >
>> > --
>> >
>> > Founder/CEO Spinn3r.com <http://Spinn3r.com>
>> > Location: San Francisco, CA
>> > blog: http://burtonator.wordpress.com
>> > … or check out my Google+ profile
>> > <https://plus.google.com/102718274791889610666/posts>
>> >
>> >
> 
> 
> 
> -- 
> Founder/CEO Spinn3r.com
> Location: San Francisco, CA
> blog: http://burtonator.wordpress.com
> … or check out my Google+ profile
>
