Re: Blocking while a node finishes joining the cluster after restart.

James Briggs Fri, 19 Sep 2014 20:28:32 -0700

Kevin: "The serial approach would take a LONG time for large clusters.
If you have sixty nodes, it could take an hour to do a rolling restart."

1) In Cassandra land, an hour is nothing. There are people doing repairs that
practically never finish: as soon as one finishes after a week, they have to
start the next one.

2) I met some people at the conference who were embarrassed to operate only
12 nodes. I'm not sure why, since managing 12 is a lot easier and cheaper
than 60. In fact, I would be proud to operate a large site on 8 or 12 nodes. :)

3) After I finish my cass_top project this week, I'll take a look at scripting
what you mentioned in this thread.

Thanks, James Briggs. 
-- 
Cassandra/MySQL DBA. Available in San Jose area or remote. 

________________________________
 From: Kevin Burton <bur...@spinn3r.com>
To: "user@cassandra.apache.org" <user@cassandra.apache.org>; James Briggs 
<james.bri...@yahoo.com> 
Sent: Friday, September 19, 2014 11:30 AM
Subject: Re: Blocking while a node finishes joining the cluster after restart.

This is great feedback…

I think it could actually be even easier than this…

You could have an ansible (or whatever cluster management system you’re using) 
role for just seeds.

Then you would restart the seeds serially, one at a time.  After each restart
you would run ‘nodetool status’ and make sure the node is ‘U’ (up), I think,
but you might also want to make sure the majority of the other nodes agree
that this node is up and available.
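
A minimal Python sketch of that check. It assumes that `nodetool status` prints one row per node, with a two-letter code in the first column (status U/D followed by state N/L/J/M) and the node address in the second; the exact layout may differ between Cassandra versions. The function names are made up for illustration, and asking other hosts for their view assumes remote JMX access is enabled. It checks for UN (up and normal), which is slightly stricter than just ‘U’.

```python
import subprocess


def node_is_up(status_output, address):
    """Return True if `address` is listed as UN (Up/Normal) in nodetool status text."""
    for line in status_output.splitlines():
        fields = line.split()
        # Node rows start with a two-letter code such as UN, DN or UJ,
        # followed by the node address. Header rows never match an address.
        if len(fields) >= 2 and fields[1] == address:
            return fields[0] == "UN"
    return False  # address not listed at all


def nodetool_status(host):
    """Ask `host` for its own view of the ring (assumes nodetool is on PATH)."""
    return subprocess.check_output(["nodetool", "-h", host, "status"], text=True)


def majority_sees_up(address, observers, fetch=nodetool_status):
    """True if more than half of `observers` report `address` as UN."""
    votes = sum(1 for host in observers if node_is_up(fetch(host), address))
    return votes * 2 > len(observers)
```

A restart script would call `majority_sees_up(restarted_node, other_nodes)` in a loop with a sleep and a timeout before moving on to the next node.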

I think you can ONLY do this serially. For a LARGE number of hosts, this
might take a while unless you can work out which nodes have mutually exclusive
key ranges and restart those in parallel.
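
One way to picture "mutually exclusive key ranges" is the simplest possible ring: one token per node (no vnodes) and SimpleStrategy-style placement, where each range is stored on RF consecutive nodes. Under that assumption, which is not something stated in this thread, two nodes share no range when they sit at least RF positions apart on the ring, so such nodes could be restarted in the same batch. A Python sketch (the function name is hypothetical, and it does not handle vnodes, racks or multiple datacenters):

```python
def restart_batches(ring, rf):
    """Group nodes (given in ring/token order) into batches whose members
    are at least `rf` ring positions apart from each other."""
    n = len(ring)
    batches = []  # each batch is a list of ring indexes

    def far_enough(i, j):
        d = abs(i - j)
        return min(d, n - d) >= rf  # distance measured around the ring

    for i in range(n):
        # Greedy: put the node in the first batch where it conflicts with nobody.
        for batch in batches:
            if all(far_enough(i, j) for j in batch):
                batch.append(i)
                break
        else:
            batches.append([i])
    return [[ring[i] for i in batch] for batch in batches]
```

With six nodes and RF 3 this yields three batches of two nodes each, so every range loses at most one replica at a time. Batches would still be restarted one after another, each followed by a health check.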

The serial approach would take a LONG time for large clusters.  If you have 
sixty nodes, it could take an hour to do a rolling restart.

Kevin

On Tue, Sep 16, 2014 at 12:21 PM, James Briggs <james.bri...@yahoo.com> wrote:

>FYI: OpsCenter by default sleeps 60 seconds after each node restart,
>and has a "drain before stopping" option.
>
>
>
>I haven't noticed if they do anything special with seeds.
>(At least one seed needs to be running before you restart other nodes.)
>
>
>
>I wondered the same thing as Kevin and came to these conclusions.
>
>
>Fixing the startup script is non-trivial as far as startup scripts go.
>
>
>For startup, it would have to:
>
>
>- parse cassandra.yaml for seeds
>- if the node is not itself a seed, wait for a seed to start first (this
>could take minutes, or never happen)
>
>- continue start.
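
The first two startup steps could be sketched in Python as below. It assumes the stock cassandra.yaml layout, where the seed provider takes a single `- seeds: "a,b,c"` line. The regex is a shortcut to avoid depending on a YAML library, and the function names are made up for illustration.

```python
import re

# Matches the `- seeds: "ip1,ip2"` line under seed_provider/parameters.
SEEDS_LINE = re.compile(r"""^\s*-\s*seeds:\s*["']?([^"'#\n]*)""", re.MULTILINE)


def parse_seeds(yaml_text):
    """Return the list of seed addresses found in cassandra.yaml text."""
    match = SEEDS_LINE.search(yaml_text)
    if match is None:
        return []
    return [s.strip() for s in match.group(1).split(",") if s.strip()]


def must_wait_for_seed(yaml_text, my_address):
    """A non-seed node should wait until some seed is up before starting."""
    return my_address not in parse_seeds(yaml_text)
```

The waiting itself would be a polling loop with a timeout, since, as noted above, a seed may never come up.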
>
>
>
>For a no-downtime cluster restart script, it would have to:
>
>
>- verify cluster health (i.e. quorum/CL is met, otherwise you lose writes)
>
>- parse cassandra.yaml for seeds and see if a seed is up
>- stop gossip and thrift
>- maybe do compaction before drain
>
>- drain node
>- stop/start or restart cassandra process.
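
The per-node part of that list, sketched in Python. `disablegossip`, `disablethrift`, `compact` and `drain` are nodetool subcommands in the Cassandra versions current as of this thread. The function names, the `compact_first` flag and the use of the init script from Kevin's original message are illustrative choices. The cluster health and seed checks are left out.

```python
import subprocess


def restart_commands(compact_first=False):
    """Commands to run on a node, in order, for a drained restart."""
    cmds = [
        ["nodetool", "disablegossip"],  # stop gossip
        ["nodetool", "disablethrift"],  # stop thrift
    ]
    if compact_first:
        cmds.append(["nodetool", "compact"])  # optional, can take a long time
    cmds.append(["nodetool", "drain"])  # flush memtables, stop accepting writes
    cmds.append(["/etc/init.d/cassandra", "restart"])
    return cmds


def restart_node(run=subprocess.check_call, compact_first=False):
    """Run each step in order; check_call raises on the first failure."""
    for cmd in restart_commands(compact_first):
        run(cmd)
```

Passing a different `run` callable (for example one that only prints) gives a dry run without touching the node.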
>
>http://comments.gmane.org/gmane.comp.db.cassandra.user/20144
>
>Both of those scripts would be nice to have. :)
>
>OpsCenter is flaky at doing rolling restarts in my test cluster,
>so an alternative is needed.
>
>Also, the free OpsCenter doesn't have the rolling repair option enabled.
>
>ccm has the options to do drain, stop and start, but a bash
>script would be needed to make it rolling.
>
>https://github.com/pcmanus/ccm
>
>
>Thanks, James. 
>-- 
>Cassandra/MySQL DBA. Available in San Jose area or remote.
>
>
>
>
>________________________________
> From: Duncan Sands <duncan.sa...@gmail.com>
>To: user@cassandra.apache.org 
>Sent: Tuesday, September 16, 2014 11:09 AM
>Subject: Re: Blocking while a node finishes joining the cluster after restart.
> 
>
>Hi Kevin, if you are using the latest version of opscenter, then even the 
>community (= free) edition can do a rolling restart of your cluster.  It's 
>pretty convenient.
>
>Ciao, Duncan.
>
>On 16/09/14 19:44, Kevin Burton wrote:
>> Say I want to do a rolling restart of Cassandra…
>>
>> I can’t just restart all of them because they need some time to gossip and
>> for that gossip to get to all nodes.
>>
>> What is the best strategy for this?
>>
>> It would be something like:
>>
>> /etc/init.d/cassandra restart && wait-for-cassandra.sh
>>
>> … or something along those lines.
>>
>> --
>>
>> Founder/CEO Spinn3r.com <http://Spinn3r.com>
>> Location: San Francisco, CA
>> blog: http://burtonator.wordpress.com
>> … or check out my Google+ profile
>> <https://plus.google.com/102718274791889610666/posts>
>>
>>
>
>
>
>

-- 

Founder/CEO Spinn3r.com

Location: San Francisco, CA

blog: http://burtonator.wordpress.com
… or check out my Google+ profile
