In case anyone was following this issue, it ended up being something that
looked an awful lot like CASSANDRA-6053: when the node was removed, its entry
was not successfully removed from the system.peers table on all nodes, and so
several of them kept trying to contact it despite it being down.
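
For anyone who wants to check for the same thing, this is roughly the surgery
involved; the IP below is a placeholder, and hand-editing system.peers is
exactly the kind of Not Supported fix Rob warns about below, so proceed with
caution. Run it on each node that still lists the dead peer:

cqlsh> SELECT peer, data_center, host_id FROM system.peers;
cqlsh> DELETE FROM system.peers WHERE peer = '10.1.2.3';

A rolling restart afterward may also be needed before gossip fully forgets the
node; I'm not certain it's strictly required.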
--
Josh Sholes

From: "Sholes, Josh" <joshua_sho...@cable.comcast.com>
Date: Thursday, February 6, 2014 at 1:41 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: One of my nodes is in the wrong datacenter - help!

Thanks for the advice.   I did use “removenode” as I was aware of the 
replace_token problems.
I haven’t run into the issue in CASSANDRA-6615 yet, and I don’t believe I’m at 
risk for it.

I’m actually running into a different problem.  Having done a removenode on
the node with the incorrect datacenter name, I am still getting “one or more
nodes were unavailable” messages when doing queries with consistency=ALL.  I’m
doing a full repair pass on the column family in question just to be safe
(which is taking forever!) before I do anything else.  So to reiterate: my
cluster now shows 7 nodes up when looking with gossipinfo or status, but will
still not serve consistency=ALL queries.  Are there any best practices for
tracking down other issues with the cluster, or should I expect the repair
pass to fix the problem?
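
For reference, here is what I’ve checked with so far, plus a couple of
commands I’m considering; the keyspace, column family, and key names are
placeholders:

nodetool status
nodetool gossipinfo
nodetool describecluster
nodetool getendpoints MyKeyspace MyCF some_key
nodetool repair -pr MyKeyspace MyCF

describecluster at least confirms all nodes agree on a single schema version,
and getendpoints shows which replicas own a given key, which should reveal
whether the removed node is somehow still being counted as a replica.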
--
Josh Sholes

From: Robert Coli <rc...@eventbrite.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, February 3, 2014 at 7:30 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: One of my nodes is in the wrong datacenter - help!

On Sun, Feb 2, 2014 at 10:48 AM, Sholes, Joshua
<joshua_sho...@cable.comcast.com> wrote:
A node in my 8-node production 1.2.8 cluster had a serious problem and needed
to be removed and rebuilt.  However, after doing nodetool removenode and then
bootstrapping a new node on the same IP address, the new node somehow ended up
with a different datacenter name: the rest of the nodes are in dc $NAME, and
the new one is in dc $NAME6934724, i.e. a string of seemingly random digits
appended to the correct name.  How can I force it to change DC names back to
what it should be?
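
First thing I would check: the snitch configuration on the new node. With
GossipingPropertyFileSnitch, the advertised datacenter comes from
conf/cassandra-rackdc.properties, so if that's your snitch, make sure the new
node's file matches the rest of the cluster (values here are illustrative):

# conf/cassandra-rackdc.properties
dc=$NAME
rack=RAC1

If the file looks right, the bogus name is probably a stale value persisted in
the system tables, which brings us to the hack below.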

You could change the entry in the system.local columnfamily on the affected 
node...

cqlsh> UPDATE system.local SET data_center = '$NAME' WHERE key = 'local';

... but that is Not Supported and may have side effects of which I am not aware.
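
If you try it, a quick read-back should confirm the change took (a restart is
probably also needed before gossip advertises the new value; again, this is
unsupported territory):

cqlsh> SELECT data_center, rack FROM system.local;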

I’m working with 500+GB per node here, so bootstrapping it again is not a huge
issue, but I’d prefer to avoid it anyway.  I am NOT able to change the node’s
IP address at this time, so I’m stuck with bootstrapping a new node in the
same place, which my gut feeling tells me might be part of the problem.

Note that replace_node/replace_token are broken in 1.2.8; did you attempt to
use either of these? I presume not, since you said you did removenode...

If I were you, I would probably removenode and re-bootstrap, as the safest
alternative.
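
For completeness, that sequence is roughly the following; the host ID is a
placeholder, so take the real one from the Host ID column of nodetool status:

nodetool status                # note the Host ID of the dead node
nodetool removenode <host-id>
nodetool removenode status     # check progress of the removal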

As an aside, while trying to deal with this issue you should be aware of this 
ticket, so you do not do the sequence of actions it describes.

https://issues.apache.org/jira/browse/CASSANDRA-6615

=Rob
