Hi Robert,

Firstly, thank you very much for your help. I have some comments inline below.

On 10/09/2015 01:26, "Robert Coli" 
<rc...@eventbrite.com> wrote:

On Wed, Sep 9, 2015 at 7:52 AM, Richard Dawe 
<rich.d...@messagesystems.com> wrote:
I am investigating various topology changes, and their effect on replica 
placement. As far as I can tell, replica placement is not changing after I’ve 
changed the topology and run nodetool repair + cleanup. I followed the 
procedure described at 
http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_switch_snitch.html
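
In case it's relevant, by "repair + cleanup" I mean running the following on 
each node in turn (the keyspace name is a placeholder):

    nodetool repair my_ks
    nodetool cleanup my_ks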

That's probably a good thing. I'm going to be modifying the warning in the 
cassandra.yaml to advise users that in practice the only change of snitch or 
replication strategy one can safely do is one in which replica placement does 
not change. It currently says that you need to repair, but there are plenty of 
scenarios where you lose all existing replicas for a given datum, and are 
therefore unable to repair. The key is that you need at least one replica to 
stay the same or repair is worthless. And if you only have one replica staying 
the same, you lose any consistency contract you might have been 
operating under. One ALMOST NEVER ACTUALLY WANTS TO DO ANYTHING BUT A NO-OP 
HERE.

So if you have a topology that would change if you switched from SimpleStrategy 
to NetworkTopologyStrategy plus multiple racks, it sounds like a different 
migration strategy would be needed?

I am imagining:

  1.  Switch to a different snitch, and change the keyspace from SimpleStrategy 
to NTS, but keep it all in one rack. So it's effectively the same topology, 
just with a different snitch.
  2.  Set up a new data centre with the desired topology.
  3.  Change the keyspace to have replicas in the new DC.
  4.  Rebuild all the nodes in the new DC.
  5.  Flip all your clients over to the new DC.
  6.  Decommission your original DC.

Or something like that.
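
In command terms, I imagine the keyspace and rebuild side of that would look 
roughly like the following, assuming NTS with RF 3 and using DC1/DC2 as 
placeholder names for the old and new data centres (the snitch change itself 
is done in cassandra.yaml, per the docs):

    Step 1, in cqlsh:
        ALTER KEYSPACE my_ks WITH replication =
            {'class': 'NetworkTopologyStrategy', 'DC1': 3};

    Step 3, in cqlsh:
        ALTER KEYSPACE my_ks WITH replication =
            {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};

    Step 4, on each node in the new DC:
        nodetool rebuild DC1

    Step 6, in cqlsh and then on each node in the old DC:
        ALTER KEYSPACE my_ks WITH replication =
            {'class': 'NetworkTopologyStrategy', 'DC2': 3};
        nodetool decommission

I haven't tried any of this yet, though, so please treat it as a sketch.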


Here is my test scenario: <snip>


  1.  To determine the token range ownership, I used “nodetool ring <keyspace>” 
and “nodetool info -T <keyspace>”. I saved the output of those commands with 
the original topology, after changing the topology, after repairing, after 
changing the replication strategy, and then again after repairing. In no case 
did the tokens change. It looks like nodetool ring and nodetool info -T show 
the owner but not the replicas for a particular range.
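
For reference, at each stage I was capturing output along these lines so that 
it could be compared later (keyspace and file names are placeholders):

    nodetool ring my_ks > ring.original-topology.txt
    nodetool info -T    > tokens.original-topology.txt
    # ... after the topology change, repair, etc., capture again and compare:
    diff ring.original-topology.txt ring.after-repair.txt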

The tokens and ranges shouldn't be changing, the replica placement should be. 
AFAIK neither of those commands show you replica placement, they show you 
primary range ownership.

Use getendpoints to determine replica placement before and after.
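
Something like this, for a handful of sample keys (keyspace, table and key 
names are placeholders):

    # replica placement for a sample key, before the change
    nodetool getendpoints my_ks my_table sample_key | sort > endpoints.before
    # ... change the snitch / replication strategy ...
    nodetool getendpoints my_ks my_table sample_key | sort > endpoints.after
    # nodes printed here are replicas under both the old and new placement;
    # if nothing is printed, repair has no surviving replica to copy from
    comm -12 endpoints.before endpoints.after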


Thanks, I will play with that when I have a chance next week.


I was expecting the replica placement to change. Because the racks were 
assigned in groups (rather than alternating), I was expecting the original 
replica placement with SimpleStrategy to be non-optimal after switching to 
NetworkTopologyStrategy. E.g.: if some data was replicated to nodes 1, 2 and 3, 
then after the topology change there would be 2 replicas in RAC1, 1 in RAC2 and 
none in RAC3. And hence when the repair and cleanup ran, they would create a 
replica in RAC3 and remove the extra one from RAC1.

I would expect this to be the case.

However, when I did a query using cqlsh at consistency QUORUM, I saw that it 
was hitting two replicas in the same rack, and a replica in a different rack. 
This suggests that the replica placement did not change after the topology 
change.
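
For anyone wanting to reproduce this, a traced QUORUM read in cqlsh shows 
which replicas serve the request; roughly (keyspace, table and key names are 
placeholders):

    CONSISTENCY QUORUM;
    TRACING ON;
    SELECT * FROM my_ks.my_table WHERE id = 'sample_key';
    -- the trace printed after the result lists the nodes that handled the read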

Perhaps you are seeing the quirks of the current rack-aware implementation, 
explicated here?

https://issues.apache.org/jira/browse/CASSANDRA-3810


Thanks. I need to re-read that a few times to understand it.

Is there some way I can see which nodes have a replica for a given token range?

Not for a range, but for a given key with nodetool getendpoints.

I wonder if there would be value in supporting ranges... in the pre-vnode past 
I have merely generated a key for each range. With the number of ranges 
increased so dramatically by vnodes, it might be easier to have an endpoint 
that works on ranges...

Thank you again. Best regards, Rich


=Rob
