Hi folks,

In our Cassandra 3.11 deployments we use virtual nodes (vnodes).
So far, we have always used the old default value of 256 for the num_tokens 
parameter in cassandra.yaml (see the attached example file):

num_tokens: 256
# allocate_tokens_for_keyspace: KEYSPACE
# initial_token:

Because of problems with repair in larger topologies (long duration and high 
memory consumption), we now want to reduce num_tokens to 32 and, at the same 
time, set allocate_tokens_for_keyspace to a keyspace used by one of our 
applications, for example:

num_tokens: 32
allocate_tokens_for_keyspace: sb_keyspace

The keyspace named in allocate_tokens_for_keyspace should supply its 
replication factor to the automatic token allocation algorithm, so that the 
replicated load is balanced well across the nodes in the datacenter.

At initial cluster startup there seems to be a chicken-and-egg problem, because 
none of the keyspaces exists yet with its final settings.
However, this question is not about initial startup but about modifying an 
existing cluster with, say, two datacenters that are currently running with the 
old default (num_tokens: 256).

To do this, we would temporarily remove one of the datacenters and re-add it 
with the reduced num_tokens and the new allocate_tokens_for_keyspace setting, 
and then repeat the same operation for the other datacenter.

These are the main steps we follow to add a datacenter (the same as described 
in publicly available documentation, e.g. from DataStax):
(1) alter the definition of all keyspaces (where applicable, mainly our 
applications' keyspaces) to use RF=0 in the new datacenter
(2) start all Cassandra nodes of the new datacenter, one at a time
(3) alter the definition of all keyspaces to use the desired RF in the new 
datacenter
(4) on each node of the new datacenter, run: nodetool rebuild 
<existingDataCenter>
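To make steps (1), (3) and (4) concrete, here is a minimal Python sketch that 
only prints the statements and commands; it does not connect to a cluster. It 
assumes NetworkTopologyStrategy, a single target RF of 3 for every keyspace, 
and the hypothetical datacenter names "dc_existing" and "dc_new"; only 
sb_keyspace comes from our actual config.

```python
from typing import Dict, List

# Hypothetical placeholders, not our real datacenter names or RF.
EXISTING_DC = "dc_existing"
NEW_DC = "dc_new"
TARGET_RF = 3


def alter_keyspace(keyspace: str, rf_by_dc: Dict[str, int]) -> str:
    """Render a CQL ALTER KEYSPACE statement using NetworkTopologyStrategy."""
    dcs = ", ".join(f"'{dc}': {rf}" for dc, rf in rf_by_dc.items())
    return (
        f"ALTER KEYSPACE {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {dcs}}};"
    )


def add_datacenter_plan(keyspaces: List[str]) -> List[str]:
    """Return the add-datacenter steps as printable lines, in order."""
    plan: List[str] = []
    # Step (1): RF=0 in the new datacenter, so no replicas are placed there yet.
    for ks in keyspaces:
        plan.append(alter_keyspace(ks, {EXISTING_DC: TARGET_RF, NEW_DC: 0}))
    # Step (2): starting the nodes one at a time is a manual step.
    plan.append(f"-- start the nodes of {NEW_DC} one by one")
    # Step (3): raise the RF in the new datacenter to the target value.
    for ks in keyspaces:
        plan.append(alter_keyspace(ks, {EXISTING_DC: TARGET_RF, NEW_DC: TARGET_RF}))
    # Step (4): to be run on each node of the new datacenter.
    plan.append(f"nodetool rebuild {EXISTING_DC}")
    return plan


if __name__ == "__main__":
    for line in add_datacenter_plan(["sb_keyspace"]):
        print(line)
```

In a real cluster, the replication map has to list every datacenter that should 
keep replicas of the keyspace, and the RF may differ per keyspace.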


However, some of our team members have the following concern:
Following the recommended procedure for adding a datacenter, we would first set 
RF=0 for the keyspaces and then start the nodes. Doesn't that mean the automatic 
allocation algorithm would, in step (2), choose the tokens (and thus the data 
distribution) based on this (still) wrong RF?

Or does the automatic allocation algorithm kick in at a later step? If so, at 
which step?
Do you see anything wrong with the steps above?
Do you have any other recommendations on how to make this change?

Our testing has not shown any errors, but with a small amount of data it is hard 
to tell whether the load is balanced appropriately, and testing with a large 
amount of data could be costly. We still plan to do that testing, but we want to 
be sure we understand what should happen before we go down that route.

My assumption is that the token allocation takes place in the rebuild step (4). 
I took a look at 
https://github.com/apache/cassandra/blob/6da9e33602fad4b8bf9466dc0e9a73665469a195/src/java/org/apache/cassandra/tools/nodetool/Rebuild.java
 but I don't see an obvious place where that happens; then again, I am not a 
Java developer.

Lastly, I understand that this is much improved in 4.x, and that 3.11 will reach 
end of life shortly. Despite my repeated attempts to get an upgrade approved, it 
is not happening at the moment.

So, I guess there are two questions:
1. Is it correct that the rebuild performs the token allocation, and if so, 
which piece of the code does it?
2. Does anyone have experience doing this? Are there online instructions you 
followed to complete the task? We obviously have the DataStax ones mentioned 
above, but if there are others, we could compare the two sets and see where they 
differ. This may give us some clues about our doubts.

Best Regards,

Douglas Whitfield | Enterprise Architect, 
OpenLogic<https://www.openlogic.com/?utm_leadsource=email-signature&utm_source=outlook-direct-email&utm_medium=email&utm_campaign=2019-common&utm_content=email-signature-link>




