https://stackoverflow.com/questions/48776589/cassandra-cant-one-use-snapshots-to-rapidly-scale-out-a-cluster/48778179#48778179

So the basic question is, if one records tokens and snapshots from an
existing node, via:

nodetool ring | grep ip_address_of_node | awk '{print $NF ","}' | xargs


for the desired node IP

then takes snapshots

then transfers the snapshots to a new node (not yet attached to cluster)

sets initial_token in cassandra.yaml to those tokens

sets up schema to match

then has it join the cluster
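
To make the token step concrete, here is a rough sketch of how I'd collect
the tokens and format the yaml line. The helper names tokens_for_ip and
initial_token_line are made up for illustration, and I'm assuming the
nodetool ring output has the node address in the first column and the token
in the last. It matches the address exactly rather than using grep, so that
10.0.0.1 does not also pick up 10.0.0.10.

```shell
# Hypothetical helper: read `nodetool ring` output on stdin and print the
# tokens listed for one node as a single comma-separated line.
tokens_for_ip() {
    # $1 is the node's IP address; compare it exactly against column one
    # and keep the last column (the token) of every matching row.
    awk -v ip="$1" '$1 == ip { print $NF }' | paste -sd, -
}

# Hypothetical helper: format the cassandra.yaml setting for those tokens.
initial_token_line() {
    printf 'initial_token: %s\n' "$1"
}

# Intended use against a live cluster (not run here):
#   nodetool ring | tokens_for_ip ip_address_of_node
```

The output of tokens_for_ip has no trailing comma, so it can be passed
straight to initial_token_line and pasted into the new node's yaml.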

Would that allow quick scale-up of nodes and replication of data? I don't
care if the vnode map changes after the initial join, or if data starts
being streamed off as the cluster rebalances.

Is there an issue if the vnode tokens for two nodes are identical? Do they
have to be distinct for each node?
Does it muck with the RF, since there would be more copies of the data than
the configured RF?
Or is this just not practically faster than an sstable load?

Basically, I was wondering if we could use this to double the number of
nodes with identical copies of the node data via snapshots, and then later
on Cassandra can pare down which nodes own which data.
