Is there some unwritten wisdom with regard to the use of 'nodetool compact'
before bootstrapping new nodes and decommissioning old ones?

TL;DR:
I've been spending the last few days trying to move a cluster on
DigitalOcean 2GB machines to 4GB machines (same provider). To do so I
wanted to create the new nodes, bootstrap them, then decommission the old
ones (one by one seems to be the only available option).

The bootstrapping was failing. Eventually I figured out it was somehow
related to a TombstoneOverwhelmingException on the new nodes. I issued a
'nodetool compact' on the entire cluster to try to minimize the number of
Tombstones. Once that was done I was able to bootstrap all my new nodes.

Now is the time to decommission. From the very first node I tried to
decommission I've been getting 1 node dying after an almost endless loop of
"GC for ConcurrentMarkSweep" showing the heap getting fuller and fuller
until the node dies. On one node I've been able to bump the MAX_HEAP_SIZE
by 400MB and get it to work (it was a 4GB node), but now I'm getting the
same symptoms on a 2GB node where the heap is already as big as it can be
before the OS itself runs out of RAM, so I can't raise MAX_HEAP_SIZE any
further. It would seem I have really painted myself into a
scrap-the-cluster kind of corner.
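
For reference, a rough sketch of how the default heap is sized when
MAX_HEAP_SIZE is left unset. It assumes the formula documented for the
stock cassandra-env.sh, max(min(1/2 RAM, 1024MB), min(1/4 RAM, 8GB)); the
function name is made up for illustration, and a given install may
override this.

```python
def default_max_heap_mb(system_ram_mb: int) -> int:
    """Approximate the default heap size (in MB) for a given amount of RAM.

    Assumption: mirrors the documented cassandra-env.sh rule
    max(min(1/2 RAM, 1024MB), min(1/4 RAM, 8GB)). Not read from a live node.
    """
    half_ram = system_ram_mb // 2
    quarter_ram = system_ram_mb // 4
    return max(min(half_ram, 1024), min(quarter_ram, 8192))


if __name__ == "__main__":
    # Nominal sizes for the two machine types discussed in this thread.
    for ram_mb in (2048, 4096):
        print(ram_mb, "MB RAM ->", default_max_heap_mb(ram_mb), "MB heap")
```

Under that assumption, a nominal 2GB machine and a nominal 4GB machine both
default to a 1024MB heap, so the larger machines only help if
MAX_HEAP_SIZE is raised explicitly.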

Not knowing the inner workings of Cassandra's bootstrap and decommission
mechanisms means all I can do is make an educated guess that perhaps
doing another 'nodetool compact' on the nodes I'm about to decommission
might help. However I have not found any wisdom or documentation on
anything relating to this, which I find surprising as I can't be the first
to have had this problem.

BOTTOM LINE:
Does anyone have a real-world production process for efficiently and
reliably bootstrapping and decommissioning nodes in a cluster? It seems it
might look
like <compact all>, <bootstrap one-by-one>, <compact all>, <decommission
one-by-one (really?!?)>. Or are all my problems due to me running on
"hardware" that doesn't have resources (RAM,CPU) to spare in the first
place?
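
To make the sequence I have in mind concrete, here is a minimal sketch of
the decommission half as a dry-run script. It is only my guess at a
process, not a known-good procedure: the host addresses are placeholders,
the function names are made up, and whether the compact step actually
helps is exactly what I am asking.

```python
import subprocess
from typing import List


def rolling_plan(old_nodes: List[str], compact_first: bool = True) -> List[List[str]]:
    """Build the ordered nodetool invocations, one old node at a time."""
    plan = []
    for host in old_nodes:
        if compact_first:
            # Major compaction on the node that is about to leave the ring.
            plan.append(["nodetool", "-h", host, "compact"])
        # Stream this node's data to the remaining nodes and leave the ring.
        plan.append(["nodetool", "-h", host, "decommission"])
    return plan


def run(plan: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in plan:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the loop at the first failing command.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder addresses for the old 2GB nodes.
    run(rolling_plan(["10.0.0.1", "10.0.0.2"]))
```

Run as-is it only prints the commands; each decommission would have to
finish before the next node is touched.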

Thanks
