If you are migrating all nodes, you might want to consider creating a new
data center, bringing up all the new nodes (bootstrap) in that new data
center, and then decommissioning all the nodes in the old data center.
That way, the existing nodes remain fully operational during the process,
and the new nodes are not available until the new data center is completely
ready. And if something goes wrong, the existing nodes are unharmed.
-- Jack Krupansky
-----Original Message-----
From: Robert Stupp
Sent: Sunday, August 17, 2014 11:17 AM
To: user@cassandra.apache.org
Subject: Re: Compaction before Decommission and Bootstrapping
In a few words:
Bootstrap one node at a time
Wait for bootstrap to complete
Next node
More details: datastax.com/docs (C* 2.0)
Before decommissioning: nodetool cleanup
Don't forget to do repairs (one node at a time) - this should be a regular
admin task
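
The one-at-a-time bootstrap sequence above could be scripted roughly as
follows. This is a minimal sketch, not a tested operations tool. It assumes
`nodetool` is on the PATH, that `nodetool status` prints one line per node
starting with a two-letter code (`UN` for up/normal, `UJ` for up/joining),
and that `start_node` is a hypothetical callback you supply to start
Cassandra on a new machine. All function names are invented for
illustration.

```python
import subprocess
import time


def node_states(status_output):
    """Map node address -> two-letter state code from `nodetool status` text."""
    states = {}
    for line in status_output.splitlines():
        parts = line.split()
        # Node lines start with a status letter (U/D) followed by a state
        # letter (N/L/J/M); header and separator lines do not match.
        if (len(parts) >= 2 and len(parts[0]) == 2
                and parts[0][0] in "UD" and parts[0][1] in "NLJM"):
            states[parts[1]] = parts[0]
    return states


def nodetool_status():
    """Run `nodetool status` locally and return its text output."""
    return subprocess.check_output(["nodetool", "status"],
                                   universal_newlines=True)


def wait_until_normal(address, get_status=nodetool_status,
                      poll_seconds=30, sleep=time.sleep):
    """Poll until `address` is reported as UN (up/normal)."""
    while node_states(get_status()).get(address) != "UN":
        sleep(poll_seconds)


def bootstrap_one_by_one(new_nodes, start_node,
                         get_status=nodetool_status, sleep=time.sleep):
    """Start each new node only after the previous one has finished joining."""
    for address in new_nodes:
        start_node(address)  # hypothetical: e.g. start the service over SSH
        wait_until_normal(address, get_status=get_status, sleep=sleep)
```

The `get_status` and `sleep` parameters exist only so the loop can be
exercised without a live cluster; in real use the defaults would apply.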
On 17.08.2014 at 15:46, Maxime <maxim...@gmail.com> wrote:
Is there some unwritten wisdom with regard to the use of 'nodetool compact'
before bootstrapping new nodes and decommissioning old ones?
TL;DR:
I've been spending the last few days trying to move a cluster on
DigitalOcean 2GB machines to 4GB machines (same provider). To do so I
wanted to create the new nodes, bootstrap them, then decommission the old
ones (one by one seems to be the only available option).
The bootstrapping was failing; eventually I figured out it was somehow
related to a TombstoneOverwhelmingException on the new nodes. I issued a
'nodetool compact' on the entire cluster to try to minimize the number of
Tombstones. Once that was done I was able to bootstrap all my new nodes.
Now is the time to decommission. From the very first node I tried to
decommission I've been getting 1 node dying after an almost endless loop
of "GC for ConcurrentMarkSweep" showing the heap getting fuller and fuller
until the node dies. On one node I've been able to bump the MAX_HEAP_SIZE
by 400MB and get it to work (it was a 4GB node), but now I'm getting the
same symptoms on a 2GB node where the heap is as big as it can be before
the OS itself runs out of RAM, so I can't expand the MAX_HEAP_SIZE. It
would seem I have really painted myself into a scrap-the-cluster kind of
corner.
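
For context on the heap numbers above: when MAX_HEAP_SIZE is not set
explicitly, the stock cassandra-env.sh derives a default from system RAM.
The sketch below assumes that calculation is
max(min(RAM/2, 1024 MB), min(RAM/4, 8192 MB)); the function name is made up
for illustration, and the formula should be checked against the
cassandra-env.sh actually shipped with your version.

```python
def default_max_heap_mb(system_memory_mb):
    """Default heap size in MB, assuming the stock cassandra-env.sh formula."""
    half = min(system_memory_mb // 2, 1024)      # half of RAM, capped at 1 GB
    quarter = min(system_memory_mb // 4, 8192)   # quarter of RAM, capped at 8 GB
    return max(half, quarter)
```

Under that assumption, `default_max_heap_mb(2048)` and
`default_max_heap_mb(4096)` both give 1024, so moving from a 2GB to a 4GB
machine would not by itself raise the default heap; it would only leave more
headroom for a manual increase.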
Not knowing the inner-workings of Cassandra's bootstrap and decommission
mechanisms means all I can do is make an educated guess that perhaps
doing another 'nodetool compact' on the nodes I'm about to decommission
might help. However I have not found any wisdom or documentation on
anything relating to this, which I find surprising as I can't be the first
to have had this problem.
BOTTOM LINE:
Does anyone have a real-world production process for efficiently and
reliably bootstrapping and decommissioning nodes in a cluster? Seems it might
look like <compact all>, <bootstrap one-by-one>, <compact all>,
<decommission one-by-one (really?!?)>. Or are all my problems due to me
running on "hardware" that doesn't have resources (RAM,CPU) to spare in
the first place?
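
The decommission half of such a process might be sketched as below. It is a
hedged illustration only: it assumes `nodetool` can reach each host over JMX
with `-h`, and that `nodetool decommission` does not return until the node
has finished streaming its data away. The function names are hypothetical.
It leaves out 'nodetool compact' on purpose, since nothing in this thread
establishes that a major compaction is required.

```python
import subprocess


def nodetool(host, command):
    """Run one nodetool command against `host` (assumes remote JMX access)."""
    subprocess.check_call(["nodetool", "-h", host, command])


def decommission_one_by_one(old_nodes, run=nodetool):
    """Clean up every old node, then decommission them strictly in sequence."""
    for host in old_nodes:
        # Drop data for token ranges the node no longer owns now that the
        # new nodes have joined.
        run(host, "cleanup")
    for host in old_nodes:
        # Assumption: this call blocks until the node has left the ring, so
        # the next node is not touched while this one is still streaming.
        run(host, "decommission")
```

Passing a different `run` callable makes it possible to dry-run the sequence
and print the commands before touching a real cluster.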
Thanks