When a drive fails in a large cluster and you don't immediately have a
replacement drive, is it OK to just remove the drive from cassandra.yaml
and restart the node? Will the missing data (assuming RF=3) be
re-replicated?
I have disk_failure_policy set to "best_effort", but the node still
fails (i.e., Cassandra exits) when a disk (spinning rust) goes bad.
I do have commit_failure_policy set to stop.
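For reference, the two settings described above would look like this in cassandra.yaml (just a sketch of the configuration being discussed, not a recommendation):

```yaml
# Per the docs, "best_effort" is supposed to blacklist the failed
# data directory and keep serving from the remaining ones:
disk_failure_policy: best_effort

# Stop accepting writes (but keep reads) if the commit log disk fails:
commit_failure_policy: stop
```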
Thank you!
-Joe
On 1/7/2022 4:38 PM, Dmitry Saprykin wrote:
There is a jira ticket describing your situation
https://issues.apache.org/jira/plugins/servlet/mobile#issue/CASSANDRA-14793
I may be wrong, but it seems that the system directories are pinned to the
first data directory in cassandra.yaml by default. When you removed the
first item from the list, the system data was regenerated in the new first
directory in the list, and then merged (?) when the original first directory returned.
On Fri, Jan 7, 2022 at 4:23 PM Joe Obernberger
<joseph.obernber...@gmail.com> wrote:
Hi - in order to get the node back up and running I did the following:
Deleted all data on the node:
Added: -Dcassandra.replace_address=172.16.100.39
to the cassandra-env.sh file, and started it up. It is currently
bootstrapping.
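For anyone following along, the replace step amounts to a config fragment like this (a sketch; cassandra-env.sh is the standard filename in a package install, so adjust to your layout):

```shell
# In cassandra-env.sh, after wiping the node's data directories,
# tell the node to replace itself at its own address so it
# re-bootstraps and streams its data back from the other replicas:
JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=172.16.100.39"
```

Remember to remove the option again after the bootstrap completes, or the node will refuse to start next time.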
In cassandra.yaml, say you have the following:
data_file_directories:
- /data/1/cassandra
- /data/2/cassandra
- /data/3/cassandra
- /data/4/cassandra
- /data/5/cassandra
- /data/6/cassandra
- /data/7/cassandra
- /data/8/cassandra
If I change the above to:
# - /data/1/cassandra
- /data/2/cassandra
- /data/3/cassandra
- /data/4/cassandra
- /data/5/cassandra
- /data/6/cassandra
- /data/7/cassandra
- /data/8/cassandra
the problem happens. If I change it to:
- /data/1/cassandra
- /data/2/cassandra
- /data/3/cassandra
- /data/4/cassandra
- /data/5/cassandra
- /data/6/cassandra
- /data/7/cassandra
# - /data/8/cassandra
the node starts up OK. I assume it will recover the missing data
during a repair?
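Assuming RF>1 as above, the data that lived on the removed directory should be recoverable from the other replicas with a full (not incremental) repair on that node, something like:

```shell
# Run a full repair on this node so that missing replicas are
# streamed back from the rest of the cluster:
nodetool repair --full
```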
-Joe
On 1/7/2022 4:13 PM, Mano ksio wrote:
Hi, you may have already tried, but this may help.
https://stackoverflow.com/questions/29323709/unable-to-start-cassandra-node-already-exists
Can you elaborate a little on 'If I remove a drive other than the
first one'? What does that mean?
On Fri, Jan 7, 2022 at 2:52 PM Joe Obernberger
<joseph.obernber...@gmail.com> wrote:
Hi All - I have a 13 node cluster running Cassandra 4.0.1. If I stop a
node, edit the cassandra.yaml file, comment out the first drive in the
list, and restart the node, it fails to start, saying that a node
already exists in the cluster with that IP address.
If I put the drive back into the list, the node still fails to start
with the same error. At this point the node is useless, and I think the
only option is to remove all the data and re-bootstrap it?
---------
ERROR [main] 2022-01-07 15:50:09,155 CassandraDaemon.java:909 -
Exception encountered during startup
java.lang.RuntimeException: A node with address /172.16.100.39:7000
already exists, cancelling join. Use cassandra.replace_address if you
want to replace this node.
    at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:659)
    at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:934)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:784)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:729)
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:420)
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:763)
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:887)
-----------
If I remove a drive other than the first one, this problem doesn't
occur. Any other options? It appears that if the first drive in the
list goes bad, or is just removed, the entire node must be replaced.
-Joe