Here is the previous error:

2021-02-28 05:17:33,262 WARN NodeConnectionsService.java:165 validateAndConnectIfNeeded failed to connect to node {y.y.y.y}{9ba2d3ee-bc82-4e76-ae24-9e20eb334c24}{9ba2d3ee-bc82-4e76-ae24-9e20eb334c24}{y.y.y.y}{y.y.y.y:9300}{ALIVE}{rack=r1, dc=DC1} (tried [1] times)
org.elasticsearch.transport.ConnectTransportException: [y.y.y.y][y.y.y.y:9300] connect_timeout[30s]
        at org.elasticsearch.transport.TcpChannel.awaitConnected(TcpChannel.java:163)
        at org.elasticsearch.transport.TcpTransport.openConnection(TcpTransport.java:616)
        at org.elasticsearch.transport.TcpTransport.connectToNode(TcpTransport.java:513)
        at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:336)
        at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:323)
        at org.elasticsearch.cluster.NodeConnectionsService.validateAndConnectIfNeeded(NodeConnectionsService.java:156)
        at org.elasticsearch.cluster.NodeConnectionsService$ConnectionChecker.doRun(NodeConnectionsService.java:185)
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:672)
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

Yes, this node (y.y.y.y) stopped because it ran out of disk space.


I said "deleted" because I'm not a native English speaker :)
I usually "remove" snapshots via 'nodetool clearsnapshot' or the
cassandra-reaper user interface.
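
For anyone who wants to see what is actually sitting in the snapshot directories before clearing anything, here is a minimal Python sketch. It assumes the usual on-disk layout <data_dir>/<keyspace>/<table>-<id>/snapshots/<snapshot_name>/; the function name list_snapshots and the data directory you pass in are only illustrative, so adjust them to your installation. It only reads the filesystem and does not replace 'nodetool clearsnapshot'.

```python
from pathlib import Path


def list_snapshots(data_dir):
    """List snapshot directories found under a Cassandra data directory.

    Returns a list of tuples:
    (keyspace, table_dir, snapshot_name, apparent_size_bytes, dir_mtime).
    Assumes the layout <data_dir>/<keyspace>/<table>-<id>/snapshots/<name>/.
    """
    root = Path(data_dir)
    if not root.is_dir():
        return []

    results = []
    for snap in sorted(root.glob("*/*/snapshots/*")):
        if not snap.is_dir():
            continue
        # Apparent size only: snapshot files are hard links to SSTables, so
        # they consume extra disk space only once the live SSTable is gone.
        size = sum(f.stat().st_size for f in snap.rglob("*") if f.is_file())
        # The directory mtime changes when entries are added to it, so it is
        # a rough hint of when the snapshot was taken.
        dir_mtime = snap.stat().st_mtime
        keyspace = snap.parents[2].name   # <data_dir>/<keyspace>
        table_dir = snap.parents[1].name  # <table>-<id>
        results.append((keyspace, table_dir, snap.name, size, dir_mtime))
    return results
```

Calling list_snapshots() with the path of the node's data directory and comparing the returned directory timestamps with the time of the first error in the log should show whether a snapshot was created at that moment.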




On Mon, 1 Mar 2021 at 12:39, Bowen Song <bo...@bso.ng.invalid>
wrote:

> What was the warning? Is it related to the disk failure policy? Could you
> please share the relevant log? You can edit it and redact the sensitive
> information before sharing it.
>
> Also, I can't help but notice that you used the word "delete" (instead of
> "clear") to describe the process of removing snapshots. May I ask how you
> deleted the snapshots? Was it "nodetool clearsnapshot ...", "rm -rf ..."
> or something else?
>
>
> On 01/03/2021 11:27, Marco Gasparini wrote:
>
> Thanks, Bowen, for answering.
>
> Actually, I checked the server log and the only warning was that a node
> went offline.
> No, I have no backups or snapshots.
>
> In the meantime I found that Cassandra has probably moved all the files from
> a directory to the snapshot directory. I am fairly sure of that because I
> recently deleted all the snapshots I had made, since the disk was running out
> of space, and I found this very directory full of files whose
> modification timestamp matches the first error I got in the log.
>
>
>
> On Mon, 1 Mar 2021 at 12:13, Bowen Song <bo...@bso.ng.invalid>
> wrote:
>
>> The first thing I'd check is the server log. The log may contain vital
>> information about the cause, and there may be different ways to
>> recover depending on that cause.
>>
>> Also, please allow me to ask a seemingly obvious question: do you have a
>> backup?
>>
>>
>> On 01/03/2021 09:34, Marco Gasparini wrote:
>>
>> hello everybody,
>>
>> This morning, Monday!!!, I was checking on the Cassandra cluster and I
>> noticed that all the data was missing. I found the following error on each
>> node (9 nodes in the cluster):
>>
>> 2021-03-01 09:05:52,984 WARN  [MessagingService-Incoming-/x.x.x.x] IncomingTcpConnection.java:103 run UnknownColumnFamilyException reading from socket; closing
>> org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find table for cfId cba90a70-5c46-11e9-9e36-f54fe3235e69. If a table was just created, this is likely due to the schema not being fully propagated.  Please wait for schema agreement on table creation.
>>         at org.apache.cassandra.config.CFMetaData$Serializer.deserialize(CFMetaData.java:1533)
>>         at org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:758)
>>         at org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:697)
>>         at org.apache.cassandra.io.ForwardingVersionedSerializer.deserialize(ForwardingVersionedSerializer.java:50)
>>         at org.apache.cassandra.net.MessageIn.read(MessageIn.java:123)
>>         at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:195)
>>         at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:183)
>>         at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:94)
>>
>> I tried to query the keyspace and got this:
>>
>> node1# cqlsh
>> Connected to Cassandra Cluster at x.x.x.x:9042.
>> [cqlsh 5.0.1 | Cassandra 3.11.5.1 | CQL spec 3.4.4 | Native protocol v4]
>> Use HELP for help.
>> cqlsh> select * from mykeyspace.mytable  where id = 123935;
>> InvalidRequest: Error from server: code=2200 [Invalid query]
>> message="Keyspace mykeyspace does not exist"
>>
>> Investigating on each node, I found that all the SSTables still exist, so I
>> think the data is still there but the keyspace vanished, "magically".
>>
>> Other facts I can tell you are:
>>
>>    - I have been getting anticompaction errors from 2 nodes because
>>    the disk was almost full.
>>    - The cluster was online on Friday.
>>    - This morning, Monday, the whole cluster was offline and I noticed
>>    the "missing keyspace" problem.
>>    - During the weekend the cluster was subject to inserts and
>>    deletes.
>>    - I have a 9-node (HDD) Cassandra 3.11 cluster.
>>
>> I really need help on this, how can I restore the cluster?
>>
>> Thank you very much
>> Marco
>>
