Hi all,

We have upgraded a couple of clusters from 3.11.6, and we are now having issues 
when we restart the nodes.

A node will either hang or take 10-30 minutes to restart. These are the last 
messages we have in the system.log:

INFO  [NonPeriodicTasks:1] 2022-01-19 10:08:23,267  FileUtils.java:545 - 
Deleting file during startup: 
/var/lib/cassandra/data/system/table_estimates-176c39cdb93d33a5a2188eb06a56f66e/nb-184-big-Summary.db
INFO  [NonPeriodicTasks:1] 2022-01-19 10:08:23,268  LogTransaction.java:240 - 
Unfinished transaction log, deleting 
/var/lib/cassandra/data/system/table_estimates-176c39cdb93d33a5a2188eb06a56f66e/nb-185-big-Data.db
INFO  [NonPeriodicTasks:1] 2022-01-19 10:08:23,268  FileUtils.java:545 - 
Deleting file during startup: 
/var/lib/cassandra/data/system/table_estimates-176c39cdb93d33a5a2188eb06a56f66e/nb-185-big-Summary.db
INFO  [NonPeriodicTasks:1] 2022-01-19 10:08:23,269  LogTransaction.java:240 - 
Unfinished transaction log, deleting 
/var/lib/cassandra/data/system/table_estimates-176c39cdb93d33a5a2188eb06a56f66e/nb-186-big-Data.db
INFO  [NonPeriodicTasks:1] 2022-01-19 10:08:23,270  FileUtils.java:545 - 
Deleting file during startup: 
/var/lib/cassandra/data/system/table_estimates-176c39cdb93d33a5a2188eb06a56f66e/nb-186-big-Summary.db
INFO  [NonPeriodicTasks:1] 2022-01-19 10:08:23,272  LogTransaction.java:240 - 
Unfinished transaction log, deleting 
/var/lib/cassandra/data/system/table_estimates-176c39cdb93d33a5a2188eb06a56f66e/nb_txn_unknowncompactiontype_bc501d00-790f-11ec-9f80-858854746758.log
INFO  [MemtableFlushWriter:2] 2022-01-19 10:08:23,289  LogTransaction.java:240 
- Unfinished transaction log, deleting 
/var/lib/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/nb_txn_flush_bc52dc20-790f-11ec-9f80-858854746758.log

The debug log has messages from DiskBoundaryManager.java at the same time, then 
it only has the following messages:

DEBUG [ScheduledTasks:1] 2022-01-19 10:28:09,430  SSLFactory.java:354 - 
Checking whether certificates have been updated []
DEBUG [ScheduledTasks:1] 2022-01-19 10:38:09,431  SSLFactory.java:354 - 
Checking whether certificates have been updated []
DEBUG [ScheduledTasks:1] 2022-01-19 10:48:09,431  SSLFactory.java:354 - 
Checking whether certificates have been updated []
DEBUG [ScheduledTasks:1] 2022-01-19 10:58:09,431  SSLFactory.java:354 - 
Checking whether certificates have been updated []


It seems to get worse after each restart, until the node reaches a state where 
it just hangs. The only thing to do then is to re-bootstrap the node.

Once I had re-bootstrapped all the nodes in the cluster, I thought the cluster 
was stable, but I now have a case where one of the nodes is hanging again.

Does anyone have any idea what is causing the problems?


Thanks 

Paul Chandler
