Hello everyone, 4 node Cassandra 0.8.5 cluster with RF =2. One node started throwing exceptions in its log:
ERROR 10:02:46,837 Fatal exception in thread Thread[FlushWriter:1317,5,main] java.lang.RuntimeException: java.lang.RuntimeException: Insufficient disk space to flush 17296 bytes at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Caused by: java.lang.RuntimeException: Insufficient disk space to flush 17296 bytes at org.apache.cassandra.db.ColumnFamilyStore.getFlushPath(ColumnFamilyStore.java:714) at org.apache.cassandra.db.ColumnFamilyStore.createFlushWriter(ColumnFamilyStore.java:2301) at org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:246) at org.apache.cassandra.db.Memtable.access$400(Memtable.java:49) at org.apache.cassandra.db.Memtable$3.runMayThrow(Memtable.java:270) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30) ... 3 more Checked disk and obviously it's 100% full. How do I recover from this without loosing the data? I've got plenty of space on the other nodes, so I thought of doing a decommission which I understand reassigns ranges to the other nodes and replicates data to them. After that's done I plan on manually deleting the data on the node and then joining in the same cluster position with auto-bootstrap turned off so that I won't get back the old data and I can continue getting new data with the node. Note, I would like to have 4 nodes in because the other three barely take the input load alone. These are just long running tests until I get some better machines. On strange thing I found is that the data folder on the ndoe that filled up the disk is 150 GB (as measured with du) while the data folder on all other 3 nodes is 50 GB. At the same time, DataStax OpsCenter shows a size of around 50GB for all 4 nodes. I though that the node was making a major compaction at which time it filled up the disk....but even that doesn't make sense because shouldn't a major compaction just be capable of doubling the size, not triple-ing it? Doesn anyone know how to explain this behavior? Thanks, Alex