Hello everyone,
I have a 4-node Cassandra 0.8.5 cluster with RF = 2. One node started throwing exceptions in its log:

ERROR 10:02:46,837 Fatal exception in thread Thread[FlushWriter:1317,5,main]
java.lang.RuntimeException: java.lang.RuntimeException: Insufficient disk space to flush 17296 bytes
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.RuntimeException: Insufficient disk space to flush 17296 bytes
        at org.apache.cassandra.db.ColumnFamilyStore.getFlushPath(ColumnFamilyStore.java:714)
        at org.apache.cassandra.db.ColumnFamilyStore.createFlushWriter(ColumnFamilyStore.java:2301)
        at org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:246)
        at org.apache.cassandra.db.Memtable.access$400(Memtable.java:49)
        at org.apache.cassandra.db.Memtable$3.runMayThrow(Memtable.java:270)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
        ... 3 more

I checked the disk and, sure enough, it's 100% full.

How do I recover from this without losing the data? I've got plenty of space on the other nodes, so I thought of running a decommission, which, as I understand it, reassigns the node's ranges to the other nodes and replicates its data to them. Once that's done, I plan to manually delete the data on the node and then rejoin it at the same position in the ring with auto-bootstrap turned off, so that it doesn't stream the old data back and can keep taking new writes.
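
Roughly the steps I have in mind, in case I'm missing something (host names and data paths below are placeholders for my actual setup, and I'd note down the node's token before touching anything):

    # on the full node: hand its ranges over to the rest of the cluster
    nodetool -h node4 decommission

    # from any other node: confirm it has left the ring
    nodetool -h node1 ring

    # on the decommissioned node: wipe the old data
    rm -rf /var/lib/cassandra/data/* /var/lib/cassandra/commitlog/* /var/lib/cassandra/saved_caches/*

    # in cassandra.yaml on that node, before restarting it:
    #   auto_bootstrap: false
    #   initial_token: <the same token it had before>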

Note that I'd like to keep all 4 nodes in, because the other three can barely handle the input load on their own. These are just long-running tests until I get some better machines.

One strange thing I found is that the data folder on the node that filled up its disk is 150 GB (as measured with du), while the data folder on each of the other 3 nodes is 50 GB. At the same time, DataStax OpsCenter shows a size of around 50 GB for all 4 nodes. I thought the node might have been in the middle of a major compaction when it filled up the disk... but even that doesn't make sense, because shouldn't a major compaction at worst double the size, not triple it? Does anyone know how to explain this behavior?

Thanks,
Alex
