I read through the recent thread "Cassandra mad GC", which seemed very similar to my situation, but didn¹t really help.
Here is what I get from my logs when I grep for GCInspector. Note that this is the middle of the night on a dev server, so there should have been almost no load. INFO [ScheduledTasks:1] 2014-01-29 02:41:16,579 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 341 ms for 1 collections, 8001582816 used; max is 8126464000 INFO [ScheduledTasks:1] 2014-01-29 02:41:29,135 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 350 ms for 1 collections, 8027555576 used; max is 8126464000 INFO [ScheduledTasks:1] 2014-01-29 02:41:41,646 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 364 ms for 1 collections, 8075851136 used; max is 8126464000 INFO [ScheduledTasks:1] 2014-01-29 02:41:54,223 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 375 ms for 1 collections, 8124762400 used; max is 8126464000 INFO [ScheduledTasks:1] 2014-01-29 02:42:24,258 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 22995 ms for 2 collections, 7385470288 used; max is 8126464000 INFO [ScheduledTasks:1] 2014-01-29 02:45:21,328 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 218 ms for 1 collections, 7582480104 used; max is 8126464000 INFO [ScheduledTasks:1] 2014-01-29 02:45:33,418 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 222 ms for 1 collections, 7584743872 used; max is 8126464000 INFO [ScheduledTasks:1] 2014-01-29 02:45:45,527 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 217 ms for 1 collections, 7588514264 used; max is 8126464000 INFO [ScheduledTasks:1] 2014-01-29 02:45:57,594 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 223 ms for 1 collections, 7590223632 used; max is 8126464000 INFO [ScheduledTasks:1] 2014-01-29 02:46:09,686 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 226 ms for 1 collections, 7592826720 used; max is 8126464000 INFO [ScheduledTasks:1] 2014-01-29 02:46:21,867 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 229 ms for 1 collections, 7595464520 used; max is 8126464000 INFO [ScheduledTasks:1] 2014-01-29 02:46:33,869 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 227 ms for 1 collections, 7597109672 used; max is 8126464000 INFO [ScheduledTasks:1] 2014-01-29 02:46:45,962 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 230 ms for 1 collections, 7599909296 used; max is 8126464000 INFO [ScheduledTasks:1] 2014-01-29 02:46:57,964 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 230 ms for 1 collections, 7601584048 used; max is 8126464000 INFO [ScheduledTasks:1] 2014-01-29 02:47:10,018 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 229 ms for 1 collections, 7604217952 used; max is 8126464000 INFO [ScheduledTasks:1] 2014-01-29 02:47:22,136 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 236 ms for 1 collections, 7605867784 used; max is 8126464000 INFO [ScheduledTasks:1] 2014-01-29 02:47:34,277 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 239 ms for 1 collections, 7607521456 used; max is 8126464000 INFO [ScheduledTasks:1] 2014-01-29 02:47:46,292 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 235 ms for 1 collections, 7610667376 used; max is 8126464000 INFO [ScheduledTasks:1] 2014-01-29 02:47:58,537 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 261 ms for 1 collections, 7650345088 used; max is 8126464000 INFO [ScheduledTasks:1] 2014-01-29 02:48:10,783 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 269 ms for 1 collections, 7653016592 used; max is 8126464000 INFO [ScheduledTasks:1] 2014-01-29 02:48:23,786 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 298 ms for 1 collections, 7716831032 used; max is 8126464000 INFO [ScheduledTasks:1] 2014-01-29 02:48:35,988 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 308 ms for 1 collections, 7745178616 used; max is 8126464000 INFO [ScheduledTasks:1] 2014-01-29 02:48:48,434 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 319 ms for 1 collections, 7796207088 used; max is 8126464000 INFO [ScheduledTasks:1] 2014-01-29 02:49:00,902 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 320 ms for 1 collections, 7821378680 used; max is 8126464000 INFO [ScheduledTasks:1] 2014-01-29 02:49:13,344 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 338 ms for 1 collections, 7859905288 used; max is 8126464000 INFO [ScheduledTasks:1] 2014-01-29 02:49:25,471 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 352 ms for 1 collections, 7911145688 used; max is 8126464000 INFO [ScheduledTasks:1] 2014-01-29 02:49:38,473 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 359 ms for 1 collections, 7938204144 used; max is 8126464000 INFO [ScheduledTasks:1] 2014-01-29 02:49:50,895 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 368 ms for 1 collections, 7988088408 used; max is 8126464000 INFO [ScheduledTasks:1] 2014-01-29 02:50:03,345 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 379 ms for 1 collections, 8014757368 used; max is 8126464000 INFO [ScheduledTasks:1] 2014-01-29 02:50:15,920 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 384 ms for 1 collections, 8044074584 used; max is 8126464000 INFO [ScheduledTasks:1] 2014-01-29 02:50:28,484 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 397 ms for 1 collections, 8094146216 used; max is 8126464000 INFO [ScheduledTasks:1] 2014-01-29 02:50:41,040 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 405 ms for 1 collections, 8118675056 used; max is 8126464000 INFO [ScheduledTasks:1] 2014-01-29 02:50:53,628 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 408 ms for 1 collections, 8121201008 used; max is 8126464000 INFO [ScheduledTasks:1] 2014-01-29 02:51:19,816 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 26041 ms for 1 collections, 7385748352 used; max is 8126464000 INFO [ScheduledTasks:1] 2014-01-29 02:59:38,271 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 20970 ms for 1 collections, 7379533664 used; max is 8126464000 INFO [ScheduledTasks:1] 2014-01-29 03:00:07,020 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 25835 ms for 1 collections, 7386103248 used; max is 8126464000 INFO [ScheduledTasks:1] 2014-01-29 03:00:28,777 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 20278 ms for 1 collections, 7380769872 used; max is 8126464000 INFO [ScheduledTasks:1] 2014-01-29 03:00:55,514 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 24909 ms for 1 collections, 7382846344 used; max is 8126464000 INFO [ScheduledTasks:1] 2014-01-29 03:01:23,741 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 26223 ms for 1 collections, 7387539176 used; max is 8126464000 There are various long GC¹s sprinkled throughout my logs. Sometimes it recovers from them. Sometime it goes into a death spiral. About a minute prior to the big meltdown at 02:51:19 there was a lot of activity on the local CF. About every 10 milliseconds the following is repeated: INFO [MutationStage:552] 2014-01-29 02:50:35,489 ColumnFamilyStore.java (line 740) Enqueuing flush of Memtable-local@1059338449(73/3672 serialized/live bytes, 2 ops) INFO [FlushWriter:45] 2014-01-29 02:50:35,489 Memtable.java (line 333) Writing Memtable-local@1059338449(73/3672 serialized/live bytes, 2 ops) INFO [FlushWriter:45] 2014-01-29 02:50:35,492 Memtable.java (line 373) Completed flushing /var/lib/cassandra/data/system/local/system-local-jb-104855-Data.db (93 bytes) for commitlog position ReplayPosition(segmentId=1390945513179, position=25932849) It seems strange to be repeatedly flushing a table at that frequency. Other than that, I don¹t see anything odd in the logs just before GC went off the rails. I have not done any tuning. All my JVM settings are default. For the record, here¹s what I get from ps: java -ea -javaagent:/usr/local/apache-cassandra-2.0.4/bin/../lib/jamm-0.2.5.jar -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms7830M -Xmx7830M -Xmn800M -XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:StringTableSize=1000003 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB -XX:+UseCondCardMark I have a half-dozen keyspaces with a half-dozen tables per keyspace, and a total of several gigabytes of data. Jut a one-node cluster. Nothing really remarkable. Any tables with records that could get really wide have been sharded. No record should be bigger than a few MB. On an unrelated note, I had to kill my server because it wouldn¹t do anything but GC. Now I cannot start it. It errors out with the following: java.lang.IllegalStateException: Unfinished compactions reference missing sstables. This should never happen since compactions are marked finished before we start removing the old sstables. at org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftover s(ColumnFamilyStore.java:495) at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:264) at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:4 61) at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:504) java.lang.IllegalStateException: Unfinished compactions reference missing sstables. This should never happen since compactions are marked finished before we start removing the old sstables. at org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftover s(ColumnFamilyStore.java:495) at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:264) at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:4 61) at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:504) Exception encountered during startup: Unfinished compactions reference missing sstables. This should never happen since compactions are marked finished before we start removing the old sstables. I have a single node in my cluster. Is there any way to recover, or is my data toast? I can restore my data, so this isn¹t a primary issue for me. The GC issue, on the other hand, is preventing me from moving to production. Robert