Back in 2.0.4 or 2.0.5 I ran into a problem with delete-only workloads. If I did lots of deletes and no upserts, Cassandra would report that the memtable was 0 bytes because of an accounting error. The memtable would never flush and Cassandra would eventually die. Someone was kind enough to create a patch, which seemed to have fixed the problem, but last night it reared its ugly head again.
I’m now running 2.0.14. I ran a cleanup process on my cluster (10 nodes, RF=3, CL=1). The workload was pretty light, because this cleanup process is single-threaded and does everything synchronously. It was performing 4 reads per second and about 3000 deletes per second. Over the course of many hours, heap usage slowly grew on all nodes. CPU utilization also increased as GC consumed an ever-increasing amount of time. Eventually a couple of nodes shed 3.5 GB of their 7.5 GB of heap. Other nodes weren’t so fortunate and started flapping due to 30-second GC pauses.
The workaround is pretty simple. The cleanup process can periodically write a dummy record with a TTL so that Cassandra can flush its memtables and function properly. However, I think this probably ought to be fixed. Delete-only workloads can’t be that rare. I can’t be the only one who needs to go through and clean up their tables.
Robert
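
P.S. For anyone who wants to try the workaround, here is a rough sketch of what I mean, not a tested fix. It assumes the cleanup job can be expressed as a stream of (query, params) pairs and that `execute` is any callable that runs one CQL statement, for example `session.execute` from the DataStax Python driver. The function name, the `marker_every` interval and the marker statement are all made up for illustration. I am also assuming the dummy row should go into the same table that is being cleaned, since that is the memtable that needs to register a non-zero size.

```python
def run_cleanup(execute, delete_statements, marker_statement, marker_every=10000):
    """Run a stream of deletes, interleaving a TTL'd dummy write.

    execute           -- callable taking (query, params); hypothetical hook,
                         e.g. a driver session's execute method
    delete_statements -- iterable of (query, params) pairs for the deletes
    marker_statement  -- (query, params) pair for the dummy insert; the query
                         is expected to carry its own USING TTL clause
    marker_every      -- issue the dummy write after this many deletes
                         (an arbitrary illustrative default, not a tuned value)

    Returns the number of dummy writes issued.
    """
    marker_query, marker_params = marker_statement
    markers_written = 0
    for count, (query, params) in enumerate(delete_statements, start=1):
        execute(query, params)
        if count % marker_every == 0:
            # A real write, so the delete-only stream is no longer delete-only.
            execute(marker_query, marker_params)
            markers_written += 1
    return markers_written
```

The marker statement could be something like `INSERT INTO my_ks.my_table (id, payload) VALUES (%s, %s) USING TTL 60`, where the keyspace, table and columns are placeholders for whatever schema is being cleaned. The short TTL means the dummy row expires on its own and does not need a follow-up delete.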