So, these:

> INFO [GC inspection] 2010-08-19 16:34:46,656 GCInspector.java (line 116) GC
> for ConcurrentMarkSweep: 41615 ms, 192522712 reclaimed leaving 8326856720
> used; max is 8700035072

[snip]

> INFO [GC inspection] 2010-08-19 16:36:00,786 GCInspector.java (line 116) GC
> for ConcurrentMarkSweep: 37122 ms, 157488
> reclaimed leaving 8342836376 used; max is 8700035072
...show that your live set is indeed very close to the heap maximum, so the concurrent mark/sweep phases run often while freeing very little memory. In addition, the fact that it seems to take 35-45 seconds to do the concurrent mark/sweep on an 8 gig heap on a modern system suggests that you are probably CPU bound in cassandra at the time (meaning GC is slower).

In short, you're using too much memory relative to the maximum heap size. The expected result is to either get an OOM, or just become too slow due to excessive GC activity (usually the latter followed by the former).

Now, the question is what the memory is used *for*, and why.

First off, to get that out of the way: are you inserting with consistency level ZERO? I am not sure whether it applies to 0.6.4 or not, but there used to be an issue involving writes at consistency level ZERO not being throttled at all, meaning that if you threw writes at the system faster than it could handle them, you would accumulate memory use. I don't believe this is a problem with CL.ONE and above, even in 0.6.4 (but someone correct me if I'm wrong).

(As an aside: I'm not sure whether the behavior was such that it might explain OOM on restart as a result of accumulated commitlogs that get replayed faster than memtable flushing happens. Perhaps not, not sure.)

In any case, the most important factors are what you're actually doing with the cluster, but you don't say much about the data. In particular, how many rows and columns are you populating it with? The primary users of large amounts of memory in cassandra include (hopefully I'm not missing something major):

* Bloom filters, which are used to efficiently avoid doing I/O on sstables that do not contain relevant data. The size of the bloom filters scales linearly with the number of row keys (not columns, right? I don't remember), so here we have an expected permanent, but low, memory use as a result of a large database. How large is your database? 100 million keys? 1 billion? 10 billion?

* The memtables: the currently active memtable and any memtables currently undergoing flushing. The size of these is directly controllable in the configuration file; make sure they are reasonable. (If you're not sure at all, with an 8 gig heap I'd say <= 512 mb is a reasonable recommendation unless you have a reason to make them larger.)

* Row cache and key cache, both controllable in the configuration. In particular the row cache can be huge if you have configured it as such.

* To some extent unflushed commitlogs; the commit log rotation threshold controls this. The default value is low enough that it should not be your culprit.

(For a very rough illustration of how these can add up against the heap, see the sketch at the end of this mail.)

So the question is what your usage is like. How many unique rows do you have? How many columns? The data size in and of itself should not matter much to memory use, except of course that extremely large individual values will be relevant to transient high memory use when they are read/written.

In general, lacking large row caches and such things, you should be able to have hundreds of millions of entries on an 8 gb heap, assuming reasonably sized keys.

--
/ Peter Schuller
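In case it helps to make the above concrete, here is a very rough back-of-the-envelope sketch in plain Java (not Cassandra code). The key count, bits-per-key figure, memtable sizes and row cache numbers are all made-up assumptions to be replaced with your real configuration and data:

// Back-of-the-envelope heap budget for the main steady-state consumers
// discussed above. Every input below is an assumed example value, not a
// number taken from Cassandra itself.
public class HeapBudgetSketch {
    private static final long MB = 1024L * 1024;

    public static void main(String[] args) {
        long heapMax = 8L * 1024 * MB;          // -Xmx8G, as in the GC log above

        long rowKeys = 500_000_000L;            // assumed number of unique row keys
        double bloomBitsPerKey = 15.0;          // assumed bits per key for the bloom filters
        long bloomBytes = (long) (rowKeys * bloomBitsPerKey / 8);

        long memtableBytes = 2 * 512 * MB;      // active memtable + one flushing, 512 MB each

        long rowCacheEntries = 1_000_000L;      // whatever the row cache is configured to hold
        long avgRowBytes = 2 * 1024;            // assumed average in-memory row size
        long rowCacheBytes = rowCacheEntries * avgRowBytes;

        long total = bloomBytes + memtableBytes + rowCacheBytes;

        System.out.printf("bloom filters: %5d MB%n", bloomBytes / MB);
        System.out.printf("memtables:     %5d MB%n", memtableBytes / MB);
        System.out.printf("row cache:     %5d MB%n", rowCacheBytes / MB);
        System.out.printf("total:         %5d MB out of %d MB heap%n", total / MB, heapMax / MB);
    }
}

With those particular guesses you are already at roughly 3.8 gigs of steady-state use out of the 8 gig heap, before counting any transient allocation from reads and writes. That is the kind of arithmetic I would do against your actual key count and cache settings.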