What is my "live set"? Is the system CPU-bound given the few statements below? This is from running 4 concurrent processes against the node. Do I need to throttle back the concurrent readers/writers?
I do all reads/writes at QUORUM (replication factor of 3). The memtable threshold is the default of 256. All caching is turned off.

The database is pretty small, maybe a few million keys (2-3 million) in 4 CFs. The key size is pretty small. Some of the rows are pretty fat, though (fatter than I thought). I am saving secondary indexes in separate CFs, and those are the large rows that I think might be part of the problem. I will restart testing with these turned off and see if I see any difference.

Would an extra fat row explain repeated OOM crashes in a row?

I have finally got the system to stabilize, relatively speaking, and I even ran compaction on the bad node without a problem (still no row size stats).

I now have several other nodes flapping with the following single error in the cassandra.log:

Error: Exception thrown by the agent : java.lang.NullPointerException

I assume this is an unrelated problem?

Thanks for all of your help!

On Thu, Aug 19, 2010 at 10:26 PM, Peter Schuller <peter.schul...@infidyne.com> wrote:

> So, these:
>
> > INFO [GC inspection] 2010-08-19 16:34:46,656 GCInspector.java (line 116) GC
> > for ConcurrentMarkSweep: 41615 ms, 192522712 reclaimed leaving 8326856720
> > used; max is 8700035072
> [snip]
> > INFO [GC inspection] 2010-08-19 16:36:00,786 GCInspector.java (line 116)
> > GC for ConcurrentMarkSweep: 37122 ms, 157488
> > reclaimed leaving 8342836376 used; max is 8700035072
>
> ...show that your live set is indeed very close to heap maximum, and
> so concurrent mark/sweep phases run often freeing very little memory.
> In addition the fact that it seems to take 35-45 seconds to do the
> concurrent mark/sweep on an 8 gig heap on a modern system suggests
> that you are probably CPU bound in cassandra at the time (meaning GC
> is slower).
>
> In short you're using too much memory in comparison to the maximum
> heap size.
> The expected result is to either get an OOM, or just become
> too slow due to excessive GC activity (usually the latter followed by
> the former).
>
> Now, the question is what memory is used *for*, and why. First off, to
> get that out of the way, are you inserting with consistency level
> ZERO? I am not sure whether it applies to 0.6.4 or not but there used
> to be an issue involving writes at consistency level ZERO not being
> throttled at all, meaning that if you threw writes at the system
> faster than it would handle them, you would accumulate memory use. I
> don't believe this is a problem with CL.ONE and above, even in 0.6.4
> (but someone correct me if I'm wrong).
>
> (As an aside: I'm not sure whether the behavior was such that it might
> explain OOM on restart as a result of accumulated commitlogs that get
> replayed faster than memtable flushing happens. Perhaps not, not
> sure.)
>
> In any case, the most important factors are what you're actually doing
> with the cluster, but you don't say much about the data. In particular
> how many rows and columns you're populating it with.
>
> The primary users of large amounts of memory in cassandra include
> (hopefully I'm not missing something major):
>
> * bloom filters that are used to efficiently avoid doing I/O on
> sstables that do not contain relevant data. the size of bloom filters
> scales linearly with the number of row keys (not columns right? I don't
> remember). so here we have an expected permanent, but low, memory use
> as a result of a large database. how large is your database? 100
> million keys? 1 billion? 10 billion?
>
> * the memtables; the currently active memtable and any memtables
> currently undergoing flushing. the size of these is directly
> controllable in the configuration file. make sure they are reasonable.
> (If you're not sure at all, with an 8 gig heap I'd say <= 512 mb is a
> reasonable recommendation unless you have a reason to make them
> larger)
>
> * row cache and key cache, both controllable in the configuration. in
> particular the row cache can be huge if you have configured it as
> such.
>
> * to some extent unflushed commitlogs; the commit log rotation
> threshold controls this. the default value is low enough that it
> should not be your culprit
>
> So the question is what your usage is like. How many unique rows do
> you have? How many columns? The data size in and of itself should not
> matter much to memory use, except of course that extremely large
> individual values will be relevant to transient high memory use when
> they are read/written.
>
> In general, lacking large row caches and such things, you should be
> able to have hundreds of millions of entries on an 8 gb heap, assuming
> reasonably sized keys.
>
> --
> / Peter Schuller
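
For anyone wanting to put a number on "very close to heap maximum", the GCInspector lines quoted in this thread can be turned into a heap-occupancy figure. The following is only a minimal Python sketch: the regular expression is inferred from the two log lines shown here and may not match other Cassandra versions, and the names `GC_LINE`, `heap_occupancy` and `SAMPLE_LINES` are made up for illustration.

```python
import re

# Pattern inferred from the two GCInspector lines in this thread (assumption:
# other Cassandra versions may word the message differently).
GC_LINE = re.compile(
    r"GC for (?P<collector>\w+): (?P<pause_ms>\d+) ms, "
    r"(?P<reclaimed>\d+) reclaimed leaving (?P<used>\d+) used; "
    r"max is (?P<max>\d+)"
)


def heap_occupancy(line):
    """Return (pause_ms, reclaimed_bytes, used/max) for one log line, or None."""
    m = GC_LINE.search(line)
    if m is None:
        return None
    used = int(m.group("used"))
    heap_max = int(m.group("max"))
    return int(m.group("pause_ms")), int(m.group("reclaimed")), used / heap_max


# The two log lines from this thread, with the mail quoting removed.
SAMPLE_LINES = [
    "INFO [GC inspection] 2010-08-19 16:34:46,656 GCInspector.java (line 116) "
    "GC for ConcurrentMarkSweep: 41615 ms, 192522712 reclaimed leaving "
    "8326856720 used; max is 8700035072",
    "INFO [GC inspection] 2010-08-19 16:36:00,786 GCInspector.java (line 116) "
    "GC for ConcurrentMarkSweep: 37122 ms, 157488 reclaimed leaving "
    "8342836376 used; max is 8700035072",
]

if __name__ == "__main__":
    for line in SAMPLE_LINES:
        pause_ms, reclaimed, ratio = heap_occupancy(line)
        print(
            f"{pause_ms / 1000:.1f} s collection, "
            f"{reclaimed / 2**20:.1f} MiB reclaimed, "
            f"heap {ratio:.1%} full afterwards"
        )
```

On the two lines above this reports the heap as roughly 95.7% and 95.9% full after each collection, which is the "live set close to heap maximum" situation described in the quoted reply.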