On Thu, Aug 19, 2010 at 4:49 PM, Wayne <wav...@gmail.com> wrote:
> What is my "live set"? Is the system CPU bound given the few statements
> below? This is from running 4 concurrent processes against the node... do I
> need to throttle back the concurrent readers/writers?
>
> I do all reads/writes as Quorum. (Replication factor of 3.)
>
> The memtable threshold is the default of 256.
>
> All caching is turned off.
>
> The database is pretty small, maybe a few million keys (2-3) in 4 CFs. The
> key size is pretty small. Some of the rows are pretty fat though (fatter
> than I thought). I am saving secondary indexes in separate CFs, and those are
> the large rows that I think might be part of the problem. I will restart
> testing with these turned off and see if I see any difference.
>
> Would an extra fat row explain repeated OOM crashes in a row? I have finally
> got the system to stabilize, relatively, and I even ran compaction on the bad
> node without a problem (still no row size stats).
>
> I now have several other nodes flapping with the following single error in
> the cassandra.log:
> Error: Exception thrown by the agent : java.lang.NullPointerException
>
> I assume this is an unrelated problem?
>
> Thanks for all of your help!
>
> On Thu, Aug 19, 2010 at 10:26 PM, Peter Schuller
> <peter.schul...@infidyne.com> wrote:
>>
>> So, these:
>>
>> > INFO [GC inspection] 2010-08-19 16:34:46,656 GCInspector.java (line 116)
>> > GC for ConcurrentMarkSweep: 41615 ms, 192522712 reclaimed leaving
>> > 8326856720 used; max is 8700035072
>> [snip]
>> > INFO [GC inspection] 2010-08-19 16:36:00,786 GCInspector.java (line 116)
>> > GC for ConcurrentMarkSweep: 37122 ms, 157488 reclaimed leaving
>> > 8342836376 used; max is 8700035072
>>
>> ...show that your live set is indeed very close to the heap maximum, and
>> so concurrent mark/sweep phases run often, freeing very little memory.
>> In addition, the fact that it seems to take 35-45 seconds to do the
>> concurrent mark/sweep on an 8 GB heap on a modern system suggests
>> that you are probably CPU bound in Cassandra at the time (meaning GC
>> is slower).
>>
>> In short, you're using too much memory in comparison to the maximum
>> heap size. The expected result is to either get an OOM, or to become
>> too slow due to excessive GC activity (usually the latter followed by
>> the former).
>>
>> Now, the question is what memory is used *for*, and why. First off, to
>> get that out of the way, are you inserting with consistency level
>> ZERO? I am not sure whether it applies to 0.6.4 or not, but there used
>> to be an issue with writes at consistency level ZERO not being
>> throttled at all, meaning that if you threw writes at the system
>> faster than it could handle them, you would accumulate memory use. I
>> don't believe this is a problem with CL.ONE and above, even in 0.6.4
>> (but someone correct me if I'm wrong).
>>
>> (As an aside: I'm not sure whether the behavior was such that it might
>> explain OOM on restart as a result of accumulated commitlogs that get
>> replayed faster than memtable flushing happens. Perhaps not; not
>> sure.)
>>
>> In any case, the most important factors are what you're actually doing
>> with the cluster, but you don't say much about the data. In particular,
>> how many rows and columns you're populating it with.
>>
>> The primary users of large amounts of memory in Cassandra include
>> (hopefully I'm not missing something major):
>>
>> * Bloom filters, which are used to efficiently avoid doing I/O on
>> sstables that do not contain relevant data. The size of bloom filters
>> scales linearly with the number of row keys (not columns, right? I don't
>> remember). So here we have an expected permanent, but low, memory use
>> as a result of a large database. How large is your database? 100
>> million keys? 1 billion? 10 billion?
>>
>> * The memtables: the currently active memtable and any memtables
>> currently undergoing flushing. The size of these is directly
>> controllable in the configuration file; make sure they are reasonable.
>> (If you're not sure at all, with an 8 GB heap I'd say <= 512 MB is a
>> reasonable recommendation unless you have a reason to make them
>> larger.)
>>
>> * The row cache and key cache, both controllable in the configuration.
>> In particular, the row cache can be huge if you have configured it as
>> such.
>>
>> * To some extent, unflushed commitlogs; the commit log rotation
>> threshold controls this. The default value is low enough that it
>> should not be your culprit.
>>
>> So the question is what your usage is like. How many unique rows do
>> you have? How many columns? The data size in and of itself should not
>> matter much to memory use, except of course that extremely large
>> individual values will be relevant to transient high memory use when
>> they are read/written.
>>
>> In general, lacking large row caches and such things, you should be
>> able to have hundreds of millions of entries on an 8 GB heap, assuming
>> reasonably sized keys.
>>
>> --
>> / Peter Schuller
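
To put the GC lines and Peter's bloom filter point in perspective, here is a
quick back-of-envelope check in Python. The 15-bits-per-key figure is only an
assumption I'm using for illustration, not something measured on your cluster:

    # Back-of-envelope check of the numbers quoted above.
    # ASSUMPTION: ~15 bits of bloom filter per row key; the real cost also
    # scales with the number of sstables, so treat this as illustrative only.

    heap_max = 8_700_035_072      # "max is 8700035072" from the GC log
    heap_used = 8_326_856_720     # "leaving 8326856720 used"

    headroom = heap_max - heap_used
    print(f"heap used: {heap_used / heap_max:.1%}")           # ~95.7% of max
    print(f"headroom:  {headroom / 1024**2:.0f} MB")          # ~356 MB

    keys = 3_000_000              # "a few million keys (2-3)"
    cfs = 4                       # "in 4 CFs"
    bits_per_key = 15             # assumed, not measured
    bloom_bytes = keys * cfs * bits_per_key / 8
    print(f"bloom filters: ~{bloom_bytes / 1024**2:.0f} MB")  # ~21 MB

Even being generous, bloom filters for 2-3 million keys come out at tens of
MB, nowhere near 8 GB, so the memory has to be going to something else
(memtables, caches, or large rows held in memory while they are read or
compacted).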
"live set" is active data. For example I may have 900GB of data on disk, but at given time X 10GB are in being read/written or replicated. My "live set" would be the 10 GB Would an extra fat row explain repeated OOM crashes in a row? Highly likely. I had a row that was 112 MB and 4,000,000+ columns. It caused havoc for me. Read what Peter described above. It may not be the problem but that is a place to start looking.