On Thu, Aug 19, 2010 at 4:49 PM, Wayne <wav...@gmail.com> wrote:
> What is my "live set"? Is the system CPU bound given the few statements
> below? This is from running 4 concurrent processes against the node... do I
> need to throttle back the concurrent readers/writers?
>
> I do all reads/writes as Quorum. (Replication factor of 3.)
>
> The memtable threshold is the default of 256.
>
> All caching is turned off.
>
> The database is pretty small, maybe a few million keys (2-3) in 4 CFs. The
> key size is pretty small. Some of the rows are pretty fat though (fatter
> than I thought). I am saving secondary indexes in separate CFs, and those are
> the large rows that I think might be part of the problem. I will restart
> testing with these turned off and see if I see any difference.
>
> Would an extra fat row explain repeated OOM crashes in a row? I have finally
> got the system to stabilize, relatively, and I even ran compaction on the bad
> node without a problem (still no row size stats).
>
> I now have several other nodes flapping with the following single error in
> the cassandra.log:
> Error: Exception thrown by the agent : java.lang.NullPointerException
>
> I assume this is an unrelated problem?
>
> Thanks for all of your help!
>
> On Thu, Aug 19, 2010 at 10:26 PM, Peter Schuller
> <peter.schul...@infidyne.com> wrote:
>>
>> So, these:
>>
>> > INFO [GC inspection] 2010-08-19 16:34:46,656 GCInspector.java (line 116)
>> > GC for ConcurrentMarkSweep: 41615 ms, 192522712 reclaimed leaving
>> > 8326856720 used; max is 8700035072
>> [snip]
>> > INFO [GC inspection] 2010-08-19 16:36:00,786 GCInspector.java (line 116)
>> > GC for ConcurrentMarkSweep: 37122 ms, 157488 reclaimed leaving
>> > 8342836376 used; max is 8700035072
>>
>> ...show that your live set is indeed very close to the heap maximum, and
>> so concurrent mark/sweep phases run often, freeing very little memory.
>> In addition, the fact that it seems to take 35-45 seconds to do the
>> concurrent mark/sweep on an 8 GB heap on a modern system suggests
>> that you are probably CPU bound in Cassandra at the time (meaning GC
>> is slower).
>>
>> In short, you're using too much memory in comparison to the maximum
>> heap size. The expected result is to either get an OOM, or to become
>> too slow due to excessive GC activity (usually the latter followed by
>> the former).
>>
>> Now, the question is what memory is used *for*, and why. First off, to
>> get that out of the way, are you inserting with consistency level
>> ZERO? I am not sure whether it applies to 0.6.4 or not, but there used
>> to be an issue with writes at consistency level ZERO not being
>> throttled at all, meaning that if you threw writes at the system
>> faster than it could handle them, you would accumulate memory use. I
>> don't believe this is a problem with CL.ONE and above, even in 0.6.4
>> (but someone correct me if I'm wrong).
>>
>> (As an aside: I'm not sure whether the behavior was such that it might
>> explain OOM on restart as a result of accumulated commitlogs that get
>> replayed faster than memtable flushing happens. Perhaps not; not
>> sure.)
>>
>> In any case, the most important factors are what you're actually doing
>> with the cluster, but you don't say much about the data. In particular,
>> how many rows and columns you're populating it with.
>>
>> The primary users of large amounts of memory in Cassandra include
>> (hopefully I'm not missing something major):
>>
>> * Bloom filters, which are used to efficiently avoid doing I/O on
>> sstables that do not contain relevant data. The size of bloom filters
>> scales linearly with the number of row keys (not columns, right? I don't
>> remember). So here we have an expected permanent, but low, memory use
>> as a result of a large database. How large is your database? 100
>> million keys? 1 billion? 10 billion?
>>
>> * The memtables: the currently active memtable and any memtables
>> currently undergoing flushing. The size of these is directly
>> controllable in the configuration file; make sure they are reasonable.
>> (If you're not sure at all, with an 8 GB heap I'd say <= 512 MB is a
>> reasonable recommendation unless you have a reason to make them
>> larger.)
>>
>> * The row cache and key cache, both controllable in the configuration.
>> In particular, the row cache can be huge if you have configured it as
>> such.
>>
>> * To some extent, unflushed commitlogs; the commit log rotation
>> threshold controls this. The default value is low enough that it
>> should not be your culprit.
>>
>> So the question is what your usage is like. How many unique rows do
>> you have? How many columns? The data size in and of itself should not
>> matter much to memory use, except of course that extremely large
>> individual values will be relevant to transient high memory use when
>> they are read/written.
>>
>> In general, lacking large row caches and such things, you should be
>> able to have hundreds of millions of entries on an 8 GB heap, assuming
>> reasonably sized keys.
>>
>> --
>> / Peter Schuller
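
To put the GC lines and Peter's bloom filter point in perspective, here is a
quick back-of-envelope check in Python. The 15-bits-per-key figure is only an
assumption I'm using for illustration, not something measured on your cluster:

    # Back-of-envelope check of the numbers quoted above.
    # ASSUMPTION: ~15 bits of bloom filter per row key; the real cost also
    # scales with the number of sstables, so treat this as illustrative only.

    heap_max = 8_700_035_072      # "max is 8700035072" from the GC log
    heap_used = 8_326_856_720     # "leaving 8326856720 used"

    headroom = heap_max - heap_used
    print(f"heap used: {heap_used / heap_max:.1%}")           # ~95.7% of max
    print(f"headroom:  {headroom / 1024**2:.0f} MB")          # ~356 MB

    keys = 3_000_000              # "a few million keys (2-3)"
    cfs = 4                       # "in 4 CFs"
    bits_per_key = 15             # assumed, not measured
    bloom_bytes = keys * cfs * bits_per_key / 8
    print(f"bloom filters: ~{bloom_bytes / 1024**2:.0f} MB")  # ~21 MB

Even being generous, bloom filters for 2-3 million keys come out at tens of
MB, nowhere near 8 GB, so the memory has to be going to something else
(memtables, caches, or large rows held in memory while they are read or
compacted).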
"live set" is active data. For example I may have 900GB of data on disk, but at given time X 10GB are in being read/written or replicated. My "live set" would be the 10 GB Would an extra fat row explain repeated OOM crashes in a row? Highly likely. I had a row that was 112 MB and 4,000,000+ columns. It caused havoc for me. Read what Peter described above. It may not be the problem but that is a place to start looking.