Ok, I figured something like that.  Switching to
ConcurrentLinkedHashCacheProvider I see it is a lot better, but still
instead of the 25-30ms response times I enjoyed with no caching, I'm
seeing 500ms at 100% hit rate on the cache.  No old gen pressure at all,
just ParNew crazy.

More info on my use case: I am picking 50 columns from the 70k. Since the
whole row is in the cache, and there is no copying from off-heap or disk
buffers, it seems like it should be faster than non-cached mode.

More thoughts :)
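To make the cost concrete, here is a minimal, hypothetical Java sketch of the difference between the two cache styles (class and method names are made up for illustration; this is not Cassandra's actual cache code). A serializing (off-heap-style) cache has to deserialize the whole row on every read, while an on-heap cache just hands back the row object:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class WideRowCacheSketch {

    // Serialize a row of column values into one buffer, as an
    // off-heap serializing cache conceptually would on insert.
    static ByteBuffer serializeRow(List<String> columns) {
        int size = 0;
        for (String c : columns)
            size += 4 + c.getBytes(StandardCharsets.UTF_8).length;
        ByteBuffer buf = ByteBuffer.allocate(size);
        for (String c : columns) {
            byte[] b = c.getBytes(StandardCharsets.UTF_8);
            buf.putInt(b.length).put(b);
        }
        buf.flip();
        return buf;
    }

    // A read against the serializing cache: the WHOLE row is
    // deserialized (70k+ allocations here) even though only a
    // handful of columns are wanted.
    static List<String> readFromSerializingCache(ByteBuffer cached, int wanted) {
        ByteBuffer buf = cached.duplicate();
        List<String> row = new ArrayList<>();
        while (buf.hasRemaining()) {
            byte[] b = new byte[buf.getInt()];
            buf.get(b);
            row.add(new String(b, StandardCharsets.UTF_8)); // short-lived garbage
        }
        return row.subList(0, wanted);
    }

    // A read against an on-heap cache (ConcurrentLinkedHashCacheProvider
    // style): the row already lives on the heap, so no per-read
    // deserialization and no new-gen churn.
    static List<String> readFromOnHeapCache(List<String> cachedRow, int wanted) {
        return cachedRow.subList(0, wanted);
    }

    public static void main(String[] args) {
        List<String> row = new ArrayList<>();
        for (int i = 0; i < 70_000; i++) row.add("value-" + i);

        ByteBuffer offHeapStyle = serializeRow(row);

        // Both reads return the same 50 columns, but the first one
        // allocated 70k+ strings on the heap to do it.
        List<String> a = readFromSerializingCache(offHeapStyle, 50);
        List<String> b = readFromOnHeapCache(row, 50);
        System.out.println(a.equals(b)); // prints "true"
        System.out.println(a.size());    // prints "50"
    }
}
```

Every read in the first path churns through 70k+ short-lived allocations, which would explain ParNew doing all the work even at a 100% hit rate.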

On 11/18/11 6:39 AM, "Sylvain Lebresne" <sylv...@datastax.com> wrote:

>On Fri, Nov 18, 2011 at 1:53 AM, Todd Burruss <bburr...@expedia.com>
>wrote:
>> I'm using cassandra 1.0.  Been doing some testing on using cass's cache.
>> When I turn it on (using the CLI) I see ParNew jump from 3-4ms to
>> 200-300ms.  This really screws with response times, which jump from
>> ~25-30ms to 1300+ms.  I've increased new gen and that helps, but still
>> this is surprising to me, especially since 1.0 defaults to the
>> SerializingCacheProvider (off heap).
>> The interesting tidbit is that I have wide rows: 70k+ columns per row,
>> ~50 bytes per column value.  The cache only needs to hold about 400 rows
>> to catch all the data per node, and JMX is reporting 100% cache hits.
>> Nodetool ring reports < 2gb per node, my heap is 6gb and total RAM is
>> 16gb.
>> Thoughts?
>
>Your problem is the mix of wide rows and the serializing cache.
>With the serializing cache, your data is stored off-heap. But that
>means that for each read of a row, we deserialize the row from
>off-heap memory into the heap to return it. The thing is, when we
>do that, we deserialize the full row each time. In other words, for
>each query we deserialize 70k+ columns even if we return only one.
>I'm willing to bet this is what is killing your response time. If you
>want to cache wide rows, I really suggest you use the
>ConcurrentLinkedHashCacheProvider instead.
>
>I'll also note that this explains the ParNew times too. Deserializing
>all those columns from off-heap creates lots of short-lived objects,
>and since you deserialize 70k+ of them on each query, that's quite
>some pressure on the new gen. Note that the serializing cache
>actually minimizes the use of the old gen, because the old gen is
>the one that can create huge GC pauses with big heaps, but it does
>put more pressure on the new gen. This is by design, because new
>gen is much less of a problem than old gen.
>
>--
>Sylvain
