WOW.. so based on your advice, and a test, I disabled the row cache for the
table.

The query was instantly 20x faster.
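In case it helps anyone else hitting this, here's roughly what I ran. This is a sketch using Cassandra 2.0-era CQL syntax (in 2.1+ the option becomes a map like { 'keys' : 'ALL', 'rows_per_partition' : 'NONE' }), and the keyspace/table names are placeholders, not my actual schema:

```sql
-- Keep the key cache but disable the row cache for this table.
-- Cassandra 2.0 syntax; "my_ks"/"my_table" are placeholder names.
ALTER TABLE my_ks.my_table WITH caching = 'keys_only';
```

After that, reads stopped pulling the whole 500MB physical row into memory on every cache fill.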

so this is definitely an anti-pattern.. I suspect Cassandra just tries to
read the entire physical row into memory, and since my physical row is
rather big.. ha.  Well that wasn't very fun :)

BIG win though ;)


On Tue, Jul 1, 2014 at 6:52 PM, Kevin Burton <bur...@spinn3r.com> wrote:

> A workaround for this is the VFS page cache.. basically, disabling
> compression, and then allowing the VFS page cache to keep your data in
> memory.
>
> The only downside is the per column overhead.  But if you can store
> everything in a 'blob' which is optionally compressed, you're generally
> going to be ok.
>
> Kevin
>
>
> On Tue, Jul 1, 2014 at 6:50 PM, Kevin Burton <bur...@spinn3r.com> wrote:
>
>> so.. caching the *queries* ?
>>
>> it seems like a better mechanism would be to cache the actual logical
>> row, not the physical row.
>>
>> Query caches just don't work in production.  If you re-word your query,
>> or structure it a different way, you get a miss…
>>
>> In my experience.. query caches have a 0% hit rate.
>>
>>
>> On Tue, Jul 1, 2014 at 6:40 PM, Robert Coli <rc...@eventbrite.com> wrote:
>>
>>> On Tue, Jul 1, 2014 at 6:06 PM, Kevin Burton <bur...@spinn3r.com> wrote:
>>>
>>>> you know.. one thing I failed to mention.. is that this is going into
>>>> a "bucket" and while it's a logical row, the physical row is like 500MB …
>>>> according to compaction logs.
>>>>
>>>> is the ENTIRE physical row going into the cache as one unit?  That's
>>>> definitely going to be a problem in this model.  500MB is a big atomic 
>>>> unit.
>>>>
>>>
>>> Yes, the row cache is a row cache. It caches what the storage engine
>>> calls rows, which CQL calls "partitions." [1] Rows have to be assembled
>>> from all of their row fragments in Memtables/SSTables.
>>>
>>> This is a big part of why the "off-heap" row cache's behavior of
>>> invalidating on write is so bad for its overall performance. Updating a
>>> single column in your 500MB row invalidates the cached copy and forces
>>> the next read to re-assemble the entire 500MB row from disk.
>>>
>>> The only valid use case for the current off-heap row cache seems to be
>>> rows that are very small, very uniform in size, very hot, and very rarely
>>> modified.
>>>
>>> https://issues.apache.org/jira/browse/CASSANDRA-5357 is the ticket for
>>> replacing the row cache and its unexpected characteristics with something
>>> more like an actual query cache.
>>>
>>>> also.. I assume it's having to do a binary search within the physical
>>>> row?
>>>
>>>
>>> Since the column-level bloom filter's removal in 1.2, the only way it
>>> can get to specific columns is via the index.
>>>
>>> =Rob
>>> [1] https://issues.apache.org/jira/browse/CASSANDRA-6632
>>>
>>
>>
>>
>
>
>
>


-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>
<http://spinn3r.com>
