WOW.. so based on your advice, and a test, I disabled the row cache for the table.
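(For anyone else who lands on this thread: row caching is a per-table setting, so it can be switched off for just the offending table. A sketch, assuming a hypothetical table `blogs.posts` — the syntax differs between Cassandra 2.0 and 2.1:)

```cql
-- Cassandra 2.1+ syntax: keep the key cache, disable row caching.
ALTER TABLE blogs.posts
  WITH caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'};

-- Cassandra 2.0.x syntax for the same thing:
-- ALTER TABLE blogs.posts WITH caching = 'keys_only';
```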
The query was instantly 20x faster, so this is definitely an anti-pattern. I suspect Cassandra just tries to read the entire physical row into memory, and since my physical row is rather big.. ha. Well that wasn't very fun :) BIG win though ;)

On Tue, Jul 1, 2014 at 6:52 PM, Kevin Burton <bur...@spinn3r.com> wrote:

> A work around for this is the VFS page cache.. basically, disabling
> compression, and then allowing the VFS page cache to keep your data in
> memory.
>
> The only downside is the per-column overhead. But if you can store
> everything in a 'blob' which is optionally compressed, you're generally
> going to be ok.
>
> Kevin
>
> On Tue, Jul 1, 2014 at 6:50 PM, Kevin Burton <bur...@spinn3r.com> wrote:
>
>> so.. caching the *queries*?
>>
>> It seems like a better mechanism would be to cache the actual logical
>> row, not the physical row.
>>
>> Query caches just don't work in production. If you re-word your query,
>> or structure it a different way, you get a miss…
>>
>> In my experience, query caches have a 0% hit rate.
>>
>> On Tue, Jul 1, 2014 at 6:40 PM, Robert Coli <rc...@eventbrite.com> wrote:
>>
>>> On Tue, Jul 1, 2014 at 6:06 PM, Kevin Burton <bur...@spinn3r.com> wrote:
>>>
>>>> You know.. one thing I failed to mention is that this is going into
>>>> a "bucket", and while it's a logical row, the physical row is like 500MB,
>>>> according to compaction logs.
>>>>
>>>> Is the ENTIRE physical row going into the cache as one unit? That's
>>>> definitely going to be a problem in this model. 500MB is a big atomic
>>>> unit.
>>>
>>> Yes, the row cache is a row cache. It caches what the storage engine
>>> calls rows, which CQL calls "partitions." [1] Rows have to be assembled
>>> from all of their row fragments in Memtables/SSTables.
>>>
>>> This is a big part of why the "off-heap" row cache's behavior of
>>> invalidation on write is so bad for its overall performance.
>>> Updating a
>>> single column in your 500MB row invalidates it and forces you to assemble
>>> the entire 500MB row from disk.
>>>
>>> The only valid use case for the current off-heap row cache seems to be:
>>> very small, very uniform in size, very hot, and very rarely modified.
>>>
>>> https://issues.apache.org/jira/browse/CASSANDRA-5357
>>>
>>> is the ticket for replacing the row cache and its unexpected
>>> characteristics with something more like an actual query cache.
>>>
>>>> also.. I assume it's having to do a binary search within the physical
>>>> row?
>>>
>>> Since the column-level bloom filter's removal in 1.2, the only way it
>>> can get to specific columns is via the index.
>>>
>>> =Rob
>>> [1] https://issues.apache.org/jira/browse/CASSANDRA-6632

--

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>
<http://spinn3r.com>