Multiget performance

Allan C Tue, 08 Apr 2014 23:22:56 -0700

Hi all,

I’ve always been told that multigets are a Cassandra anti-pattern for 
performance reasons. I ran a quick test tonight to prove it to myself, and, 
sure enough, slowness ensued. It takes about 150ms to get 100 keys for my use 
case. Not terrible, but at least an order of magnitude slower than I need it 
to be.


So far, I’ve been able to denormalize without any problems. Today, I ran 
into a use case where denormalization introduces a huge amount of complexity 
into the code.

It’s very tempting to cache a subset in Redis and call it a day, and I probably 
will. But that’s not a very satisfying answer. It’s only about 5GB of data, and 
it feels like I should be able to tune a Cassandra CF to get within 2x of Redis.
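
For reference, here is a minimal sketch of the read-through caching option mentioned above, written in Python. The dict standing in for Redis and the `fetch_many` callback standing in for a Cassandra multiget are hypothetical placeholders, not real client APIs. The point is only that cache hits are served locally and all misses go to the backing store in a single batch.

```python
from typing import Callable, Dict, Iterable, List, Optional


def cached_multiget(
    keys: Iterable[str],
    cache: Dict[str, bytes],  # hypothetical stand-in for Redis
    fetch_many: Callable[[List[str]], Dict[str, bytes]],  # hypothetical stand-in for a Cassandra multiget
) -> Dict[str, Optional[bytes]]:
    """Read-through multiget: serve hits from the cache, fetch all misses in one batch."""
    unique_keys = list(dict.fromkeys(keys))  # drop duplicate keys, keep request order
    result: Dict[str, Optional[bytes]] = {}
    misses: List[str] = []
    for k in unique_keys:
        if k in cache:
            result[k] = cache[k]
        else:
            misses.append(k)
    if misses:
        fetched = fetch_many(misses)  # one round trip for every miss
        for k in misses:
            v = fetched.get(k)
            if v is not None:
                cache[k] = v  # populate the cache so the next read is a hit
            result[k] = v  # None means the key exists in neither tier
    return result
```

With a warm cache, a 100-key request only reaches Cassandra for the keys that miss, which is what makes caching a ~5GB subset attractive.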

The workload is around 70% reads. Most of the writes are updates to existing 
data. Currently, it’s in an LCS CF with ~30M rows. The cluster is 300GB total 
with 3-way replication, running across 12 fairly large boxes, each with 16GB of 
RAM, all on SSDs and striped across 3 AZs in AWS (hi1.4xlarges, fwiw).


Has anyone gotten good results with this kind of workload? Or is Cassandra 
simply not suited to it, and I should just use an in-memory store?

-Allan
