As far as I understand, multiget performance is bound by the slowest
node responding to the coordinator.

If you are fetching 100 partitions spread across *n* nodes, the coordinator will
issue requests to those nodes and wait for all of the responses to come back
before returning the results to the client.

Consequently, if one node among the *n* is under heavy load and takes longer to
respond, it will greatly impact the response time of your multiget.

Now, with the recent introduction of rapid read protection, this behavior
might be mitigated.
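
For reference, rapid read protection in 2.0+ is controlled by the per-table
speculative_retry setting, which you can tune from the same session; the table
name here is a placeholder and '99percentile' is just one example value:

    # Rapid read protection (2.0+) is a per-table setting.
    session.execute("ALTER TABLE event WITH speculative_retry = '99percentile'")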

 Regards

 Duy Hai DOAN


On Thu, Apr 10, 2014 at 12:52 AM, Tyler Hobbs <ty...@datastax.com> wrote:

> Can you trace the query and paste the results?
>
>
> On Wed, Apr 9, 2014 at 11:17 AM, Allan C <alla...@gmail.com> wrote:
>
>> As one CQL statement:
>>
>>  SELECT * from Event WHERE key IN ([100 keys]);
>>
>> -Allan
>>
>> On April 9, 2014 at 12:52:13 AM, Daniel Chia (danc...@coursera.org)
>> wrote:
>>
>> Are you making the 100 calls in serial, or in parallel?
>>
>> Thanks,
>> Daniel
>>
>>
>> On Tue, Apr 8, 2014 at 11:22 PM, Allan C <alla...@gmail.com> wrote:
>>
>>>  Hi all,
>>>
>>>  I've always been told that multigets are a Cassandra anti-pattern for
>>> performance reasons. I ran a quick test tonight to prove it to myself, and,
>>> sure enough, slowness ensued. It takes about 150ms to get 100 keys for my
>>> use case. Not terrible, but at least an order of magnitude off from what I need
>>> it to be.
>>>
>>>  So far, I've been able to denormalize and not have any problems. Today,
>>> I ran into a use case where denormalization introduces a huge amount of
>>> complexity to the code.
>>>
>>>  It's very tempting to cache a subset in Redis and call it a day --
>>> probably will. But, that's not a very satisfying answer. It's only about
>>> 5GB of data and it feels like I should be able to tune a Cassandra CF to be
>>> within 2x.
>>>
>>>  The workload is around 70% reads. Most of the writes are updates to
>>> existing data. Currently, it's in an LCS CF with ~30M rows. The cluster is
>>> 300GB total with 3-way replication, running across 12 fairly large boxes
>>> with 16G RAM. All on SSDs. Striped across 3 AZs in AWS (hi1.4xlarges, fwiw).
>>>
>>>
>>> Has anyone had success getting good results for this kind of workload?
>>> Or, is Cassandra just not suited for it at all and I should just use an
>>> in-memory store?
>>>
>>>  -Allan
>>>
>>
>>
>
>
> --
> Tyler Hobbs
> DataStax <http://datastax.com/>
>
