Latency SLAs are very much *not* Cassandra’s sweet spot; scaling throughput and 
storage is where C*’s strengths lie.  If you only care about median latency 
you’ll find things a bit more amenable to modeling, but not if you have 2 nines 
and particularly not 3 nines SLA expectations.  Basically, the harder you push 
on the nodes, the more you get sporadic but non-ignorable timing artifacts from 
garbage collection, and from IO stalls when flushing writes chokes out the disk 
reads.  Also, running in AWS, you’ll find that noisy neighbors are a routine 
issue no matter what the specifics of your use.
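
To make the median-versus-nines point concrete, here is a minimal sketch in 
Python.  Everything in it is an assumption for illustration: the latency 
distribution, the stall probability and the stall sizes are made up, not 
measured from any cluster.  It only shows how rare stalls can leave the median 
untouched while dominating the 3 nines figure.

```python
import math
import random


def percentile(samples, pct):
    """Nearest-rank percentile of a list of numbers (pct in 0..100)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def synthetic_latencies(n, stall_prob=0.005, seed=1):
    """Generate n made-up request latencies in milliseconds.

    Most requests take 1-3 ms; a small fraction (stall_prob) also pay a
    hypothetical GC or IO stall of 50-500 ms. These numbers are assumptions.
    """
    rng = random.Random(seed)
    latencies = []
    for _ in range(n):
        latency = rng.uniform(1.0, 3.0)
        if rng.random() < stall_prob:
            latency += rng.uniform(50.0, 500.0)
        latencies.append(latency)
    return latencies


if __name__ == "__main__":
    data = synthetic_latencies(100_000)
    for pct in (50, 99, 99.9):
        print(f"p{pct}: {percentile(data, pct):.1f} ms")
```

With a stall probability below 1%, the median and the 2 nines figure stay in 
the normal range in this toy model, and only the 3 nines figure sees the 
stalls.  Raise stall_prob (pushing the nodes harder, in this analogy) and the 
stalls start to show up at 2 nines as well.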

What your actual data model is, and what your patterns of reads and writes are, 
the impact of deletes and TTLs requiring tombstone cleanup, etc., all 
dramatically change the picture.

If you aren’t already aware of it, there is something called cassandra-stress 
that can help you do some experiments. The challenge though is determining if 
the experiments are representative of what your actual usage will be.  Because 
of the GC issues in anything implemented in a JVM or interpreter, it’s pretty 
easy to fall off the cliff of relevance.  TLP wrote an article about some of 
the challenges of this with cassandra-stress:

https://thelastpickle.com/blog/2017/02/08/Modeling-real-life-workloads-with-cassandra-stress.html

Note that one way to care less about variable latency is to make use of 
speculative retry, where the coordinator sends the read to another replica if 
the first one is slow to answer.  Basically you’re trading off some of your 
median throughput to help achieve a latency SLA.  The tradeoff benefit breaks 
down when you get to 3 nines.
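
A rough sketch of that tradeoff, again in Python and again with made-up 
numbers.  This is a toy simulation, not Cassandra's actual implementation: the 
5 ms threshold, the stall probability and the assumption that replicas stall 
independently are all mine.  The independence assumption is optimistic, so the 
sketch does not reproduce the breakdown at 3 nines; it only shows extra 
requests being spent to pull in the 2 nines latency.

```python
import math
import random


def percentile(samples, pct):
    """Nearest-rank percentile of a list of numbers (pct in 0..100)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def draw_latency(rng, stall_prob=0.02):
    """One made-up replica response time in ms, with an occasional stall."""
    latency = rng.uniform(1.0, 3.0)
    if rng.random() < stall_prob:
        latency += rng.uniform(50.0, 500.0)  # hypothetical GC/IO stall
    return latency


def simulate(n, threshold_ms=None, seed=7):
    """Simulate n reads; return (latencies, extra_requests_sent).

    If threshold_ms is set and the first replica has not answered by then,
    a second request goes to another replica (assumed independent) and the
    read completes when the earlier of the two responses arrives.
    """
    rng = random.Random(seed)
    latencies, extra = [], 0
    for _ in range(n):
        latency = draw_latency(rng)
        if threshold_ms is not None and latency > threshold_ms:
            extra += 1
            retry = threshold_ms + draw_latency(rng)
            latency = min(latency, retry)
        latencies.append(latency)
    return latencies, extra


if __name__ == "__main__":
    n = 100_000
    plain, _ = simulate(n)
    spec, extra = simulate(n, threshold_ms=5.0)
    print(f"p99 without retry: {percentile(plain, 99):.1f} ms")
    print(f"p99 with retry:    {percentile(spec, 99):.1f} ms")
    print(f"extra requests:    {extra / n:.2%} of reads")
```

The extra requests are the throughput you give up.  In a real cluster they land 
on nodes that may already be under pressure, which this model ignores.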

I’m actually hoping to start on some modeling of what the latency surface looks 
like under different assumptions in the new year, not because I expect the 
specific numbers to translate to anybody else but just to show how the 
underlying dynamics show up in metrics when C* nodes are under duress.

R


From: Fred Habash <fmhab...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Tuesday, December 10, 2019 at 9:57 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Predicting Read/Write Latency as a Function of Total Requests & 
Cluster Size

I'm looking for an empirical way to answer these two questions:

1. If I increase application workload (read/write requests) by some 
percentage, how is it going to affect read/write latency? Of course, with all 
other factors remaining constant, e.g. EC2 instance class, SSD specs, number of 
nodes, etc.

2. How many nodes do I have to add to maintain a given read/write latency?

Are there any methods or instruments out there that can help answer these 
que



----------------------------------------
Thank you
