Latency overhead on Cassandra cluster deployed on multiple AZs (AWS)

Alessandro Pieri Mon, 11 Apr 2016 05:57:59 -0700

Hi everyone,

Last week I ran some tests to estimate the latency overhead introduced in a
Cassandra cluster by a multi-availability-zone setup on AWS EC2.


I started a Cassandra cluster of 6 nodes deployed on 3 different AZs (2
nodes/AZ).

Then I used cassandra-stress to run an INSERT (write) test of 20M
entries with a replication factor of 3. Right after that, I ran
cassandra-stress again to READ 10M entries.

Well, I got the following unexpected result:

Single-AZ, CL=ONE -> median/95th percentile/99th percentile:
1.06ms/7.41ms/55.81ms
Multi-AZ, CL=ONE -> median/95th percentile/99th percentile:
1.16ms/38.14ms/47.75ms

Basically, after switching to the multi-AZ setup, the 95th-percentile latency
increased by ~30ms. That's too much considering that the average network
latency between AZs on AWS is ~1ms.
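To make the comparison explicit, here is a minimal sketch of the arithmetic behind the ~30ms figure. It only subtracts the numbers reported above; the names (single_az, multi_az, overhead) are made up for illustration and are not part of cassandra-stress.

```python
# Latencies in milliseconds, copied from the results reported above.
single_az = {"median": 1.06, "p95": 7.41, "p99": 55.81}
multi_az = {"median": 1.16, "p95": 38.14, "p99": 47.75}


def overhead(baseline, candidate):
    """Return candidate minus baseline per percentile, rounded to 2 decimals."""
    return {key: round(candidate[key] - baseline[key], 2) for key in baseline}


# Positive values mean the multi-AZ run was slower at that percentile.
deltas = overhead(single_az, multi_az)
for name, delta in deltas.items():
    print(f"{name}: {delta:+.2f} ms")
```

Seen this way, the gap is concentrated at the 95th percentile: the median differs by about 0.1ms, and the 99th percentile is actually lower in the multi-AZ run.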

Since I couldn't find anything to explain those results, I decided to run
cassandra-stress specifying only a single node (i.e. "--nodes
node1" instead of "--nodes node1,node2,node3,node4,node5,node6") and,
surprisingly, the 95th-percentile latency went back down to 5.9ms.

Trying to recap:

Multi-AZ, CL=ONE, "--nodes node1,node2,node3,node4,node5,node6" -> 95th
percentile: 38.14ms
Multi-AZ, CL=ONE, "--nodes node1" -> 95th percentile: 5.9ms

For the sake of completeness, I ran a further test using a consistency
level of LOCAL_QUORUM, and it did not show any large difference between
using a single node and using multiple nodes.

Do you guys know what could be the reason?

The tests were executed on m3.xlarge (network optimized) using the
DataStax AMI 2.6.3 running Cassandra v2.0.15.

Thank you in advance for your help.

Cheers,
Alessandro
