Hi everyone,

Last week I ran some tests to estimate the latency overhead introduced in a
Cassandra cluster by a multi-availability-zone setup on AWS EC2.

I started a Cassandra cluster of 6 nodes deployed on 3 different AZs (2
nodes/AZ).

Then I used cassandra-stress to run an INSERT (write) test of 20M
entries with a replication factor of 3. Right after, I ran cassandra-stress
again to READ 10M entries.

Well, I got the following unexpected result:

Single-AZ, CL=ONE -> median/95th percentile/99th percentile:
1.06ms/7.41ms/55.81ms
Multi-AZ, CL=ONE -> median/95th percentile/99th percentile:
1.16ms/38.14ms/47.75ms

Basically, switching to the multi-AZ setup increased the 95th percentile
latency by ~30ms. That's too much considering that the average network
latency between AZs on AWS is ~1ms.
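
To make the comparison explicit, here is a minimal Python sketch that subtracts the single-AZ percentiles from the multi-AZ ones. The numbers are the ones reported above; the dictionary keys and the overhead() helper are only illustrative names, not anything produced by cassandra-stress.

```python
# Percentiles reported above, in milliseconds (read test, CL=ONE).
single_az = {"median": 1.06, "p95": 7.41, "p99": 55.81}
multi_az = {"median": 1.16, "p95": 38.14, "p99": 47.75}


def overhead(baseline, candidate):
    """Return candidate minus baseline per percentile, rounded to 2 decimals."""
    return {name: round(candidate[name] - baseline[name], 2) for name in baseline}


delta = overhead(single_az, multi_az)
for name, value in delta.items():
    print(f"{name}: {value:+.2f} ms")
```

Only the 95th percentile moves by roughly 30ms: the median changes by about 0.1ms and the 99th percentile is actually lower in the multi-AZ run.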

Since I couldn't find anything to explain those results, I decided to run
cassandra-stress specifying only a single node entry (i.e. "--nodes
node1" instead of "--nodes node1,node2,node3,node4,node5,node6") and,
surprisingly, the 95th percentile latency dropped to 5.9ms.

Trying to recap:

Multi-AZ, CL=ONE, "--nodes node1,node2,node3,node4,node5,node6" -> 95th
percentile: 38.14ms
Multi-AZ, CL=ONE, "--nodes node1" -> 95th percentile: 5.9ms

For the sake of completeness, I ran a further test using a consistency
level of LOCAL_QUORUM, and it did not show any large difference between
using a single node and using multiple ones.

Do you guys know what could be the reason?

The tests were executed on an m3.xlarge (network optimized) using the
DataStax AMI 2.6.3 running Cassandra v2.0.15.

Thank you in advance for your help.

Cheers,
Alessandro