Hello everyone, I wanted to tell you about some performance
benchmarking we have done with Cassandra running in EC2 on a virtual
network.

The purpose of the experiment was to see how running Cassandra on a
virtual network could reduce operational complexity and to determine
the performance impact relative to native interfaces.

The summary results for running a 4 node cluster are:

Cassandra Performance on vCider Virtual Network
Replication Factor 1          32      64     128     192     256  byte cols
v. Unencrypted:            -8.2%    0.8%   -2.3%   -2.3%   -6.7%
v. Encrypted:              63.8%   55.4%   60.0%   53.9%   61.7%
v. Node Only Encryption:   -0.7%   -5.0%    1.9%    5.4%    4.7%

Replication Factor 3          32      64     128     192     256  byte cols
v. Unencrypted:            -4.5%   -4.7%   -5.8%   -4.5%   -1.5%
v. Encrypted:              31.5%   29.6%   31.4%   27.3%   29.9%
v. Node Only Encryption:    3.8%    3.9%    6.1%    8.3%    4.0%

There is tremendous EC2 performance variability and our experiments
tried to adjust for that by running 10 trials for each column size and
averaging them. Averaged across all column widths, the performance
was:

Replication Factor 1
v. Unencrypted:           -3.7%
v. Encrypted:              +59%
v. Node Only Encryption:  +1.3%

Replication Factor 3
v. Unencrypted:           -4.2%
v. Encrypted:              +30%
v. Node Only Encryption:  +5.2%
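
For anyone who wants to check the averages, here is a minimal Python
sketch that recomputes them from the per-column-size figures in the
first two tables. The dictionary layout and the names (results, mean,
averages) are only illustrative and are not part of our test scripts.
The encrypted averages come out as 59.0 and 29.9, which are reported
above rounded to whole percent.

```python
# Per-column-size results copied from the tables above: percent difference
# of the virtual network relative to the native interfaces, for column
# sizes of 32, 64, 128, 192 and 256 bytes.
results = {
    "RF1": {
        "unencrypted": [-8.2, 0.8, -2.3, -2.3, -6.7],
        "encrypted": [63.8, 55.4, 60.0, 53.9, 61.7],
        "node_only": [-0.7, -5.0, 1.9, 5.4, 4.7],
    },
    "RF3": {
        "unencrypted": [-4.5, -4.7, -5.8, -4.5, -1.5],
        "encrypted": [31.5, 29.6, 31.4, 27.3, 29.9],
        "node_only": [3.8, 3.9, 6.1, 8.3, 4.0],
    },
}


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Average across all column sizes, rounded to one decimal place.
averages = {
    rf: {config: round(mean(values), 1) for config, values in rows.items()}
    for rf, rows in results.items()
}

for rf, rows in averages.items():
    for config, avg in rows.items():
        print(f"{rf} v. {config}: {avg:+.1f}%")
```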

As you might expect, with unencrypted traffic the virtual network was
slower than the native interfaces, by about 4% on average.

However, when you encrypt communications (both node and client) the
performance of the virtual network was faster by nearly 60% (30% with
R3). Since this measurement is primarily an indication of the client
encryption performance, we also measured performance of the somewhat
unrealistic configuration when only node communications were
encrypted.  Here the virtual network performed better as well.

The slightly larger performance loss for unencrypted traffic with R3
(-4.2%) than with R1 (-3.7%) is understandable since R3 is more network
intensive than R1. However, since the virtual network performs encryption
in the kernel (which seems to be faster than what Cassandra can do
natively), when node encryption is turned on, the performance gains are
greater with R3 (+5.2% v. +1.3%) since more data needs to be encrypted.

We ran the tests using the Cassandra stress test tool across a range
of column widths, replication strategies and consistency levels (One,
Quorum).  We used OpenVPN for client encryption. The complete test
results are attached.

I’m going to write up a more complete analysis of these results, but
wanted to share them with you to see if there was anything obvious
that we overlooked.  We are currently running experiments against
clusters running in multiple EC2 regions.

We expect similar performance characteristics across regions, but with
the added benefit of not needing to fuss with the EC2 snitch. The
virtual network lets you assign your own private IPs for all Cassandra
interfaces so the standard Snitch can be used everywhere.

If you're running Cassandra in EC2 (or any other public cloud) and
want encrypted communications, running on a virtual network is a clear
winner.  Not only is it 30-60% faster, but you don’t have to
bother with the point-to-point configuration of a third-party
encryption technique. Since those techniques run in user space, it's not
surprising that dramatic performance gains can be achieved with the
kernel-based approach of the virtual network.

When we’re done, we will put everything in a public repo that includes
all Puppet configuration modules as well as a collection of scripts that
automate nearly all of the testing. I hope to have that in the next
week or so, but wanted to get some of these single-region results out
there in advance.

If you are interested, you can learn more about the vCider virtual
network at www.vcider.com

Let me know if you have any questions.
CM

Attachment: vCider.Cassandra.benchmarks.pdf
Description: Adobe PDF document