Hi,

I'm currently benchmarking Cassandra and have encountered some interesting
behavior. As I increase the number of client threads (and connections),
latency increases as expected but, at some point, throughput actually
decreases.

I've seen a few posts about this online, with no clear resolution:

> If we move to higher threadcounts, throughput does not
> increase and even decreases. Do you have any idea why this is
> happening and possibly suggestions how to scale throughput to much
> higher numbers? [1]


> If you want to increase throughput, try increasing the number of clients.
> Of course, it doesn't mean that throughput will always increase. My
> observation was that it will increase and after a certain number of clients
> throughput decreases again. [2]


You can see a graph of the behavior I'm experiencing here:
https://dl.dropbox.com/u/34647904/cassandra-lat-thru.pdf

I'm using YCSB on EC2, with one m1.large instance driving client load and
one m1.large instance running a single Cassandra node, with maximum
connections set to 1024 and Cassandra's files on RAID0 ephemeral storage.
The problem occurs with commitlog sync set to either batch or periodic,
with both the hsha and sync RPC server types, and across a variety of
heap-size settings. As far as I can tell, this isn't due to GC, and
nodetool tpstats isn't showing any dropped requests or even serious
queuing. Any thoughts?
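For context, I'm sweeping thread counts with a script along the lines of
[3]; here's a minimal Python sketch of that sweep (it just prints the YCSB
command lines rather than running them, and the binding name, workload
file, and thread counts are placeholders, not my exact settings):

```python
# Dry-run sketch of a YCSB thread-count sweep. Placeholder values:
# the binding ("cassandra-cql"), workload file, and counts should be
# adjusted to match your YCSB install and test plan.
THREAD_COUNTS = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]

def ycsb_command(threads):
    # "bin/ycsb run <binding> -P <workload> -threads N" is the stock
    # YCSB invocation shape; paths here assume the default layout.
    return ("bin/ycsb run cassandra-cql "
            "-P workloads/workloada "
            "-threads {0}".format(threads))

if __name__ == "__main__":
    for t in THREAD_COUNTS:
        print(ycsb_command(t))
```

Each run's reported throughput and latency then get plotted against the
thread count, which is where the graph above comes from.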

My guess is that this reflects some per-connection overhead on the server,
perhaps context-switching cost from the extra connections?
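To illustrate the shape of effect I mean (a toy model, not a claim about
Cassandra internals, and every constant is invented): if each open
connection adds a little scheduling overhead to the effective service
time, throughput rises with client count until the server saturates and
then falls as the overhead grows:

```python
# Toy closed-loop model: throughput climbs with client count, then
# declines once per-connection overhead (e.g. context switching)
# inflates the effective per-request service time.
CORES = 2             # server parallelism (invented)
BASE_SERVICE = 0.001  # seconds per request, uncontended (invented)
OVERHEAD = 0.00001    # added service time per open connection (the guess)

def throughput(clients):
    service = BASE_SERVICE + OVERHEAD * clients  # inflated service time
    saturated = CORES / service                  # server-bound ceiling
    offered = clients / service                  # client-bound load
    return min(offered, saturated)

if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16, 64, 256, 1024):
        print(n, round(throughput(n)))
```

In this model throughput peaks around the point where the offered load
crosses the server's ceiling and decays afterwards, which is roughly the
curve in the PDF above.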

Thanks,
Peter

[1]
http://mail-archives.apache.org/mod_mbox/cassandra-user/201102.mbox/%3C12ECB704F2665F40A9C09018C73D95AEC92A8F3618@IE2RD2XVS011.red002.local%3E
[2]
http://grokbase.com/t/cassandra/user/127h25p3hy/cassandra-evaluation-benchmarking-throughput-not-scaling-as-expected-neither-latency-showing-good-numbers#20120718x3cpg6enq250gbjg19ns14678g
[3] Example Bash script: https://gist.github.com/3978273
