Hi Jonathan,
7k reads/s is very high. Could you please explain your benchmark in more
detail?

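At 7,000 reads/s, the average time per read is only 0.143 ms (1/7000 s).
With 2 disks in the benchmark, each disk handles about 3,500 reads/s, so
that is about 0.286 ms per read per disk. That is much shorter than a
random seek on a spinning disk, so most of these reads were probably served
from cache.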
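The arithmetic above, as a quick back-of-envelope script. It assumes (my
assumption, not something stated about the benchmark) that each disk serves
one read at a time and that reads are spread evenly across the disks; the
function name is just for illustration.

```python
# Back-of-envelope time budget per read, assuming each disk serves one read
# at a time and reads are spread evenly across disks (hypothetical model).

def avg_ms_per_read(reads_per_sec: float, disks: int = 1) -> float:
    """Average time available per read on each disk, in milliseconds."""
    reads_per_disk = reads_per_sec / disks
    return 1000.0 / reads_per_disk

print(f"{avg_ms_per_read(7000):.3f} ms")           # 0.143 ms
print(f"{avg_ms_per_read(7000, disks=2):.3f} ms")  # 0.286 ms
```

Concurrent requests per disk would change this picture, so treat it only as
a rough sanity check.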

But in most applications that do random reads over a very large dataset,
the OS cache and Cassandra's key and row caches are not very effective. So
I suspect that a random-read test on a large dataset (say 1TB) would give
much lower numbers.
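To put a rough number on it: if every row is equally likely to be read
(uniform random access, which is an assumption; real workloads are often
skewed and cache better), the expected hit ratio is about cache size divided
by dataset size. The 4GB figure below is borrowed from the machines
mentioned in this thread; the helper name is hypothetical.

```python
# Rough cache hit ratio under uniform random reads: each row is equally
# likely to be requested, so hits ~= fraction of the data held in cache.
# Sizes are illustrative, not measurements.

def uniform_hit_ratio(cache_bytes: float, data_bytes: float) -> float:
    """Expected fraction of reads served from cache under uniform access."""
    return min(1.0, cache_bytes / data_bytes)

GB = 1024 ** 3
TB = 1024 ** 4

# e.g. ~4GB of RAM available for caching vs. a 1TB dataset
ratio = uniform_hit_ratio(4 * GB, 1 * TB)
print(f"hit ratio ~ {ratio:.2%}")  # hit ratio ~ 0.39%
```

In that case almost every read would have to go to disk.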


On Sat, Jul 17, 2010 at 9:07 PM, Jonathan Ellis <jbel...@gmail.com> wrote:

> On Fri, Jul 16, 2010 at 6:06 PM, Oren Benjamin <o...@clearspring.com>
> wrote:
> > The first goal was to reproduce the test described on spyced here:
> > http://spyced.blogspot.com/2010/01/cassandra-05.html
> >
> > Using Cassandra 0.6.3, a 4GB/160GB cloud server
> > (http://www.rackspacecloud.com/cloud_hosting_products/servers/pricing)
> > with default storage-conf.xml and cassandra.in.sh, here's what I got:
> >
> > Reads: 4,800/s
> > Writes: 9,000/s
> >
> > Pretty close to the result posted on the blog, with a slightly lower
> > write performance (perhaps due to the availability of only a single disk
> > for both commitlog and data).
>
> You're getting as close as you are because you're comparing 0.6
> numbers with 0.5.  For 0.6 on the test machine used in the blog post
> (quad core, 2 disks, 4GB) we were getting 7k reads and 14k writes.
>
> In our tests we saw a 5-15% performance penalty from adding a
> virtualization layer.  Things like only having a single disk are going
> to stack on top of that.
>
> > The above was single node testing.  I'd expect to be able to add nodes
> > and scale throughput.  Unfortunately, I seem to be running into a cap of
> > 21,000 reads/s regardless of the number of nodes in the cluster.
>
> This is what I would expect if a single machine is handling all the
> Thrift requests.  Are you spreading the client connections to all the
> machines?
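A minimal client-side sketch of spreading requests across all nodes instead
of sending everything to one coordinator. The host addresses and the
RoundRobinHosts class are hypothetical; client libraries usually provide
their own connection pooling, so this only illustrates the idea.

```python
import itertools
from collections import Counter

# Hypothetical node addresses; substitute the hosts in your own cluster.
NODES = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]

class RoundRobinHosts:
    """Hands out cluster hosts in rotation so that no single node
    ends up coordinating every client request."""

    def __init__(self, hosts):
        self._cycle = itertools.cycle(hosts)

    def next_host(self):
        return next(self._cycle)

picker = RoundRobinHosts(NODES)
# Each new client connection (or request) goes to the next host in turn.
assigned = Counter(picker.next_host() for _ in range(1000))
print(assigned)  # each of the 4 hosts gets 250 requests
```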
>
> > The disk performance of the cloud servers has been extremely spotty...
> > Is this normal for the cloud?
>
> Yes.
>
> >  And if so, what's the solution re Cassandra?
>
> The larger the instance you're using, the closer you are to having the
> entire machine, meaning fewer other users are competing with you for
> disk i/o.
>
> Of course when you're renting the entire machine's worth, it can be
> more cost-effective to just use dedicated hardware.
>
> > However, Cassandra routes to the nearest node topologically and not to
> > the best performing one, so "bad" nodes will always result in high
> > latency reads.
>
> Cassandra routes reads around nodes with temporarily poor performance
> in 0.7, btw.
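A simplified illustration of the general idea of routing reads around slow
nodes: keep an exponentially weighted moving average (EWMA) of each
replica's observed read latency and send the next read to the replica that
currently looks fastest. This is not Cassandra's implementation; the replica
names, the alpha value, and the latencies are all hypothetical.

```python
# Latency-aware replica selection sketch (illustrative only, not Cassandra's code).

class LatencyTracker:
    def __init__(self, replicas, alpha=0.3):
        self.alpha = alpha  # weight given to the newest latency sample
        self.ewma = {r: None for r in replicas}

    def record(self, replica, latency_ms):
        old = self.ewma[replica]
        if old is None:
            self.ewma[replica] = latency_ms
        else:
            self.ewma[replica] = self.alpha * latency_ms + (1 - self.alpha) * old

    def best_replica(self):
        # Replicas with no samples yet are treated as 0 ms, so they get tried.
        return min(self.ewma,
                   key=lambda r: self.ewma[r] if self.ewma[r] is not None else 0.0)

tracker = LatencyTracker(["node-a", "node-b", "node-c"])
for r, ms in [("node-a", 2.0), ("node-b", 40.0), ("node-c", 3.0)]:
    tracker.record(r, ms)
print(tracker.best_replica())  # node-a

# node-a suddenly slows down (e.g. other tenants competing for its disk)
for _ in range(5):
    tracker.record("node-a", 50.0)
print(tracker.best_replica())  # node-c
```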
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>
