Hi Jonathan,

7k reads/s is very high. Could you please explain more about your benchmark?
At 7,000 reads/s, each read takes only 0.143 ms on average. With the 2 disks in the benchmark, that is about 0.286 ms per read per disk. But in most random-read applications on a very large dataset, the OS cache and the Cassandra key/row caches are not very effective. So I guess that for a random-read test on a large dataset (such as 1 TB), the result may not be so good.

On Sat, Jul 17, 2010 at 9:07 PM, Jonathan Ellis <jbel...@gmail.com> wrote:

> On Fri, Jul 16, 2010 at 6:06 PM, Oren Benjamin <o...@clearspring.com> wrote:
> > The first goal was to reproduce the test described on spyced here:
> > http://spyced.blogspot.com/2010/01/cassandra-05.html
> >
> > Using Cassandra 0.6.3, a 4GB/160GB cloud server
> > (http://www.rackspacecloud.com/cloud_hosting_products/servers/pricing) with
> > default storage-conf.xml and cassandra.in.sh, here's what I got:
> >
> > Reads: 4,800/s
> > Writes: 9,000/s
> >
> > Pretty close to the result posted on the blog, with a slightly lower
> > write performance (perhaps due to the availability of only a single disk for
> > both commitlog and data).
>
> You're getting as close as you are because you're comparing 0.6
> numbers with 0.5. For 0.6 on the test machine used in the blog post
> (quad core, 2 disks, 4GB) we were getting 7k reads and 14k writes.
>
> In our tests we saw a 5-15% performance penalty from adding a
> virtualization layer. Things like only having a single disk are going
> to stack on top of that.
>
> > The above was single node testing. I'd expect to be able to add nodes
> > and scale throughput. Unfortunately, I seem to be running into a cap of
> > 21,000 reads/s regardless of the number of nodes in the cluster.
>
> This is what I would expect if a single machine is handling all the
> Thrift requests. Are you spreading the client connections to all the
> machines?
>
> > The disk performance of the cloud servers have been extremely spotty...
> > Is this normal for the cloud?
>
> Yes.
>
> > And if so, what's the solution re Cassandra?
>
> The larger the instance you're using, the closer you are to having the
> entire machine, meaning less other users are competing with you for
> disk i/o.
>
> Of course when you're renting the entire machine's worth, it can be
> more cost-effective to just use dedicated hardware.
>
> > However, Cassandra routes to the nearest node topologically and not to
> > the best performing one, so "bad" nodes will always result in high latency
> > reads.
>
> Cassandra routes reads around nodes with temporarily poor performance
> in 0.7, btw.
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
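
For reference, the per-read latency estimate at the top of this message is simple arithmetic. Below is a minimal Python sketch of it. The function name is hypothetical, and it assumes reads are spread evenly over the disks and that each disk serves one read at a time; that is an assumption of the estimate, not something measured in the benchmark.

```python
def mean_service_time_ms(reads_per_second: float, disks: int = 1) -> float:
    """Mean time available per read on each disk, in milliseconds.

    Assumption (not from the benchmark): reads are spread evenly over
    the disks, and each disk serves one read at a time with no
    queueing or parallelism.
    """
    if reads_per_second <= 0 or disks <= 0:
        raise ValueError("reads_per_second and disks must be positive")
    per_disk_rate = reads_per_second / disks  # reads/s handled by one disk
    return 1000.0 / per_disk_rate             # ms per read on that disk


if __name__ == "__main__":
    # Whole machine treated as one serial device.
    print(f"{mean_service_time_ms(7000):.3f} ms")           # 0.143 ms
    # Load split evenly over the 2 disks in the test machine.
    print(f"{mean_service_time_ms(7000, disks=2):.3f} ms")  # 0.286 ms
```

A random seek on a rotating disk typically costs several milliseconds, so per-read times this low suggest that most reads in the benchmark were served from cache rather than from disk.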