Hello all, I'm Oren's partner in crime on all this. I've got a few more numbers 
to add.

In an effort to eliminate everything but the scaling issue, I set up a cluster 
on dedicated hardware (non-virtualized; 8-core, 16G RAM). 

No data was loaded into Cassandra -- 100% of requests were misses. As far as we 
can reason about the problem, this is as fast as the database can perform: disk 
is entirely out of the picture, and the hardware is certainly more than 
sufficient.
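
For reference, the reads are plain Thrift gets against an empty column family. 
A rough sketch of the kind of loop stress.py drives is below -- this assumes 
the 0.6 Thrift API (per-call keyspace), the stock Keyspace1/Standard1 schema, 
and the generated Python bindings on the path; the host, key format, and column 
name are illustrative, not exactly what stress.py sends:

# Rough sketch of a single-threaded read-miss loop (assumptions as above).
import time
from thrift.transport import TSocket, TTransport
from thrift.protocol import TBinaryProtocol
from cassandra import Cassandra
from cassandra.ttypes import ColumnPath, ConsistencyLevel, NotFoundException

socket = TSocket.TSocket('192.168.1.10', 9160)   # one node from the --nodes list
transport = TTransport.TBufferedTransport(socket)
client = Cassandra.Client(TBinaryProtocol.TBinaryProtocol(transport))
transport.open()

path = ColumnPath(column_family='Standard1', column='C0')
reads, start = 0, time.time()
for i in range(100000):
    try:
        client.get('Keyspace1', str(i), path, ConsistencyLevel.ONE)
    except NotFoundException:
        pass                 # cluster is empty, so every read is a miss
    reads += 1
transport.close()
print('%.0f reads/sec' % (reads / (time.time() - start)))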

nodes   reads/sec
1       53,000
2       37,000
4       37,000

I ran this test previously on the cloud and saw the same pattern -- lower 
absolute numbers, but throughput flat regardless of node count:

nodes   reads/sec
1       24,000
2       21,000
3       21,000
4       21,000
5       21,000
6       21,000

In fact, I ran it twice out of disbelief (on different nodes the second time) 
with essentially identical results. 

Other Notes:
 - stress.py was run in both random and gaussian mode; there was no difference. 
 - Runs were 10+ minutes; the numbers above are steady-state averages that 
exclude the warm-up at the beginning and the tail at the end of each run (a 
trivial averaging sketch follows this list).
 - Supplied node lists covered all boxes in the cluster. 
 - Data and commitlog directories were deleted between each run.
 - Tokens were evenly spaced across the ring and recalculated to match the 
cluster size before each run (see the token sketch below).
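
For what it's worth, "excluding the beginning and the end" just means trimming 
the first and last few progress intervals before averaging -- something like 
this (the interval values and trim count below are made up):

# Average per-interval reads/sec, dropping warm-up and cool-down intervals.
def steady_state(samples, trim=3):
    middle = samples[trim:-trim]
    return sum(middle) / float(len(middle))

intervals = [9000, 14000, 20500, 21100, 20900, 21000, 21200, 20800, 15000, 6000]
print('%.0f reads/sec (steady state)' % steady_state(intervals))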
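
And the token spacing is just the usual even split of the RandomPartitioner's 
0..2**127 range; a minimal sketch (the cluster sizes in the loop are examples, 
and each value goes into that node's initial token setting before the freshly 
wiped node is started):

# Evenly spaced initial tokens for the RandomPartitioner: node i gets
# i * (2**127 / N) for i in 0..N-1.
def initial_tokens(cluster_size):
    return [i * (2 ** 127 // cluster_size) for i in range(cluster_size)]

for n in (1, 2, 4, 6):
    print(n, initial_tokens(n))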

If anyone has explanations or suggestions, they would be quite welcome. This is 
surprising to say the least.

Cheers,

Dave



On Jul 19, 2010, at 11:42 AM, Stu Hood wrote:

> Hey Oren,
> 
> The Cloud Servers REST API returns a "hostId" for each server that indicates 
> which physical host you are on: I'm not sure if you can see it from the 
> control panel, but a quick curl session should get you the answer.
> 
> Thanks,
> Stu
> 
> -----Original Message-----
> From: "Oren Benjamin" <o...@clearspring.com>
> Sent: Monday, July 19, 2010 10:30am
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Subject: Re: Cassandra benchmarking on Rackspace Cloud
> 
> Certainly I'm using multiple cloud servers for the multiple client tests.  
> Whether or not they are resident on the same physical machine, I just don't 
> know.
> 
>   -- Oren
> 
> On Jul 18, 2010, at 11:35 PM, Brandon Williams wrote:
> 
> On Sun, Jul 18, 2010 at 8:45 PM, Oren Benjamin <o...@clearspring.com> wrote:
> Thanks for the info.  Very helpful in validating what I've been seeing.  As 
> for the scaling limit...
> 
>>> The above was single node testing.  I'd expect to be able to add nodes and 
>>> scale throughput.  Unfortunately, I seem to be running into a cap of 21,000 
>>> reads/s regardless of the number of nodes in the cluster.
>> 
>> This is what I would expect if a single machine is handling all the
>> Thrift requests.  Are you spreading the client connections to all the
>> machines?
> 
> Yes - in all tests I add all nodes in the cluster to the --nodes list.  The 
> client requests are in fact being dispersed among all the nodes as evidenced 
> by the intermittent TimedOutExceptions in the log which show up against the 
> various nodes in the input list.  Could it be a result of all the virtual 
> nodes being hosted on the same physical hardware?  Am I running into some 
> connection limit?  I don't see anything pegged in the JMX stats.
> 
> It's unclear if you're using multiple client machines for stress.py or not; a 
> limitation of 24k/21k for a single quad-proc machine is normal in my 
> experience.
> 
> -Brandon
> 
