I'm about to extend my two-node cluster with four dedicated nodes and
remove one of the old nodes, leaving a five-node cluster. The cluster
is in production, but I can spare it for some stress testing in the
meantime, as I'm also interested in my cluster's performance. I can't
dedicate the cluster to the test, but the daytime load should be low
enough not to skew the results too much. The results might come in
within a few days, once we get the nodes up. Hopefully my tests will
produce some meaningful data that can be applied to this issue.

I haven't used stress.py yet; any tips on that? David, could you send
me the stress.py command line you used?

 - Juho Mäkinen

On Mon, Jul 19, 2010 at 10:51 PM, David Schoonover
<david.schoono...@gmail.com> wrote:
> Sorry, mixed signals in my response. I was partially replying to suggestions 
> that we were limited by the box's NIC or DC's bandwidth (which is gigabit, no 
> dice there). I also ran the tests with -t50 on multiple tester machines in 
> the cloud with no change in performance; I've now rerun those tests on 
> dedicated hardware.
>
>
>        reads/sec @
> nodes   one client      two clients
> 1       53k             73k
> 2       37k             50k
> 4       37k             50k
>
>
> Notes:
> - All notes from the previous dataset apply here.
> - All clients were reading with 50 processes.
> - Test clients were not co-located with the databases or each other.
> - All machines are in the same DC.
> - Servers showed about 20MB/sec in network i/o for the multi-node clusters, 
> which is well under the max for gigabit.
> - Latency was about 2.5ms/req.
>
>
> At this point, we'd really appreciate it if anyone else could attempt to 
> replicate our results. Ultimately, our goal is to see an increase in 
> throughput given an increase in cluster size.
>
> --
> David Schoonover
>
> On Jul 19, 2010, at 2:25 PM, Stu Hood wrote:
>
>> If you put 25 processes on each of the 2 machines, all you are testing is 
>> how fast 50 processes can hit Cassandra... the point of using more machines 
>> is that you can use more processes.
>>
>> Presumably, for a single machine, there is some limit (K) to the number of 
>> processes that will give you additional gains: above that point, you should 
>> use more machines, each running K processes.
>>
>
>
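The figures in the quoted table and notes can be sanity-checked with a little arithmetic. The sketch below is only an illustration: the throughput and network numbers are copied from David's message, while the function names and the 125 MB/sec figure for gigabit (nominal line rate, ignoring protocol overhead) are assumptions made for this example.

```python
# Figures copied from the quoted table (reads/sec) and notes.
READS_PER_SEC = {
    # nodes: (one client, two clients)
    1: (53_000, 73_000),
    2: (37_000, 50_000),
    4: (37_000, 50_000),
}
OBSERVED_MB_PER_SEC = 20        # per-server network I/O reported in the notes
GIGABIT_MB_PER_SEC = 1000 / 8   # assumption: nominal 1 Gbit/s, no protocol overhead


def link_utilization(observed_mb, capacity_mb=GIGABIT_MB_PER_SEC):
    """Fraction of the nominal link capacity that is in use."""
    return observed_mb / capacity_mb


def client_speedup(one_client, two_clients):
    """Throughput gain from adding a second test client."""
    return two_clients / one_client


def relative_to_single_node(table, column):
    """Throughput of each cluster size divided by the one-node figure."""
    base = table[1][column]
    return {nodes: row[column] / base for nodes, row in table.items()}


if __name__ == "__main__":
    print(f"link utilization: {link_utilization(OBSERVED_MB_PER_SEC):.0%}")
    for nodes, (one, two) in READS_PER_SEC.items():
        print(f"{nodes} node(s): two clients give "
              f"{client_speedup(one, two):.2f}x one client")
    for nodes, ratio in relative_to_single_node(READS_PER_SEC, 1).items():
        print(f"{nodes} node(s): {ratio:.2f}x single-node throughput "
              f"(two clients)")
```

On these numbers the servers use roughly 16% of a nominal gigabit link, a second client adds about 35-38% at every cluster size, and the multi-node clusters deliver about 0.7x the single-node throughput.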
