Edward (and Maxim),

I agree.  I was just recalling previous performance bake-offs (for other 
technologies, long time ago, galaxy far far away) in which the customer had put 
together a mockup of the high throughput expected in production and wanted to 
make a decision against that one set of numbers.  We always found that both/all 
competing products could be made to run faster due to unexpected factors in the 
non-production test build.  For our side, we always started simple and built up 
the throughput until we found a bottleneck.  We fixed the bottleneck. Rinse and 
repeat.

Chris Gerken

chrisger...@mindspring.com
512.587.5261
http://www.linkedin.com/in/chgerken



On Jan 22, 2012, at 8:51 AM, Edward Capriolo wrote:

> In some sense one-for-one performance "almost" does not matter, though I bet 
> you can get Cassandra to do better (I remember old-school YCSB white paper 
> benchmarks against a sharded MySQL). 
> 
> One of the main selling points of Cassandra is that if you want to grow from 
> 4 nodes, to 8 nodes, to 14 nodes, and so on, it is elastic and supports 
> adding and removing nodes online. A do-it-yourself "hash mod N" algorithm 
> really has no upgrade path.
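
To make the "no upgrade path" point concrete, here is a minimal sketch of how many keys change shard when a `key % N` scheme grows. The function names and the integer vehicle IDs are hypothetical, chosen only to mirror the 1k-vehicle test described further down the thread; it is an illustration, not anyone's actual routing code.

```python
def shard_for(key: int, shard_count: int) -> int:
    """Route a record the way a do-it-yourself `key % N` scheme does."""
    return key % shard_count


def fraction_moved(keys, old_count: int, new_count: int) -> float:
    """Share of keys whose shard changes when the shard count changes."""
    keys = list(keys)
    moved = sum(
        1 for k in keys if shard_for(k, old_count) != shard_for(k, new_count)
    )
    return moved / len(keys)


if __name__ == "__main__":
    vehicle_ids = range(1000)  # hypothetical IDs, one per vehicle
    for old, new in [(2, 3), (4, 8)]:
        share = fraction_moved(vehicle_ids, old, new)
        print(f"{old} -> {new} shards: {share:.1%} of keys change shard")
```

Every key that changes shard has to be copied by hand while the application keeps writing, which is the part a token ring with online node addition takes care of for you.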
> 
> Edward
> 
> On Sun, Jan 22, 2012 at 9:26 AM, Chris Gerken <chrisger...@mindspring.com> 
> wrote:
> Howdy Gustavo,
> 
> One thing that jumped out at me is your having put two cassandra images on 
> the same box.  There may be enough CPU and memory for the two images combined 
> but you may be seeing some other resource not being shared so nicely - 
> network card bandwidth, for example.
> 
> More generally, the real question is what the bottleneck is (for both db's, 
> actually).  Start with Cassandra running in that configuration and with one 
> client thread sending one request a second.  Look at the CPU, network and 
> memory metrics for all boxes (including the client).  Nothing should be even 
> close to maxing out at that throughput.  Now incrementally increase one of 
> the test parameters (number of clients or number of inserts per second) just 
> a bit (say from one transaction per second to 5) and note the above metrics.  
> Keep slowly increasing the test parameters, one at a time, until one of the 
> metrics maxes out.  That's the bottleneck you're wondering about.  Fix that 
> and the db (be it Cassandra or MySQL) will move ahead of the other 
> performance-wise.  Turn your attention to the other db and repeat.
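
The ramp-up described above can be sketched roughly as follows. This is only an outline under stated assumptions: `measure` stands for whatever you use to collect utilisation figures at a given request rate, `fake_measure` is a made-up stand-in with invented numbers so the example runs, and the 0.95 saturation threshold is arbitrary. Only one test parameter (the request rate) is varied here.

```python
from typing import Callable, Dict, Optional, Tuple

Metrics = Dict[str, float]  # utilisation per resource, 0.0 to 1.0


def find_bottleneck(
    measure: Callable[[int], Metrics],
    start_rate: int = 1,
    step: int = 5,
    max_rate: int = 10_000,
    saturation: float = 0.95,
) -> Optional[Tuple[int, str]]:
    """Raise the request rate step by step until one metric saturates.

    Returns (rate, metric_name) for the first saturated metric, or None
    if nothing saturates before max_rate is exceeded.
    """
    rate = start_rate
    while rate <= max_rate:
        metrics = measure(rate)
        # Pick the most heavily used resource at this rate.
        name, value = max(metrics.items(), key=lambda kv: kv[1])
        if value >= saturation:
            return rate, name
        rate += step
    return None


def fake_measure(rate: int) -> Metrics:
    """Hypothetical stand-in for real monitoring; the curves are invented."""
    return {
        "cpu": min(1.0, rate / 4000),
        "network": min(1.0, rate / 1500),
        "memory": min(1.0, 0.2 + rate / 20000),
    }


if __name__ == "__main__":
    print(find_bottleneck(fake_measure))
```

In a real run `measure` would drive the load generator at the given rate for a while and read CPU, network and memory from every box, client included.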
> 
> - Chris Gerken
> 
> On Jan 22, 2012, at 7:10 AM, Gustavo Gustavo wrote:
> 
>> Hello,
>> 
>> I've set up a testing environment for Cassandra and MySQL, to compare both, 
>> regarding *performance only*. And I must admit that I was expecting 
>> Cassandra to beat MySQL. But I've not seen this happening up to now.
>> My application/use case is INSERT intensive, since I'm not updating 
>> anything, just inserting all the time.
>> To compare both I created virtual machines with Ubuntu 11.10, and installed 
>> the latest versions of each datastore. Each VM has 1GB of RAM. I've used VMs 
>> as a way to give both datastores an equal sandbox.
>> MySQL is set up as a sharded pair of databases, meaning that records are 
>> inserted into a specific instance based on key % 2. The engine is MyISAM 
>> (InnoDB was really slow and not really needed for my case). There's a 
>> compound primary key (integer and datetime columns) in this test table.
>> Let's name the "nodes" MySQL1 and MySQL2.
>> Cassandra is set up to work with 4 nodes, with keys (tokens) set up to 
>> distribute records evenly across the 4 nodes (nodetool ring reports 25% for 
>> each node), replication factor 1 and RandomPartitioner; the other settings 
>> are left at their defaults. Let's name the nodes Cassandra1, Cassandra2, 
>> Cassandra3 and Cassandra4.
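
For reference, evenly spaced tokens of the kind described above are conventionally computed as i * 2**127 / N for RandomPartitioner. A minimal sketch, assuming that 0 to 2**127 token space; the function name is hypothetical and the thread does not say how the tokens were actually chosen.

```python
from typing import List

RING_SIZE = 2**127  # assumed token space of RandomPartitioner


def balanced_tokens(node_count: int, ring_size: int = RING_SIZE) -> List[int]:
    """Evenly spaced initial tokens, one per node."""
    return [i * ring_size // node_count for i in range(node_count)]


if __name__ == "__main__":
    for node, token in enumerate(balanced_tokens(4), start=1):
        print(f"Cassandra{node}: initial_token = {token}")
```

With four nodes the tokens land a quarter of the ring apart, which matches the 25% per node that nodetool ring reports.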
>> 
>> I'm using 2 physical machines (Windows7) to host the 4 (Cassandra) or 2 
>> (MySQL) virtual machines, this way:
>> Machine1: MySQL1, Cassandra1, Cassandra3
>> Machine2: MySQL2, Cassandra2, Cassandra4
>> The machines have enough CPU and RAM to host either the Cassandra cluster 
>> or the MySQL "cluster", one at a time.
>> 
>> The client test application is running on a third physical machine, with 8 
>> threads doing inserts. The test application is written in C# (Windows7) 
>> using the Aquiles high-level client.
>> 
>> My use case is a vehicle tracking system. So, let's suppose that every 
>> minute the vehicle sends its position together with some other GPS data and 
>> vehicle status information. The columns in my Cassandra cluster are just the 
>> DateTime (long value) of a position for a specific vehicle, and the value is 
>> all the other data serialized to binary format. Therefore, my CF really 
>> grows in number of columns. All data is inserted into a single CF/table 
>> named Positions. The key in Cassandra is the VehicleID, and in MySQL it is 
>> VehicleID + PositionDateTime (MySQL creates an index on this automatically). 
>> It's important to note that MySQL threw tons of connection exceptions, 
>> although each insert was retried until it got through.
>> 
>> My test case was to insert 1k positions for 1k vehicles over 10 days, which 
>> gives 10,000,000 inserts.
>> 
>> The final throughput that my application achieved for this scenario was:
>> 
>> Cassandra x 4
>> 2012-01-21 11:45:38,044 #6         [Logger.Log] INFO  - >> Inserted 10000 
>> positions for 1000 vehicles (10000000 inserts): 
>> 2012-01-21 11:45:38,082 #6         [Logger.Log] INFO  - >> Total Time: 
>> 2:37:03,359
>> 2012-01-21 11:45:38,085 #6         [Logger.Log] INFO  - >> Throughput: 1061 
>> inserts/s
>> 
>> And for MySQL x 2
>> 2012-01-21 14:26:25,197 #6         [Logger.Log] INFO  - >> Inserted 10000 
>> positions for 1000 vehicles (10000000 inserts): 
>> 2012-01-21 14:26:25,250 #6         [Logger.Log] INFO  - >> Total Time: 
>> 2:06:25,914
>> 2012-01-21 14:26:25,263 #6         [Logger.Log] INFO  - >> Throughput: 1318 
>> inserts/s
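
As a sanity check, the two throughput figures follow directly from the logged totals (inserts divided by elapsed seconds). The sketch below assumes the digits after the comma in "Total Time" are milliseconds; the names are made up for the example.

```python
from datetime import timedelta

TOTAL_INSERTS = 10_000_000  # from the log lines above


def throughput(total_inserts: int, elapsed: timedelta) -> float:
    """Average inserts per second over the whole run."""
    return total_inserts / elapsed.total_seconds()


# Elapsed times copied from the logs; the fraction is assumed to be ms.
cassandra_time = timedelta(hours=2, minutes=37, seconds=3, milliseconds=359)
mysql_time = timedelta(hours=2, minutes=6, seconds=25, milliseconds=914)

if __name__ == "__main__":
    c = throughput(TOTAL_INSERTS, cassandra_time)
    m = throughput(TOTAL_INSERTS, mysql_time)
    print(f"Cassandra x 4: {c:.0f} inserts/s")
    print(f"MySQL x 2:     {m:.0f} inserts/s")
    print(f"MySQL / Cassandra ratio: {m / c:.2f}")
```

So the gap being asked about is MySQL finishing roughly a quarter faster on this one run, with both stores sharing the same two physical hosts.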
>> 
>> Is there something that I'm missing here? Is this expected? Or is the 
>> problem somewhere else, and hard to pin down from this description?
>> 
>> Cheers,
>> Gustavo
>> 
> 
> 
