Hi Sylvain, thanks for your answer.
I made a test with the stress utility, inserting 100 000 rows with 10 columns per row, using these options:

  -o insert -t 5 -n 100000 -c 10 -d 192.168.1.210,192.168.1.211,...

Result: 161 seconds. With MySQL using inserts (after a dump): 1.79 seconds.

Charles

2011/5/3 Sylvain Lebresne <sylv...@datastax.com>

> There is probably a fair number of things you'd have to make sure you do
> to improve the write performance on the Cassandra side (starting by using
> multiple threads to do the insertion), but the first thing is probably to
> start comparing things that are at least mildly comparable. If you do
> inserts in Cassandra, you should try to do inserts in MySQL too, not
> "load data infile" (which really is just a bulk loading utility). And as
> stated here http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html:
> "When loading a table from a text file, use LOAD DATA INFILE. This is
> usually 20 times faster than using INSERT statements."
>
> --
> Sylvain
>
> On Tue, May 3, 2011 at 12:30 PM, charles THIBAULT
> <charl.thiba...@gmail.com> wrote:
> > Hello everybody,
> >
> > first: sorry for my English in advance!!
> >
> > I'm getting started with Cassandra on a 5-node cluster, inserting data
> > with the pycassa API.
> >
> > I've read everywhere on the internet that Cassandra's write performance
> > is better than MySQL's because writes only append to the commit log
> > files.
> >
> > When I try to insert 100 000 rows with 10 columns per row with batch
> > insert, I get this result: 27 seconds.
> > But with MySQL (LOAD DATA INFILE) this takes only 2 seconds (using
> > indexes).
> >
> > Here is my configuration:
> >
> > cassandra version: 0.7.5
> > nodes: 192.168.1.210, 192.168.1.211, 192.168.1.212, 192.168.1.213,
> > 192.168.1.214
> > seed: 192.168.1.210
> >
> > My script:
> > *************************************************************************************************************
> > #!/usr/bin/env python
> >
> > import pycassa
> > import time
> > import random
> > from cassandra import ttypes
> >
> > pool = pycassa.connect('test', ['192.168.1.210:9160'])
> > cf = pycassa.ColumnFamily(pool, 'test')
> > b = cf.batch(queue_size=50,
> >              write_consistency_level=ttypes.ConsistencyLevel.ANY)
> >
> > tps1 = time.time()
> > for i in range(100000):
> >     columns = dict()
> >     for j in range(10):
> >         columns[str(j)] = str(random.randint(0, 100))
> >     b.insert(str(i), columns)
> > b.send()
> > tps2 = time.time()
> >
> > print("execution time: " + str(tps2 - tps1) + " seconds")
> > *************************************************************************************************************
> >
> > What am I doing wrong?
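For Sylvain's first point (use multiple threads on the Cassandra side), the single-threaded loop in the script above can be split across a thread pool. The following is a stdlib-only sketch of that pattern: `parallel_insert`, `insert_rows`, and `insert_fn` are hypothetical names I introduce here, and `insert_fn` is a stand-in for whatever actually writes a batch (in the real script it would wrap a per-thread `cf.batch(...)` from pycassa, whose connection pool is designed to be shared across threads):

```python
import random
import threading

def insert_rows(insert_fn, start, stop, batch_size=50):
    """Generate rows in [start, stop) and hand them to insert_fn in
    batches, mimicking the queue_size=50 batching of the original script."""
    batch = []
    for i in range(start, stop):
        columns = {str(j): str(random.randint(0, 100)) for j in range(10)}
        batch.append((str(i), columns))
        if len(batch) >= batch_size:
            insert_fn(batch)
            batch = []
    if batch:  # flush the remainder
        insert_fn(batch)

def parallel_insert(insert_fn, total_rows, n_threads=5):
    """Split the key range across n_threads workers; each worker batches
    its own slice independently, then we wait for all of them."""
    step = total_rows // n_threads
    threads = []
    for t in range(n_threads):
        start = t * step
        stop = total_rows if t == n_threads - 1 else start + step
        th = threading.Thread(target=insert_rows, args=(insert_fn, start, stop))
        threads.append(th)
        th.start()
    for th in threads:
        th.join()
```

With a cluster available, each thread would get its own batch writer so the 100 000 inserts are spread over several connections instead of one.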
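Sylvain's second point is that row-at-a-time INSERTs and a bulk loader are not comparable benchmarks. A rough way to see the gap without a MySQL server is the sketch below, which uses the stdlib `sqlite3` module purely as a stand-in (the schema, `bench_sqlite`, and the row counts are my assumptions): it times one-statement-per-row autocommit inserts against a single bulk `executemany()` in one transaction.

```python
import sqlite3
import time

def bench_sqlite(n_rows):
    """Return (per_row_seconds, bulk_seconds, row_count) for inserting
    n_rows, comparing autocommitted single INSERTs with one bulk load."""
    rows = [(str(i), str(i % 100)) for i in range(n_rows)]

    # Row at a time: autocommit mode, so every INSERT is its own transaction.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE t (k TEXT PRIMARY KEY, v TEXT)")
    t0 = time.time()
    for row in rows:
        conn.execute("INSERT INTO t VALUES (?, ?)", row)
    per_row = time.time() - t0
    conn.close()

    # Bulk: all rows in a single transaction via executemany().
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (k TEXT PRIMARY KEY, v TEXT)")
    t0 = time.time()
    with conn:  # commits once at the end of the block
        conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    bulk = time.time() - t0
    count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    conn.close()
    return per_row, bulk, count
```

The same asymmetry is what makes the 27 s (Cassandra inserts) vs 2 s (LOAD DATA INFILE) comparison misleading.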