On 10 Jan 2012, at 8:37am, Durga D wrote:

> I noticed, with first approach, huge performance hit when database grows.
> for ex: database has 500,000 records. now insert 25,000 within a
> transaction. It takes lot of time when compared to insert 25,000 records
> with empty database.
> 
> approximately 15 times slower. because of unique(col1, col2).

This makes some sense.  Every INSERT command involves SQLite looking through 
the records which are already in the table.  So a new insert when there are 
only 100 rows in the table is going to be pretty quick, but an insert when 
there are 500,000 rows in the table will involve lots of work.

SQLite has to look up the data of the new record in the index it makes of 
(col1, col2) so it can find out whether the new row is a duplicate of an 
existing row.  So it has to do 25,000 searches of a balanced tree.  And if the 
tree has 500,000 records in it that means it's doing around ... hmm, it's over 
3/4 of 1020, call it 765 ... tests for each search of the tree, 765 test for 
each INSERT command.

So inserting 25,000 new rows when you already have 500,000 rows involves more 
than 25,000 * 765 string comparisons.  Call it 19 million string comparisons.  
Plus, of course, all the other work involved in messing with the database.  
Since you know how long your strings are you can get an idea of how much data 
SQLite is having to handle to do this.

> I did with transactions for every 25,000 records.

That is a reasonable way to do it.  People may be able to recommend PRAGMAs 
which will speed this up a bit but what you're doing really does involve a lot 
of work for the computer to do.

Simon.
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to