We have just finished testing the same scenario with MySQL, and amazingly it continued to insert 1500-3000 rows per second even when the database had 60,000,000 records.
I don't know how this magic is done, but that is what the tests showed (on the same machine / OS).
There is one thing I don't understand. SQLite works in an optimistic mode, which means the changes are written to the database while the original data is backed up to the journal.
In this case I would expect the "commit" to be as fast as deleting a file. But the tests show that a commit can take up to 5 minutes when the database is large.
Does anybody have an explanation?
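To make the question concrete, the pattern being timed is roughly the following (a sketch only, not the actual test program; the PRAGMA lines are optional tuning knobs from the SQLite documentation, shown with illustrative values, and were not necessarily used in the test):

  PRAGMA cache_size = 100000;   -- keep more dirty pages in memory before they spill to disk
  PRAGMA synchronous = OFF;     -- skip fsync(); faster commits at the cost of crash safety
  BEGIN;
  -- ... thousands of INSERT statements ...
  COMMIT;                       -- COMMIT must still write back every modified page and delete the journal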
I saw the comments about using OS features to solve the problem. If we choose SQLite we will need it on many platforms, installed as part of our product. Requiring us, as users of SQLite, to configure filesystem journaling on each customer's computer is not practical.
Personally I'm a big fan of SQLite (great tool), and I really hope this issue can be solved so that we can choose it.
I understand that solving this issue demands big changes, but I believe it would take SQLite into the "big players' league".
Thanks Avner
D. Richard Hipp wrote:
Avner Levy wrote:
Hi,
I've written the following program to test SQLite's performance for a specific scenario I have.
I've used most of the tricks I've found, but the performance still becomes unacceptable when the database gets big.
The performance problem happens, of course, only when the indexes are defined.
Since I need the indexes, is there any other option that can make it faster?
When the database gets big, the insert rate drops to 50 rows per second.
Thanks in advance.
* It seems that the larger the DB gets, the more pages have to be copied out of the database into the journal file, which kills performance.
Each transaction in your test involves 30000 INSERTs into a single table defined (roughly) as follows:
  CREATE TABLE test(a,b);
  CREATE INDEX idx1 ON test(a);
  CREATE INDEX idx2 ON test(b);
In these INSERTs, the values for test.a are non-decreasing. That means that new entries into the "test" table and into the "idx1" index always go into the same place in their B*Trees - specifically at the end. But the values for test.b are random, which means that entries into idx2 are distributed throughout its B*Tree. Because changes to test and idx1 are localized, only a handful of disk blocks are changed (many others are added, but few existing ones are changed), but the changes to idx2 are not localized, which means that nearly every disk block associated with idx2 must be changed. Modifying the thousands of disk blocks associated with idx2 is what is taking so long. I do not know of anything I can do inside SQLite to make it go any faster. I do not know of any other way of creating an index that could work around this issue.
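To illustrate the two patterns (this is a schematic rendering with made-up values, not the TCL script mentioned below):

  BEGIN;
  INSERT INTO test VALUES(100001, 583220917);  -- a keeps increasing: test and idx1 grow at the right-hand edge
  INSERT INTO test VALUES(100002, 41872);      -- b is random: each entry lands on a different idx2 page
  INSERT INTO test VALUES(100003, 907113348);
  -- ... 30000 such INSERTs per transaction ...
  COMMIT;  -- the scattered idx2 entries dirty pages all over the index, and every dirty page must be journalled and rewritten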
If you disable idx2, you will see that the inserts go much faster. I also observe that the journal file is about 1/3rd the size of the database, which is consistent with the observation that every disk block associated with idx2 needs to be changed. My TCL test script for this situation is appended to this message.
If you really are doing millions of INSERTs prior to doing any SELECTs, you might consider deferring the creation of idx2 until after all the data has been put into the database. Generally speaking, it is a little faster to create an index for existing data than it is to create the index at the same time that the data is being inserted. But the difference is not that great. And, of course, this won't help if in reality you need to do some SELECTs along the way...
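In SQL terms, the deferred-index approach is simply the following (a sketch; it assumes you can tolerate idx2 being absent while the load runs):

  BEGIN;
  -- ... bulk INSERTs with only test and idx1 in place ...
  COMMIT;
  CREATE INDEX idx2 ON test(b);   -- build the random-keyed index once, after all the rows are present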
I'm curious to know how other database engines deal with this problem. Have you tried a similar experiment on MySQL, or PostgreSQL, or even Oracle? How did they perform?
Can any readers suggest ways that I have not thought of for making large numbers of non-localized INSERTs go faster?