On 17/09/2007 1:07 PM, Joe Wilson wrote:
--- "D. Richard Hipp" <[EMAIL PROTECTED]> wrote:
I have been struggling with the performance of insertion in sqlite.

Here we have a very simple case:

A table with an integer autoincrement primary key and a text field that is
unique.

CREATE TABLE my (id INTEGER PRIMARY KEY AUTOINCREMENT, url TEXT);

CREATE UNIQUE INDEX myurl ON my(url);


My application requires inserting up to 10 million records in batches of
20 thousand records.
For each group of 20000 records, first insert them into a TEMP table.
Call the temp table t1.  Then transfer the records to the main table
as follows:

      INSERT OR IGNORE INTO my(url) SELECT url FROM t1 ORDER BY url;
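
A rough sketch of that per-batch flow (the staging-table definition and the
transaction wrapper are assumptions here, not part of the original suggestion):

      CREATE TEMP TABLE t1(url TEXT);     -- one-time setup, assumed definition

      -- per batch of ~20000 records:
      BEGIN;
        INSERT INTO t1(url) VALUES (?);   -- bound and repeated for each incoming record
        INSERT OR IGNORE INTO my(url) SELECT url FROM t1 ORDER BY url;
        DELETE FROM t1;
      COMMIT;

The ORDER BY means each batch hits the myurl index in sorted order, which
should keep the index updates confined to fewer pages per batch.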

I had no performance improvement with that temp store staging table
technique in my testing - actually it was slower.

  http://www.mail-archive.com/sqlite-users@sqlite.org/msg22143.html

Mind you, the table I was testing against had 4 indexes, whereas the above table has 2. I also wasn't using "OR IGNORE", so my results may not carry over.

Just setting PRAGMA cache_size to a huge value and inserting into
the table normally in large batches resulted in better performance in my case. It may already have been mentioned, but a big database page_size also helps minimize disk writes.
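
For reference, a minimal sketch of that setup (the numbers are purely
illustrative, and page_size only takes effect on a fresh database, before any
tables are created, or after a VACUUM):

      PRAGMA page_size = 32768;    -- bigger pages, fewer writes per indexed insert
      PRAGMA cache_size = 100000;  -- measured in pages, so roughly 100000 * page_size bytes of cache

      BEGIN;
        -- the 20000 or so INSERTs for one batch go here
      COMMIT;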

A couple of thoughts:

Off the top of my head, the time to search the index is approximately O(D * log(K)), where D = depth of the tree and K = number of keys per block, and K ** D is O(N). D is the quantity that matters here, since it is the number of block reads per lookup.

So:

(1) Big block means big K and thus small D

(2) Long keys (like URLs!!) mean small K and thus big D
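
To put rough numbers on that: with N = 10 million keys, if long URL keys leave room for only ~10 keys per block then D is about log(10^7)/log(10) = 7 levels, whereas if shorter keys fit ~100 per block then D is about log(10^7)/log(100) = 3.5 -- roughly half the block reads per lookup. (Illustrative figures only; the real fan-out depends on page size and key length.)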

Further on point (2), the OP seems unsure whether his URLs are unique or not. Perhaps storing another column containing a 64-bit hash of the URL, with an index on that column, might be the way to go -- the much shorter key might well outweigh the extra cost of checking for hash-collision duplicates.
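
A minimal sketch of that idea (the column and index names are mine, and the
64-bit hash itself would have to be computed in the application, since SQLite
has no built-in hash function suited to this):

      CREATE TABLE my (id INTEGER PRIMARY KEY AUTOINCREMENT, url TEXT, urlhash INTEGER);
      CREATE INDEX myhash ON my(urlhash);

      -- duplicate check before insert, with :h = the 64-bit hash of :u computed by the application
      SELECT id FROM my WHERE urlhash = :h AND url = :u;
      -- if no row comes back, INSERT INTO my(url, urlhash) VALUES (:u, :h);

The index then carries 8-byte integers instead of full URLs, so many more keys
fit per block, and the url comparison in the WHERE clause takes care of the
occasional hash collision.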
