On 17/09/2007 1:07 PM, Joe Wilson wrote:
--- "D. Richard Hipp" <[EMAIL PROTECTED]> wrote:
I have been struggling with the performance of insertions in SQLite.
Here we have a very simple case:
A table with an integer autoincrement primary key and a text field that
is unique.
CREATE TABLE my (id INTEGER PRIMARY KEY, url);
CREATE UNIQUE INDEX myurl ON my(url);
My application requires inserting up to 10 million records in batches
of 20 thousand records.
For each group of 20000 records, first insert them into a TEMP table.
Call the temp table t1. Then transfer the records to the main table
as follows:
INSERT OR IGNORE INTO my(url) SELECT url FROM t1 ORDER BY url;
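The staging technique above can be sketched in Python with the stdlib sqlite3 module. This is a minimal illustration, not DRH's exact code; the table names match the thread, the URL values and the in-memory database are made up for the example. The point of the ORDER BY is that sorted inserts touch the index's pages sequentially rather than at random:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a file path in practice; :memory: for the sketch
conn.executescript("""
    CREATE TABLE my (id INTEGER PRIMARY KEY, url);
    CREATE UNIQUE INDEX myurl ON my(url);
    CREATE TEMP TABLE t1 (url);
""")

def insert_batch(conn, urls):
    """Stage one batch in the temp table, then transfer in sorted order
    so the unique index is updated in key order."""
    with conn:  # one transaction per batch
        conn.execute("DELETE FROM t1")
        conn.executemany("INSERT INTO t1(url) VALUES (?)", ((u,) for u in urls))
        conn.execute("INSERT OR IGNORE INTO my(url) SELECT url FROM t1 ORDER BY url")

insert_batch(conn, ["http://a.example", "http://b.example", "http://a.example"])
print(conn.execute("SELECT COUNT(*) FROM my").fetchone()[0])  # duplicate ignored -> 2
```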
That temp-table staging technique gave me no performance improvement
in my testing -- it was actually slower:
http://www.mail-archive.com/sqlite-users@sqlite.org/msg22143.html
Mind you, the table I was testing against had 4 indexes, whereas the above
table has 2. I also wasn't using "OR IGNORE". There might be a difference.
Just setting pragma cache_size to a huge value and inserting into
the table normally in large batches resulted in better performance in
my case. It may have already been mentioned, but a large database
page_size value also helps minimize disk writes.
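Those two pragmas can be combined with large batched transactions as follows. This is a sketch under assumed settings: the 8192-byte page size, the 100000-page cache, and the generated URLs are illustrative numbers, not values from the thread. Note that page_size must be set before any table is created in a new database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# page_size takes effect only on a database with no tables yet
conn.execute("PRAGMA page_size = 8192")
# cache_size is measured in pages: 100000 * 8 KiB is roughly 800 MB of cache
conn.execute("PRAGMA cache_size = 100000")
conn.executescript("""
    CREATE TABLE my (id INTEGER PRIMARY KEY, url);
    CREATE UNIQUE INDEX myurl ON my(url);
""")
# one transaction per 20000-row batch, as in the OP's workload
with conn:
    conn.executemany("INSERT OR IGNORE INTO my(url) VALUES (?)",
                     ((f"http://example.com/{i}",) for i in range(20000)))
print(conn.execute("SELECT COUNT(*) FROM my").fetchone()[0])  # 20000
```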
A couple of thoughts:
OTTOMH, time to search the index is approximately O(D * log(K)), where
D = depth of the tree and K = number of keys per block, and K ** D is O(N).
So:
(1) Big block means big K and thus small D
(2) Long keys (like URLs!!) means small K and thus big D
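The effect of points (1) and (2) can be illustrated with some back-of-the-envelope arithmetic. The per-key overhead, the 8-byte integer key, and the 100-byte URL key below are invented round numbers for illustration, not SQLite's actual on-disk format:

```python
import math

def btree_depth(n_keys, page_size, key_bytes, overhead=16):
    """Rough estimate: K keys fit per page, and depth D satisfies K ** D >= N."""
    k = max(2, page_size // (key_bytes + overhead))  # keys per block
    return math.ceil(math.log(n_keys) / math.log(k))

n = 10_000_000
# Short keys on big pages: big K, small D
print(btree_depth(n, page_size=8192, key_bytes=8))    # -> 3
# Long URL keys on small pages: small K, big D
print(btree_depth(n, page_size=1024, key_bytes=100))  # -> 8
```

With the long keys, every search (and every uniqueness check on insert) walks nearly three times as many levels, which is why shrinking the key matters.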
Further to point (2), the OP seems unsure about whether his URLs are
unique or not. Perhaps storing another column containing a 64-bit hash,
with an index on that column, might be the way to go -- the shorter key
might well outweigh the extra cost of checking for duplicates.
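One possible shape of that hash-column idea, sketched in Python: the choice of SHA-1 truncated to 8 bytes is an assumption (any well-distributed 64-bit hash would do), and the collision check via a comparison on the URL itself is the "extra cost" mentioned above:

```python
import hashlib
import sqlite3

def url_hash64(url: str) -> int:
    """64-bit hash of the URL: first 8 bytes of SHA-1, shifted into
    SQLite's signed 64-bit INTEGER range."""
    h = int.from_bytes(hashlib.sha1(url.encode()).digest()[:8], "big")
    return h - 2**63

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my (id INTEGER PRIMARY KEY, url TEXT, urlhash INTEGER);
    CREATE INDEX myhash ON my(urlhash);  -- short integer keys, shallow index
""")

def insert_unique(conn, url):
    h = url_hash64(url)
    # 64-bit collisions are possible, so confirm against the URL itself
    dup = conn.execute(
        "SELECT 1 FROM my WHERE urlhash = ? AND url = ?", (h, url)).fetchone()
    if dup is None:
        conn.execute("INSERT INTO my(url, urlhash) VALUES (?, ?)", (url, h))

for u in ["http://a.example", "http://b.example", "http://a.example"]:
    insert_unique(conn, u)
print(conn.execute("SELECT COUNT(*) FROM my").fetchone()[0])  # 2
```

The unique index on the long url column is replaced by a plain index on the 8-byte hash, so the tree stays shallow even at 10 million rows.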