On Fri, May 23, 2008 at 6:37 PM, Nemanja Čorlija <[EMAIL PROTECTED]> wrote:
> On Fri, May 23, 2008 at 4:20 PM, Stefan Arentz <[EMAIL PROTECTED]> wrote:
>> I have an interesting problem. I need to generate a large table
>> periodically. The table contains a unique SHA1 hash code and 6 integer
>> values and has about 6 million rows. Generating this table is fast. I
>> can set it up in less than 90 seconds on a slow iMac (2.16 Ghz and
>> slow disk). The thing that takes a (relatively) long time is the index
>> creation on the unique hash code .. 720 seconds.
>>
>> (I'm importing within transactions with chunks of 25.000 records)
>>
>> The question is, is there any way to speed up the indexing proces?
>>
>> Couple of things on my mind:
>>
>> Doing this all on an in-memory database takes about 150 seconds in
>> total. Is it possible to build a database in memory and then dump it
>> to disk?
>>
>> Would it make sense to sort the records that i import? Could that
>> result in a quicker index operation? This certainly helps for Berkeley
>> DB. But it is lower level and you can use it's internal hashing
>> functions on your data to pre-sort it. Maybe SQLite has something
>> similar?
>>
>
> It does make sense to presort records before inserting into on-disk
> db. You should insert into memory db first and then insert sorted
> records into disk db from there. You can batch this into chunks of 25K
> rows as you're doing now. But if you have enough memory, it will
> probably be better to increase number of rows.
> You should also play with increasing cache_size and page_size values.
> Pseudo code:
>
> OPEN :MEMORY:
> ATTACH DISK_DB;
>
> DO
>  BEGIN;
>  INSERT INTO MEMORY_TABLE VALUES(...); x 25.000
>  INSERT INTO DISK_TABLE SELECT * FROM MEMORY_TABLE ORDER BY sha1_hash;
>  COMMIT;
> WHILE has_more_rows

This is a great tip. I implemented the above and it works very well.
The load speed on Ubuntu Hardy went down from an hour+ to 6 minutes.

I'm now experimenting with the pragmas to see if this can go even more faster.

 S.
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to