On Fri, May 23, 2008 at 4:20 PM, Stefan Arentz <[EMAIL PROTECTED]> wrote:
> I have an interesting problem. I need to generate a large table
> periodically. The table contains a unique SHA1 hash code and 6 integer
> values and has about 6 million rows. Generating this table is fast. I
> can set it up in less than 90 seconds on a slow iMac (2.16 Ghz and
> slow disk). The thing that takes a (relatively) long time is the index
> creation on the unique hash code .. 720 seconds.
>
> (I'm importing within transactions with chunks of 25.000 records)
>
> The question is, is there any way to speed up the indexing proces?
>
> Couple of things on my mind:
>
> Doing this all on an in-memory database takes about 150 seconds in
> total. Is it possible to build a database in memory and then dump it
> to disk?
>
> Would it make sense to sort the records that i import? Could that
> result in a quicker index operation? This certainly helps for Berkeley
> DB. But it is lower level and you can use it's internal hashing
> functions on your data to pre-sort it. Maybe SQLite has something
> similar?
>

It does make sense to presort records before inserting into on-disk
db. You should insert into memory db first and then insert sorted
records into disk db from there. You can batch this into chunks of 25K
rows as you're doing now. But if you have enough memory, it will
probably be better to increase number of rows.
You should also play with increasing cache_size and page_size values.
Pseudo code:

OPEN :MEMORY:
ATTACH DISK_DB;

DO
  BEGIN;
  INSERT INTO MEMORY_TABLE VALUES(...); x 25.000
  INSERT INTO DISK_TABLE SELECT * FROM MEMORY_TABLE ORDER BY sha1_hash;
  COMMIT;
WHILE has_more_rows

Search this list's archives for "locality of reference" for more
helpful tips on the subject of building large indexes in SQLite.


> Are there any other tunable options that can improve indexing speed?
>
>  S.
> _______________________________________________
> sqlite-users mailing list
> [email protected]
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
Nemanja Čorlija <[EMAIL PROTECTED]>
_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to