On Fri, May 23, 2008 at 6:37 PM, Nemanja Čorlija <[EMAIL PROTECTED]> wrote: > On Fri, May 23, 2008 at 4:20 PM, Stefan Arentz <[EMAIL PROTECTED]> wrote: >> I have an interesting problem. I need to generate a large table >> periodically. The table contains a unique SHA1 hash code and 6 integer >> values and has about 6 million rows. Generating this table is fast. I >> can set it up in less than 90 seconds on a slow iMac (2.16 Ghz and >> slow disk). The thing that takes a (relatively) long time is the index >> creation on the unique hash code .. 720 seconds. >> >> (I'm importing within transactions with chunks of 25.000 records) >> >> The question is, is there any way to speed up the indexing proces? >> >> Couple of things on my mind: >> >> Doing this all on an in-memory database takes about 150 seconds in >> total. Is it possible to build a database in memory and then dump it >> to disk? >> >> Would it make sense to sort the records that i import? Could that >> result in a quicker index operation? This certainly helps for Berkeley >> DB. But it is lower level and you can use it's internal hashing >> functions on your data to pre-sort it. Maybe SQLite has something >> similar? >> > > It does make sense to presort records before inserting into on-disk > db. You should insert into memory db first and then insert sorted > records into disk db from there. You can batch this into chunks of 25K > rows as you're doing now. But if you have enough memory, it will > probably be better to increase number of rows. > You should also play with increasing cache_size and page_size values. > Pseudo code: > > OPEN :MEMORY: > ATTACH DISK_DB; > > DO > BEGIN; > INSERT INTO MEMORY_TABLE VALUES(...); x 25.000 > INSERT INTO DISK_TABLE SELECT * FROM MEMORY_TABLE ORDER BY sha1_hash; > COMMIT; > WHILE has_more_rows
This is a great tip. I implemented the above and it works very well. The load speed on Ubuntu Hardy went down from an hour+ to 6 minutes. I'm now experimenting with the pragmas to see if this can go even more faster. S. _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users