On Wed, 08 Apr 2009 23:17:02 +0200, Florian Nigsch
<f...@nigsch.eu> wrote:

>Hi all,
>
>I have been trying to implement a couple of things in SQLite because I  
>only need it for myself (so no concurrency issues here).
>
>I am working on Arch Linux (uname: 2.6.28-ARCH #1 SMP PREEMPT Sun Mar  
>8 10:18:28 UTC 2009 i686 Intel(R) Xeon(TM) CPU 3.40GHz). I have a flat  
>text file that I want to put into SQLite to be able to query it and  
>also update fields if required. This raw text file is 7.2 GB, with
>roughly 11 million entries and 60 fields. (Perhaps not the best
>design, but that shall not be the issue here.)

We can't ignore that. If you have 11 million entries and 60
fields, I bet there is a lot of redundant data in the
database, which makes it big.
Normalization can make a lot of difference here.
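
As an illustration (untested, and with table/column names invented,
since you didn't post your schema), a text field whose value repeats
across many rows can be moved into a lookup table, so the big table
stores a small integer instead of the text:

  import sqlite3

  con = sqlite3.connect("data.db")
  con.executescript("""
      -- lookup table for a value that repeats across many rows
      CREATE TABLE IF NOT EXISTS species (
          id   INTEGER PRIMARY KEY,
          name TEXT UNIQUE NOT NULL
      );
      -- the wide table references it by integer id
      CREATE TABLE IF NOT EXISTS records (
          id         INTEGER PRIMARY KEY,
          species_id INTEGER REFERENCES species(id),
          other_data TEXT
      );
  """)
  con.commit()

With 11 million rows, replacing one repeated ~20-byte string with a
small integer saves on the order of 200 MB for that column alone.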

>Getting it into SQLite was not as easy as I anticipated. At first, I  
>used the execute() method for every insert, and committed every 10000  
>rows. This run didn't finish overnight.
>
>Then I tried a commit every 100,000 rows: at around 2.5 million
>entries it slowed down so much that it was unacceptable. I used
>PRAGMA journal_mode = OFF, which improved things a bit, but not
>enough.
>
>The final rescue was to use executemany() for every 100,000 records  
>combined with a commit(). That put the whole file in the DB in approx  
>17 min. Creation of indices where required took another 40 minutes.  
>The final database file is roughly 11 GB.
>
>Now, I have 3 spare fields in the table that I want to populate with  
>data from another file. I loop over that file; from each line I get
>the ID of the record to update and the value to put into the empty
>field. Again, I used executemany() every 100,000 records, assuming
>that it would work. But no.
>
>Even when I use executemany() every 10,000 rows, these 10,000 updates  
>take around 5 minutes or so (I haven't properly timed it). That is  
>also with PRAGMA journal_mode = OFF.

I don't recognize executemany() or commit(). They are not
part of the sqlite3_* API. Can I assume they are your own
wrapper functions? 

You did say that you commit() your transactions, but do you also
BEGIN them? (I apologize for asking the obvious, but someone has
to ask it anyway.)
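
For what it's worth, executemany() and commit() look like Python's
sqlite3 (DB-API) methods. If that is what you're using, a batched
insert with explicit transaction boundaries would look roughly like
this (an untested sketch; file name, table and column count are
invented):

  import sqlite3

  con = sqlite3.connect("data.db")
  con.isolation_level = None   # autocommit; we manage transactions
  cur = con.cursor()

  batch = []
  cur.execute("BEGIN")
  with open("rawdata.txt") as f:
      for line in f:
          batch.append(line.rstrip("\n").split("\t"))
          if len(batch) == 100000:
              cur.executemany(
                  "INSERT INTO records VALUES (?, ?, ?)", batch)
              batch = []
              cur.execute("COMMIT")
              cur.execute("BEGIN")
  if batch:
      cur.executemany("INSERT INTO records VALUES (?, ?, ?)", batch)
  cur.execute("COMMIT")

Note that Python's sqlite3 opens an implicit transaction before
INSERTs by default, so your commits probably were not running in
autocommit mode; still, making BEGIN/COMMIT explicit tells you
exactly where the transaction boundaries are.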

>When I just get the IDs of the entries that I need to update, look
>them up in the table (select a,b,c from table where id in (...)),
>retrieve them with fetchall(), and do nothing with them (no printing,
>etc.), this works fine for the first 10,000 records. After that,
>again, it gets so slow that it's basically unusable. The ID field
>that I use for the lookups has an index on it. The fields that I am  
>updating do not have indices on them. It's just text fields that are  
>being populated with strings up to approx 150 characters.
>
>Why are the retrievals so slow, and why do they get slower the
>further I go? And why are the updates so slow?
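
On the updates: the same batching rules apply. One transaction per
chunk, updating through the indexed ID column, would look roughly
like this (again an untested sketch with invented names):

  import sqlite3

  con = sqlite3.connect("data.db")
  cur = con.cursor()

  # pairs of (new_value, record_id); executemany binds them in order
  updates = [("some text", 42), ("other text", 43)]
  cur.executemany(
      "UPDATE records SET other_data = ? WHERE id = ?", updates)
  con.commit()

Also check two things: that the ID column you filter on is declared
INTEGER PRIMARY KEY (then the lookup goes straight through the rowid,
the fastest path SQLite has), and that the values you bind have the
same type as the column, because a type mismatch can keep SQLite from
using the index.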
>
>Is it because the database file is too big?
> 
>Any help or suggestions would be highly appreciated!
>
>Regards,
>
>Flo

-- 
  (  Kees Nuyt
  )
c[_]