Hi all,

I have been trying to implement a couple of things in SQLite because I  
only need it for myself (so no concurrency issues here).

I am working on Arch Linux (uname: 2.6.28-ARCH #1 SMP PREEMPT Sun Mar  
8 10:18:28 UTC 2009 i686 Intel(R) Xeon(TM) CPU 3.40GHz). I have a flat  
text file that I want to put into SQLite to be able to query it and  
also update fields if required. This raw text file is 7.2GB, with  
roughly 11 million entries of 60 fields each. (Perhaps not the best  
design, but that shall not be the issue here.)

Getting it into SQLite was not as easy as I anticipated. At first, I  
used the execute() method for every insert, and committed every 10000  
rows. This run didn't finish overnight.

Then I tried a commit every 100,000 rows: at around 2.5 million  
entries it slowed down so much that it was unacceptable. I used PRAGMA  
journal_mode = OFF, which improved things a bit, but not enough.

The final rescue was to use executemany() for every 100,000 records,  
combined with a commit(). That put the whole file into the DB in  
approx. 17 minutes. Creating the required indices took another 40  
minutes. The final database file is roughly 11GB.
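In case it helps to see the pattern concretely, here is a minimal sketch of that batched-insert approach. The table name ("records") and the three columns (instead of 60) are hypothetical stand-ins, and the generator fakes the file parsing:

```python
import sqlite3

# Sketch of the batched-insert pattern: executemany() + commit() every
# 100,000 rows, with the rollback journal disabled. Names are hypothetical.
conn = sqlite3.connect(":memory:")          # the real run uses a file on disk
conn.execute("PRAGMA journal_mode = OFF")   # faster, but unsafe on crashes
conn.execute("CREATE TABLE records (id INTEGER, a TEXT, b TEXT)")

def parsed_lines():
    # stand-in for reading and splitting lines of the flat text file
    for i in range(250000):
        yield (i, "field_a_%d" % i, "field_b_%d" % i)

BATCH = 100000
batch = []
for row in parsed_lines():
    batch.append(row)
    if len(batch) == BATCH:
        conn.executemany("INSERT INTO records VALUES (?, ?, ?)", batch)
        conn.commit()
        batch = []
if batch:                                   # flush the final partial batch
    conn.executemany("INSERT INTO records VALUES (?, ?, ?)", batch)
    conn.commit()
```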

Now, I have 3 spare fields in the table that I want to populate with  
data from another file. I loop over the file in question, from there I  
get the ID of the record to update in the table and the info to  
populate the empty field with. Again, I used executemany() every  
100,000 records, assuming that it would work. But no.

Even when I use executemany() every 10,000 rows, these 10,000 updates  
take around 5 minutes or so (I haven't properly timed it). That is  
also with PRAGMA journal_mode = OFF.
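For reference, this is the kind of batched UPDATE I mean, as a small self-contained sketch (table and column names like "records" and "spare1" are again hypothetical, and the update pairs are generated instead of read from the second file):

```python
import sqlite3

# Sketch of the batched UPDATE pattern: executemany() + commit() every
# 10,000 (new_value, id) pairs. All names here are hypothetical.
conn = sqlite3.connect(":memory:")          # the real run uses a file on disk
conn.execute("PRAGMA journal_mode = OFF")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, spare1 TEXT)")
conn.executemany("INSERT INTO records (id) VALUES (?)",
                 [(i,) for i in range(20000)])
conn.commit()

# (new_value, id) pairs as they would come from the second file
updates = [("value_%d" % i, i) for i in range(20000)]

BATCH = 10000
for start in range(0, len(updates), BATCH):
    conn.executemany("UPDATE records SET spare1 = ? WHERE id = ?",
                     updates[start:start + BATCH])
    conn.commit()
```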

When I just get the IDs of the entries that I need to update, look  
them up in the table (select a,b,c from table where id in (...)), and  
retrieve them with fetchall() and do nothing with them (no printing,  
etc.), then this works fine for the first 10,000 records. After that,  
again, it gets so slow that it's basically unusable. The ID field  
that I use for the lookups has an index on it. The fields that I am  
updating do not have indices on them; they are just text fields being  
populated with strings of up to approx. 150 characters.
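The lookup I am doing looks roughly like this (a sketch with hypothetical names, not my actual code). One thing worth knowing: SQLite caps the number of ? placeholders per statement (SQLITE_MAX_VARIABLE_NUMBER, which defaults to 999 in older builds), so a batch of IDs has to stay below that limit or be split:

```python
import sqlite3

# Sketch of the "select ... where id in (...)" lookup with fetchall().
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER, a TEXT, b TEXT, c TEXT)")
conn.executemany("INSERT INTO records VALUES (?, ?, ?, ?)",
                 [(i, "a", "b", "c") for i in range(5000)])
conn.execute("CREATE INDEX idx_records_id ON records(id)")  # index on the ID field
conn.commit()

ids = list(range(0, 1000, 2))               # IDs taken from the other file
placeholders = ",".join("?" * len(ids))     # one ? per ID, kept below the limit
rows = conn.execute(
    "SELECT a, b, c FROM records WHERE id IN (%s)" % placeholders,
    ids).fetchall()
```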

Why are the retrievals so slow, and why do they get slower the  
further I go? And why are the updates so slow?

Is it because the database file is too big?

Any help or suggestions would be highly appreciated!

Regards,

Flo


_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
