Sorry it took me some time to get back to this thread.

----- Original Message -----
From: Christian Smith <[EMAIL PROTECTED]>
> When your database does not fit in memory, yes, you're right, the OS may
> well get caching wrong, and in the worst way possible. Two things though:
> - SQLite does have known scaling issues when using multi-gigabyte
>   databases.
> - Memory is cheap. If it doesn't fit, spend that few hundred dollars a
>   few days of your time is worth and buy another few gigs of RAM.

The machine has 2 GB of RAM, and the table that seems to cause the
problems is less than 700 megs (proven by the fact that if I precache this
database, things get zippy). To me the problem seems related to the way
the reads are done, but I could be wrong: the caches never really seem to
get "hot" (and with nothing else running on the machine, the OS is pretty
much caching all reads done by sqlite).

> > Right now, sqlite shows performance that is on par with a simple
> > filesystem structure (except it's much faster to back up, because
> > traversing a multimillion-file structure takes several hours). I was
> > expecting a few things by moving to sqlite:
> > * getting better ramp-up (cold cache) performance than a dumb
> >   filesystem structure.
>
> Nothing will speed up the physical disks.

I agree that disk I/O is the bottleneck, but what can be worked around are
the latencies needed to seek in the file, and the way operations are sent
out to the disk, to help the OS cache more useful information and/or have
a better caching mechanism.

> > * having a [much] higher throughput (so that I can combine multiple
> >   servers into one), as the records are small and there is no
> >   fragmentation of the folders here.
> > * having consistent latencies (filesystems tend to vary a lot).
> >
> > > - Have you tested other databases? What sort of performance did you
> > >   get from those?
> >
> > I am in the process of setting up MySQL with our app, I will keep you
> > posted on the result.
>
> Prepare to be disappointed, IMO.
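For what it's worth, sqlite's own page cache can be enlarged so that hot pages of that ~700 MB table stay in-process instead of relying only on the OS cache. A minimal sketch using Python's sqlite3 module; the in-memory file name and cache size here are purely illustrative:

```python
import sqlite3

# Sketch only: enlarging sqlite's own page cache so more of the table
# stays cached in-process. Replace ":memory:" with the real db file.
con = sqlite3.connect(":memory:")

# cache_size is measured in pages; with a 1 KB page size, 200000 pages
# is roughly 200 MB of in-process cache.
con.execute("PRAGMA cache_size = 200000")
print(con.execute("PRAGMA cache_size").fetchone()[0])
```

Whether this helps depends on whether the slow reads come from sqlite evicting its own pages or from the OS evicting the file's pages, which is worth measuring separately.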
> The most important thing when dealing with big fast databases is disk
> IO, and anything (MySQL, PGSQL, Oracle, SQLite) will find disks a
> bottleneck.

The reason it took me a while to reply to this thread is that I set up a
MySQL server (5.0.19, run on the same machine) and adapted my application
to run with it. The results on the exact same operations that I did
before:

* Importing the data is faster with MySQL, and performance doesn't
  degrade as the number of rows grows (insertions at the beginning and at
  the end of the import are of similar speed). sqlite and MySQL started
  at around the same speed, but after a few million inserts, sqlite
  becomes slow.
* Caches cleared, on my typical test run (a read & write combination),
  MySQL ran 2 minutes faster than sqlite (6 minutes vs 8 minutes), with
  nice latencies after about 3 minutes (and that's where the difference
  is, I think: after about 3 minutes, MySQL manages to have the most
  critical data cached in RAM).
* With precached dbs (file sizes are very similar for sqlite and MySQL,
  except that MySQL separates the data and index into 2 files), MySQL is
  faster too (1m30 vs 2 minutes).
* Latencies stay low (as in, it rarely blocks for > 4 seconds; sqlite
  would block for > 10 seconds, especially on the early queries when
  there is no cache).

Maybe the reason MySQL performs better is simply that it uses 2 files
instead of 1. It does make sense: if the indexes/offsets are all together
on disk, the OS can cache that information very efficiently for us. Was
splitting the file in 2 ever tried for sqlite? I mean, there is already
the -journal file, so why not have a -index file as well?
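The "beginning vs end of import" comparison above can be reproduced with a small harness. This is only a sketch with a made-up schema, key format, and row counts, using Python's sqlite3 module:

```python
import sqlite3
import time

# Sketch: time equal-sized chunks of inserts at the start and at the end
# of an import, to see whether insert speed degrades as the table grows.
con = sqlite3.connect(":memory:", isolation_level=None)  # manual txns
con.execute("CREATE TABLE users (name TEXT PRIMARY KEY, uid INTEGER)")

def timed_chunk(start, n):
    # Insert n rows inside one explicit transaction and return wall time.
    t0 = time.monotonic()
    con.execute("BEGIN")
    for i in range(start, start + n):
        con.execute("INSERT INTO users VALUES (?, ?)", ("user%09d" % i, i))
    con.execute("COMMIT")
    return time.monotonic() - t0

first = timed_chunk(0, 10000)
for s in range(10000, 90000, 10000):
    timed_chunk(s, 10000)
last = timed_chunk(90000, 10000)
print("first chunk: %.3fs, last chunk: %.3fs" % (first, last))
```

Run against an on-disk file with realistic row counts and key distribution, the first/last ratio should show whether the slowdown is reproducible outside the full application.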
> > To accommodate a lot of users, I have one main DB that holds all the
> > users -> unique ID mappings. The other dbs are really a partitioned
> > db, so that later threads conflict only when accessing users within
> > the same range, and the dbs don't get out of proportion (because
> > blobs are used, I thought a lot of records could become a performance
> > bottleneck). If I precache the big name -> uid table, the import of
> > 650,000 records takes about 2 minutes. If I don't precache it, it
> > takes about 8 minutes. I start/stop transactions every X seconds
> > (right now, I have it set at 10 seconds).
>
> Are these name->uid lookups used in a read-only fashion after import?
> If so, then multiple threads can indeed read the table in parallel, and
> you should have good performance.

A lot of the time this table is accessed read-only, yes. I tried having
several threads access the table at the same time before, and performance
was a bit lower (maybe because transactions occur at the same time and
block the readers?). But also, and this is why I rolled back that change,
the busy handler, which does a simple "usleep", causes starvation and
thus potentially high latencies.

> For now, I'd say:
> - If the import/update case is predominant, look at another DB. I'd
>   suggest PostgreSQL, as its MVCC lockless row protocol allows
>   concurrent updates to disjoint sets of rows in the same table.
> - If read-only access is the predominant use, SQLite may well fit the
>   bill, and take the performance hit when importing data.

What has been confusing me since the beginning is that sqlite performs
well for databases that are rarely updated. I think the performance page
should underline this fact. Also, I think the performance page should be
updated to reflect the fact that sqlite might not be the solution for
bigger databases (and maybe add a section of tests that last more than 5
seconds :)).
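On the usleep busy handler: SQLite's built-in busy timeout retries with a progressive backoff rather than sleeping a fixed interval, which should soften the starvation described above. A sketch with Python's sqlite3 module (the 5-second budget is illustrative, and the busy_timeout pragma is a feature of modern SQLite versions):

```python
import sqlite3

# Sketch: instead of a hand-rolled usleep() busy handler, let SQLite's
# built-in busy handler retry with progressive backoff. Replace
# ":memory:" with the real db file; 5 seconds is an arbitrary budget.
con = sqlite3.connect(":memory:", timeout=5.0)  # seconds, set at connect

# The same setting via pragma, in milliseconds:
con.execute("PRAGMA busy_timeout = 5000")
print(con.execute("PRAGMA busy_timeout").fetchone()[0])
```

This bounds how long any one thread spins on a locked database, but it does not guarantee fairness between threads; writer-heavy workloads can still queue readers behind long transactions.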
I'll see if I can take the test scripts from the performance page and
change the parameters to run a 10-minute test or so (with clearing of
caches, and probably reformatting of the partition used for the data,
between tests, so that all engines are tested equally).

I am a bit disappointed that I can't use sqlite at this point: I really
liked the fact that there is no setup for the end user, and I thought
that because I was not doing complex SQL queries, I could get decent
performance (like I said earlier, low latency is really what I need
here). Of course, if anyone comes up with an idea, I am more than willing
to try it and help improve this great library.

Nicolas
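The "equal footing" part of that benchmark plan could be sketched like this; it is Linux-specific, writing to drop_caches requires root, and the commands being timed are placeholders for whatever per-engine test scripts end up being used:

```python
import subprocess
import time

# Sketch: optionally drop the OS page cache before timing each engine's
# run, so every engine starts cold. Linux-only; needs root to write to
# /proc/sys/vm/drop_caches.
def drop_caches():
    subprocess.run(["sync"], check=True)
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")

def run_test(cmd, cold=True):
    # Time one benchmark command, optionally starting from cold caches.
    if cold:
        drop_caches()
    t0 = time.monotonic()
    subprocess.run(cmd, check=True)
    return time.monotonic() - t0
```

Usage would be something like `run_test(["./bench_sqlite"])` followed by `run_test(["./bench_mysql"])`, comparing the returned wall-clock times; reformatting the data partition between runs, as suggested above, would additionally remove fragmentation effects.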