"Hickey, Larry" <[EMAIL PROTECTED]> wrote: > I have a system requirement that feeds thousands of keyed records, each of > which contains a blob of around a meg or so into an sqlite database. Ok so > far. But the records come from multiple > processes, and I can't afford to do have commits after each insertion for > obvious performance reasons (if you have ever tried that you understand why) > .. > So suppose we batch the commits, and via an external > method not discussed here, and we can serialize the order of the commits > from the various processes so they don't over-lap. A "inserting" process > will block until it has "commit "rights". > Are we still ok- i.e. leave the database in a sound condition? > (commits can be batched a few hundred at a time maybe more) Contention for > "commit" rights is unfortunate, but at the end of the day, if I can get 20 > or so records inserted per second by > this rube Goldberg, I'm ok with that, cause that is certainly the slow part, > not the ability of the multiple feeders to supply records to insert. > > Here is a bad extra ingredient that may be a deal breaker. The processes are > on machines over a network, and have opened the same database on the same > target machine where the database > lives.. Network based windows file systems and locking are a recipe for > sqlite corruption is my understanding.( Assume the serialization trick I > mentioned earlier was made to work even > if the processes invoking the trick were on the network.) > So even if we were ok without this network issue, are we now into a > unworkable design? > The common table into which insertions will be done does not have indexes on > it of course. We do that later after all processes are finished. > In summary: > Alternative I; multiple sqlite processes on different machines accessing > the same database, but commits serialized (by a method not covered here). > assume disaster without this serialized commit trick. > > Alternative II: > I could design a system service that runs a single instance of sqlite, and > make the users on other machines add records via the service on the target > machine. > , like designing my own little client server subsystem. I > have done similar things before but its a lot of complexity and if I don't > need it, I don't want to do it. > > Alternative III Maybe I should just use another sql product (mysql or > sqlserver ...) and odbc to do this- I don't like that complexity either. > > Alternative IV:If there was a lightening fast sqlite bulk loader, I could > let each process make its own database first, and then dump(merge) them all > together as a final step. > That's an alternative design but without a fast back door bulk loader, this > alternative does not seem too good. (all that extra IO) ugh. > > Pure insertion speed: ( pair of 4.8ms disks in a raid0 disk array) > The final size of the database is almost out of the range of sqlite, being > between 100 and 120 Gbytes. I have tested sqlite on databases of this size > for insertion speed, and it was not as bad > as I expected from the caveats in the Owens sqlite book. Average of 12 > inserts per second over the load of the entire 120G file file.16/second for > 100 G a 100 G database, and for reasonable sizes > like 1 gig, you can do 100 inserts/second. (all 1 transaction of course). 
> If I can get this kind of speed with multiple load sources located on > different machines, I would have what I need > > Pure retrieval speed: at around 120 Gbytes, 125,000 records with blobs > between about 1 meg each , I could get over 600 retrievals per second so > while insertions hurt, retrievals are fine. >
Well, you are certainly pushing the limits of what SQLite was designed to do. I'm glad it works for you so far, but I wouldn't count on it scaling up too much further than what you have done already. I do not know whether you will get better performance from a client/server database like MySQL or PostgreSQL, but I would certainly hope so, since those systems (unlike SQLite) are *designed* to handle terabyte-sized databases over a network. So if they don't perform better than SQLite in those situations, that's kind of a black eye for the designers, huh?

Alternative II seems the best approach to me if you really want to stick with SQLite, though Alternative IV also shows promise. In Alternative IV, let each machine build its own database on a local disk drive, then copy the local databases to the machine with the master database. Merge the local databases one by one by opening a connection to the master database and running the following commands:

    ATTACH 'local1.db' AS local;
    INSERT INTO bigtable SELECT * FROM local.bigtable;
    DETACH local;

INSERT INTO ... SELECT is about as efficient as it gets for moving large quantities of data around in SQLite.

BTW, you'll likely get much better performance out of SQLite with 1 MB BLOBs if you use a really big page size. Use 8192 at least, and consider using 32768 (the maximum page size). Certainly the 100 GB practical size limit for SQLite is relaxed somewhat with larger pages. The practical limit is related to the number of pages in the database file, so the practical upper bound on the size of an SQLite database doubles every time you double the page size. The "100 GB" practical limit assumes a 1K page, so if you go with a 32K page, SQLite will likely work fine into the terabyte range. But this is just a guess - please experiment to see what works best for you.

Another thing to consider is storing the BLOBs in separate files and storing just the filename for each BLOB in the SQLite database. Filesystems vary, but many filesystem implementations work better with thousands of 1 MB files than with one big 1 TB file. Many of the client/server database engines split large tables across multiple files, presumably for this reason.

You are taking SQLite into areas where it has not been before, to my knowledge. I am very eager to hear about the results you achieve. Please keep me informed.

--
D. Richard Hipp <[EMAIL PROTECTED]>
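
For concreteness, the batched-commit pattern Larry describes is ordinary SQLite transaction batching: hold the serialized "commit rights", run a few hundred INSERTs inside one transaction, then COMMIT. A minimal sketch, assuming a hypothetical unindexed table named bigtable (the names are illustrative, not from the thread, and zeroblob() merely stands in for a real ~1 MB payload):

    -- one batch, executed while this process holds "commit rights";
    -- grouping a few hundred INSERTs per transaction amortizes the
    -- cost of the COMMIT's disk sync across the whole batch
    CREATE TABLE IF NOT EXISTS bigtable(key TEXT, payload BLOB);

    BEGIN;
    INSERT INTO bigtable(key, payload) VALUES('rec-0001', zeroblob(1048576));
    INSERT INTO bigtable(key, payload) VALUES('rec-0002', zeroblob(1048576));
    -- ... a few hundred more inserts ...
    COMMIT;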
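
One wrinkle in the page-size suggestion above: PRAGMA page_size only takes effect before the database file has been created (or when the file is rebuilt with VACUUM), so it has to be issued before the first CREATE TABLE. A sketch, using the same illustrative schema:

    PRAGMA page_size = 32768;    -- must run before any table is created
    CREATE TABLE bigtable(key TEXT, payload BLOB);
    PRAGMA page_size;            -- read it back to verify: 32768

    -- for an already-populated database, set the pragma and rebuild:
    -- PRAGMA page_size = 32768;
    -- VACUUM;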
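
And the store-the-filename variant is just a schema change: each row keeps a path instead of the payload, and the application reads the blob from the filesystem itself. A sketch (the path layout is made up for illustration):

    -- the ~1 MB payload lives in an ordinary file;
    -- SQLite holds only its location
    CREATE TABLE bigtable(key TEXT, blob_path TEXT);
    INSERT INTO bigtable VALUES('rec-0001', 'blobs/rec-0001.bin');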