"Hickey, Larry" <[EMAIL PROTECTED]> wrote: > I have a system requirement that feeds thousands of keyed records, each of > which contains a blob of around a meg or so into an sqlite database. Ok so > far. But the records come from multiple > processes, and I can't afford to do have commits after each insertion for > obvious performance reasons (if you have ever tried that you understand why) > .. > So suppose we batch the commits, and via an external > method not discussed here, and we can serialize the order of the commits > from the various processes so they don't over-lap. A "inserting" process > will block until it has "commit "rights". > Are we still ok- i.e. leave the database in a sound condition? > (commits can be batched a few hundred at a time maybe more) Contention for > "commit" rights is unfortunate, but at the end of the day, if I can get 20 > or so records inserted per second by > this rube Goldberg, I'm ok with that, cause that is certainly the slow part, > not the ability of the multiple feeders to supply records to insert. > > Here is a bad extra ingredient that may be a deal breaker. The processes are > on machines over a network, and have opened the same database on the same > target machine where the database > lives.. Network based windows file systems and locking are a recipe for > sqlite corruption is my understanding.( Assume the serialization trick I > mentioned earlier was made to work even > if the processes invoking the trick were on the network.) > So even if we were ok without this network issue, are we now into a > unworkable design? > The common table into which insertions will be done does not have indexes on > it of course. We do that later after all processes are finished. > In summary: > Alternative I; multiple sqlite processes on different machines accessing > the same database, but commits serialized (by a method not covered here). > assume disaster without this serialized commit trick. > > Alternative II: > I could design a system service that runs a single instance of sqlite, and > make the users on other machines add records via the service on the target > machine. > , like designing my own little client server subsystem. I > have done similar things before but its a lot of complexity and if I don't > need it, I don't want to do it. > > Alternative III Maybe I should just use another sql product (mysql or > sqlserver ...) and odbc to do this- I don't like that complexity either. > > Alternative IV:If there was a lightening fast sqlite bulk loader, I could > let each process make its own database first, and then dump(merge) them all > together as a final step. > That's an alternative design but without a fast back door bulk loader, this > alternative does not seem too good. (all that extra IO) ugh. > > Pure insertion speed: ( pair of 4.8ms disks in a raid0 disk array) > The final size of the database is almost out of the range of sqlite, being > between 100 and 120 Gbytes. I have tested sqlite on databases of this size > for insertion speed, and it was not as bad > as I expected from the caveats in the Owens sqlite book. Average of 12 > inserts per second over the load of the entire 120G file file.16/second for > 100 G a 100 G database, and for reasonable sizes > like 1 gig, you can do 100 inserts/second. (all 1 transaction of course). 
> If I can get this kind of speed with multiple load sources located on > different machines, I would have what I need > > Pure retrieval speed: at around 120 Gbytes, 125,000 records with blobs > between about 1 meg each , I could get over 600 retrievals per second so > while insertions hurt, retrievals are fine. >
Well, you are certainly pushing the limits of what SQLite was designed to do. I'm glad it works for you so far, but I wouldn't count on it scaling up too much further than what you have done already. I do not know whether you will get better performance from a client/server database like MySQL or PostgreSQL, but I would certainly hope so, since those systems (unlike SQLite) are *designed* to handle terabyte-sized databases over a network. So if they don't perform better than SQLite in those situations, that's kind of a black eye for the designers, huh?

Alternative II seems the best approach to me if you really want to stick with SQLite, though Alternative IV also shows promise. In Alternative IV, let each machine build its own database on a local disk drive, then copy the local databases to the machine with the master database. Merge the local databases one by one by opening a connection to the master database and running the following commands:

    ATTACH 'local1.db' AS local;
    INSERT INTO bigtable SELECT * FROM local.bigtable;
    DETACH local;

INSERT INTO ... SELECT is about as efficient as it gets for moving large quantities of data around in SQLite.

BTW, you'll likely get much better performance out of SQLite with 1 MB BLOBs if you use a really big page size. Use 8192 at least, and consider using 32768 (the maximum page size). Certainly the 100 GB practical size limit for SQLite is relaxed somewhat with larger pages. The practical limit is related to the number of pages in the database file, so the practical upper bound on the size of an SQLite database doubles every time you double the page size. The "100 GB" practical limit assumes a 1K page, so if you go with a 32K page, SQLite will likely work fine into the terabyte range. But this is just a guess - please experiment to see what works best for you.

Another thing to consider is storing the BLOBs in separate files and storing just the filename for each BLOB in the SQLite database. Filesystems vary, but many filesystem implementations work better with thousands of 1 MB files than with one big 1 TB file. Many of the client/server database engines split large tables across multiple files, presumably for this reason.

You are taking SQLite into areas where it has not been before, to my knowledge. I am very eager to hear about the results you achieve. Please keep me informed.

--
D. Richard Hipp <[EMAIL PROTECTED]>
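
For concreteness, the batched-commit pattern Larry describes is ordinary SQLite transaction batching: hold the serialized "commit rights", run a few hundred INSERTs inside one transaction, then COMMIT. A minimal sketch, assuming a hypothetical unindexed table named bigtable (the names are illustrative, not from the thread, and zeroblob() merely stands in for a real ~1 MB payload):

    -- one batch, executed while this process holds "commit rights";
    -- grouping a few hundred INSERTs per transaction amortizes the
    -- cost of the COMMIT's disk sync across the whole batch
    CREATE TABLE IF NOT EXISTS bigtable(key TEXT, payload BLOB);

    BEGIN;
    INSERT INTO bigtable(key, payload) VALUES('rec-0001', zeroblob(1048576));
    INSERT INTO bigtable(key, payload) VALUES('rec-0002', zeroblob(1048576));
    -- ... a few hundred more inserts ...
    COMMIT;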
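
One wrinkle in the page-size suggestion above: PRAGMA page_size only takes effect before the database file has been created (or when the file is rebuilt with VACUUM), so it has to be issued before the first CREATE TABLE. A sketch, using the same illustrative schema:

    PRAGMA page_size = 32768;    -- must run before any table is created
    CREATE TABLE bigtable(key TEXT, payload BLOB);
    PRAGMA page_size;            -- read it back to verify: 32768

    -- for an already-populated database, set the pragma and rebuild:
    -- PRAGMA page_size = 32768;
    -- VACUUM;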
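
And the store-the-filename variant is just a schema change: each row keeps a path instead of the payload, and the application reads the blob from the filesystem itself. A sketch (the path layout is made up for illustration):

    -- the ~1 MB payload lives in an ordinary file;
    -- SQLite holds only its location
    CREATE TABLE bigtable(key TEXT, blob_path TEXT);
    INSERT INTO bigtable VALUES('rec-0001', 'blobs/rec-0001.bin');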