I have a system requirement that feeds thousands of keyed records, each of
which contains a blob of around a meg or so, into an SQLite database. OK so
far. But the records come from multiple processes, and I can't afford a
commit after each insertion, for obvious performance reasons (if you have
ever tried that, you understand why).
So suppose we batch the commits, and, via an external method not discussed
here, serialize the order of the commits from the various processes so they
don't overlap. An "inserting" process will block until it has "commit"
rights.
Are we still OK, i.e., does this leave the database in a sound condition?
(Commits can be batched a few hundred at a time, maybe more.) Contention for
"commit" rights is unfortunate, but at the end of the day, if I can get 20
or so records inserted per second by this Rube Goldberg scheme, I'm OK with
that, because that is certainly the slow part, not the ability of the
multiple feeders to supply records to insert.
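
Roughly what each feeder would do per batch is sketched below. The
acquire/release calls are placeholders for the serialization method I'm not
describing here, and the table and column names are made up:

    import sqlite3

    def insert_batch(db_path, records, acquire_commit_rights, release_commit_rights):
        # records: list of (key, blob_bytes) pairs accumulated by this feeder
        conn = sqlite3.connect(db_path, isolation_level=None)  # autocommit; we manage BEGIN/COMMIT
        acquire_commit_rights()            # block until this process holds "commit" rights
        try:
            conn.execute("BEGIN")
            conn.executemany("INSERT INTO records(key, payload) VALUES (?, ?)", records)
            conn.execute("COMMIT")         # one commit per batch, not one per record
        except Exception:
            conn.execute("ROLLBACK")
            raise
        finally:
            release_commit_rights()        # hand rights to the next waiting process
            conn.close()

The point is just that the expensive COMMIT happens once per few hundred
records, inside whatever serialization we bolt on.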

Here is a bad extra ingredient that may be a deal breaker. The processes are
on machines spread over a network, and have opened the same database on the
target machine where the database lives. Network-based Windows file systems
and locking are a recipe for SQLite corruption, is my understanding. (Assume
the serialization trick I mentioned earlier was made to work even if the
processes invoking the trick are on the network.)
So even if we were OK without this network issue, are we now into an
unworkable design?
The common table into which insertions will be done does not have indexes on
it, of course. We do that later, after all processes are finished.
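
Deferred indexing is just the usual trick of building the index once, after
the last load has finished (file, table, and index names below are
placeholders):

    import sqlite3

    conn = sqlite3.connect("big.db")   # hypothetical file name
    conn.execute("CREATE INDEX idx_records_key ON records(key)")  # run once, after all loads finish
    conn.commit()
    conn.close()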
In summary:
Alternative I: multiple SQLite processes on different machines accessing
the same database, but with commits serialized (by a method not covered
here). Assume disaster without this serialized-commit trick.

Alternative II:
I could design a system service that runs a single instance of SQLite, and
make the users on other machines add records via the service on the target
machine, i.e., design my own little client/server subsystem. I have done
similar things before, but it's a lot of complexity, and if I don't need it,
I don't want to do it.
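
To make the amount of complexity concrete, here is the rough shape of such a
service (a sketch only; the port, wire protocol, and table are all things
I'd be inventing, and real code would need proper error handling):

    import socket
    import sqlite3
    import struct

    HOST, PORT, DB = "", 5151, "big.db"    # all made-up names/numbers

    def recv_exact(sock, n):
        # Read exactly n bytes or raise if the client disconnects mid-record.
        buf = b""
        while len(buf) < n:
            chunk = sock.recv(n - len(buf))
            if not chunk:
                raise ConnectionError("client closed mid-record")
            buf += chunk
        return buf

    def serve():
        conn = sqlite3.connect(DB, isolation_level=None)   # the ONLY writer, on local disk
        srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        srv.bind((HOST, PORT))
        srv.listen(8)
        while True:
            client, _ = srv.accept()                       # feeders handled one at a time
            conn.execute("BEGIN")                          # one transaction per client batch
            try:
                while True:
                    header = client.recv(8)                # 4-byte key length + 4-byte blob length
                    if not header:
                        break                              # client finished its batch
                    if len(header) < 8:
                        header += recv_exact(client, 8 - len(header))
                    key_len, blob_len = struct.unpack("!II", header)
                    key = recv_exact(client, key_len).decode()
                    blob = recv_exact(client, blob_len)
                    conn.execute("INSERT INTO records(key, payload) VALUES (?, ?)",
                                 (key, blob))
                conn.execute("COMMIT")
            except Exception:
                conn.execute("ROLLBACK")
            finally:
                client.close()

Because only this one process ever touches the database file, and it lives
on the same machine as the file, the network-file-system locking worries go
away; the price is that I'm now maintaining a little server.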

Alternative III: Maybe I should just use another SQL product (MySQL or
SQL Server ...) and ODBC to do this. I don't like that complexity either.

Alternative IV: If there were a lightning-fast SQLite bulk loader, I could
let each process make its own database first, and then dump (merge) them all
together as a final step.
That's an alternative design, but without a fast back-door bulk loader this
alternative does not seem too good (all that extra I/O), ugh.
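
The closest thing to a back door that stock SQLite offers, as far as I know,
is ATTACH plus INSERT ... SELECT, something like the sketch below (file and
table names are made up). It still copies every blob once more, so it
doesn't remove the extra I/O; it just saves me from writing a loader:

    import glob
    import sqlite3

    conn = sqlite3.connect("merged.db", isolation_level=None)
    conn.execute("CREATE TABLE IF NOT EXISTS records(key TEXT, payload BLOB)")
    for part in sorted(glob.glob("part*.db")):             # one file per feeder process
        conn.execute("ATTACH DATABASE ? AS src", (part,))  # ATTACH must run outside a transaction
        conn.execute("BEGIN")
        conn.execute("INSERT INTO records SELECT key, payload FROM src.records")
        conn.execute("COMMIT")
        conn.execute("DETACH DATABASE src")
    conn.close()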

Pure insertion speed (a pair of 4.8 ms disks in a RAID 0 array):
The final size of the database is almost out of the range of SQLite, being
between 100 and 120 GB. I have tested SQLite on databases of this size for
insertion speed, and it was not as bad as I expected from the caveats in the
Owens SQLite book: an average of 12 inserts per second over the load of the
entire 120 GB file, 16/second for a 100 GB database, and for reasonable
sizes like 1 GB you can do 100 inserts/second (all in one transaction, of
course).
If I can get this kind of speed with multiple load sources located on
different machines, I would have what I need

Pure retrieval speed: at around 120 GB, with 125,000 records carrying blobs
of about 1 MB each, I could get over 600 retrievals per second, so while
insertions hurt, retrievals are fine.
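
(For reference, a retrieval here is nothing more exotic than a keyed SELECT
of the blob; table and column names below are placeholders.)

    import sqlite3

    conn = sqlite3.connect("big.db")
    row = conn.execute("SELECT payload FROM records WHERE key = ?", ("some-key",)).fetchone()
    blob = row[0] if row else None     # the blob comes back as Python bytes
    conn.close()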
