We have an application that uses a single-table database that is quite
simple.  Here is the schema:

CREATE TABLE t (k varchar(50) primary key not null, d text not null, e
datetime)

We receive data over the course of an hour that is in a different form
and that we need to put into the db.  The data arrives in random order
and in bursts.  The set of keys is generally stable; over the course of
an hour, all rows are eventually replaced.

Currently, we use SQLite 3.6.16.  We batch about 200 or so updates into
a transaction and use INSERT OR REPLACE to modify the data.  The db
grows to about 450 MB; the page size is 4096.  When we start with an
empty db, the average insert time is around 0.3 s/transaction.  When the
db is full, it rises to a high of 0.5 s/transaction... for a while.
Then, magically, after a few hours, that time jumps to 10 s/transaction.
We converted things to CSVs and so forth to test through the sqlite3
command line client, and we get the same results.  All db access becomes
very slow from any tool.
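
Each transaction is shaped roughly like this (the literal key, payload,
and datetime values here are just placeholders, not real feed data):

BEGIN;
INSERT OR REPLACE INTO t (k, d, e) VALUES ('key-0001', 'payload ...', '2011-06-01 12:00:00');
-- ... about 200 statements of this form per batch ...
INSERT OR REPLACE INTO t (k, d, e) VALUES ('key-0200', 'payload ...', '2011-06-01 12:00:05');
COMMIT;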

Using the analyzer, we see that when the db is working well, the index
fragmentation is around 5% or less... then all of a sudden we get
95-100% fragmentation of the index.  When that happens, we get the
horrible insert times, irrespective, it seems, of table fragmentation.
The db can be full and have tens of thousands of updates before this
jump occurs.  If I vacuum the db, of course, everything works smoothly,
and it seems to take much longer for the fragmentation to recur.

Currently, I am testing a model where we UPDATE (then INSERT if rows
affected = 0) to see if this reduces the horrible index fragmentation we
see.  This method should cause less index stress, since we don't delete
then insert, and we don't modify keys in general.
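
In sketch form, per incoming record (with :k, :d, :e as bound
parameters; the rows-affected check happens in application code via
sqlite3_changes()):

UPDATE t SET d = :d, e = :e WHERE k = :k;
-- if sqlite3_changes() returns 0, the key didn't exist yet, so fall back to:
INSERT INTO t (k, d, e) VALUES (:k, :d, :e);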


The questions I have are:  Is this normal?  How do I prevent this
massive random fragmentation?  Vacuum is not a solution - the rate at
which we'd have to vacuum seems unreasonable, and we can't just delete
the dbs and start over.  The data needs to be present and speedily
accessible in general; vacuum causes problems in this respect.

I have seen another thread on this from a few months ago where someone
saw this behavior and then it seemed to go away.  During the slowness of
the db, one thing we do notice is that zero CPU is being used and almost
no I/O is going on.  Disk TPS is also very, very low.  Running the
sqlite3 command line client by itself against one of these highly
fragmented dbs shows the same behavior.  For instance, a PRAGMA
integrity_check takes minutes to run: no I/O or CPU at all for a minute
or two, then a big burst.  Same with inserting through the sqlite3
client -- when fragmentation of the index is low, it inserts very fast.
When it's high, it inserts slowly, and we see trivial disk I/O / TPS and
no CPU.  And yes, as stated earlier, vacuum fixes it... for a short while.
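
For reference, the integrity check is nothing exotic, just the stock
pragma run from the shell, e.g. (slow.db here is a placeholder name for
a copy of one of the fragmented files):

sqlite3 slow.db "PRAGMA integrity_check;"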

We see this behavior across multiple machines with different
motherboards, CPUs, etc.  How can our primary key index fragment so
badly, so quickly?  Will the UPDATE method work better for this than
INSERT OR REPLACE?

Any help/thoughts on this would be appreciated.

