On Thu, Oct 21, 2010 at 11:32 AM, Dustin Sallings <dus...@spy.net> wrote:

>
>        Mostly, I want to have an idea how fragmented I am.  My app can
> tremendously wreck a sqlite table right now to the point where a
> reconstruction (vacuum, etc...) is the difference between three hours and
> one minute.  I'd like to know when things are getting bad.
>
>        If no such thing exists, I can build it, but I would imagine it's
> been done before.
>
>
Dustin, thanks for your interesting post. I've always wondered how internal
sqlite fragmentation can affect overall performance. I did some tests
today and the numbers are interesting.

Windows 7, 64-bit, sqlite 3.6.10 and 3.7.2

The table

CREATE TABLE [TestTable] (
[Id] INTEGER PRIMARY KEY AUTOINCREMENT,
[Text] TEXT)

is filled by repeating the following query about 1M times

INSERT INTO TestTable (id, text) VALUES (abs(random() % (1 << 48)),
'12345678901234567...') {508 characters total in the string, to make it
more like the 2G/5M-records ratio Dustin posted}
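
For reference, a rough C sketch of the fill loop. The prepared statement,
the 10000-row transaction batching, and the 'x' padding as a stand-in for
the 508-character string are my choices, not details from the test itself:

#include <sqlite3.h>
#include <string.h>

/* Sketch of the fill loop: one prepared INSERT executed ~1M times.
   Batching rows into transactions is assumed here; without it the
   run would be dominated by per-statement commits. */
static int fill(sqlite3 *db)
{
    char text[509];
    memset(text, 'x', 508);     /* stand-in for the 508-char payload */
    text[508] = '\0';

    sqlite3_stmt *stmt;
    const char *sql = "INSERT INTO TestTable (id, text) "
                      "VALUES (abs(random() % (1 << 48)), ?)";
    if (sqlite3_prepare_v2(db, sql, -1, &stmt, 0) != SQLITE_OK) return -1;

    for (int i = 0; i < 1000000; i++) {
        if (i % 10000 == 0) {               /* commit every 10000 rows */
            if (i) sqlite3_exec(db, "COMMIT", 0, 0, 0);
            sqlite3_exec(db, "BEGIN", 0, 0, 0);
        }
        sqlite3_bind_text(stmt, 1, text, -1, SQLITE_STATIC);
        if (sqlite3_step(stmt) != SQLITE_DONE) break;
        sqlite3_reset(stmt);
    }
    sqlite3_exec(db, "COMMIT", 0, 0, 0);
    return sqlite3_finalize(stmt) == SQLITE_OK ? 0 : -1;
}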

The db now is about 1G in size.

This test should produce a very fragmented main B-tree, since each new
record gets a "random" rowid while the records themselves, in the order
they are added, land on consecutive pages. The insert took about an hour
(for obvious reasons).

So, after that test, querying SELECT * FROM TestTable LIMIT ... produces
interesting results (I made sure every open starts with a cold Windows
system cache using this trick:
http://www.mail-archive.com/sqlite-users@sqlite.org/msg54234.html)

SELECT * FROM TestTable LIMIT 10000
reads about 10M from the db, takes about 2 minutes to run

SELECT * FROM TestTable LIMIT 50000
reads about 50M from the db, takes about 10 minutes to run

SELECT * FROM TestTable LIMIT 100000
this one took too long to wait for, so a full SELECT * FROM TestTable
would probably indeed take hours.

Actually these are expected results: Windows system cache prediction
fails since sqlite really does almost completely random reads at
different file offsets.

As for your initial question, I think fragmentation evaluation is
possible with the help of a VFS. I'd keep a running total of the
absolute difference between consecutive read offsets for the xRead
operation. For example, if some xRead request reads 1024 bytes at offset
4096 and the next xRead reads at offset 5120 (4096 + 1024), that adds 0
to the sum; but if the next read is at offset 8192, that adds
3072 = 8192 - (4096 + 1024). With this implemented, you would be able to
see how this value changes for your SELECTs and maybe evaluate it on a
per-record basis. If it grows past some threshold value, ok, perform a
VACUUM.
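
Here is a rough sketch of how the xRead side of such a VFS shim could
look. The shim boilerplate (an xOpen that installs these io methods
around the real file, sqlite3_vfs_register() and so on) is omitted, and
names like FragFile are mine:

#include <sqlite3.h>

typedef struct FragFile {
    sqlite3_file base;        /* must come first: sqlite3_file "subclass" */
    sqlite3_file *real;       /* underlying file of the default VFS */
    sqlite3_int64 nextOfst;   /* offset just past the previous read */
    sqlite3_int64 seekSum;    /* running sum of |gap| between reads */
} FragFile;

static int fragRead(sqlite3_file *pFile, void *buf, int amt,
                    sqlite3_int64 ofst)
{
    FragFile *p = (FragFile*)pFile;
    sqlite3_int64 gap = ofst - p->nextOfst;
    if (gap < 0) gap = -gap;
    p->seekSum += gap;        /* adds 0 for perfectly sequential reads */
    p->nextOfst = ofst + amt;
    return p->real->pMethods->xRead(p->real, buf, amt, ofst);
}

Dividing seekSum by the number of records a SELECT returned would give
the per-record figure; a freshly VACUUMed table scanned in rowid order
should keep it near zero.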

Back to the tests: can someone perform similar tests on Linux? I hear it
has a different cache system, and I wonder whether it shows better
performance in these artificial tests. Thanks in advance.

Max Vlasov
maxerist.net