On Thu, Oct 21, 2010 at 11:32 AM, Dustin Sallings <dus...@spy.net> wrote:
> Mostly, I want to have an idea how fragmented I am. My app can
> tremendously wreck a sqlite table right now to the point where a
> reconstruction (vacuum, etc...) is the difference between three hours and
> one minute. I'd like to know when things are getting bad.
>
> If no such thing exists, I can build it, but I would imagine it's
> been done before.

Dustin, thanks for your interesting post; I've always wondered how internal sqlite fragmentation can affect overall performance. I did some tests today and the numbers are interesting.

Windows 7, 64-bit, sqlite 3.6.10 and 3.7.2.

The table

  CREATE TABLE [TestTable] (
    [Id] INTEGER PRIMARY KEY AUTOINCREMENT,
    [Text] TEXT)

is filled by repeating a query about 1M times:

  INSERT INTO TestTable (id, text)
  VALUES (abs(random() % (1 << 48)), '12345678901234567...')

(508 symbols total in the string, to make it more like the 2G/5M records ratio Dustin posted.) The db is now about 1G in size. This test should produce a very fragmented main B-tree, since every next record gets a "random" rowid while the records, in insertion order, are placed in consecutive pages. The insert took about an hour (for obvious reasons).

So, after that, querying

  SELECT * FROM TestTable LIMIT ...

produces interesting results (I made sure every reopen clears the Windows system cache with this trick:
http://www.mail-archive.com/sqlite-users@sqlite.org/msg54234.html):

  SELECT * FROM TestTable LIMIT 10000
    reads about 10M from the db, takes about 2 minutes to run
  SELECT * FROM TestTable LIMIT 50000
    reads about 50M from the db, takes about 10 minutes to run
  SELECT * FROM TestTable LIMIT 100000
    this was too long to wait, so a plain SELECT * FROM TestTable
    would probably indeed take hours

Actually these are expected results: the Windows system cache prediction fails, since sqlite really does almost completely random reads at different file offsets.

As for your initial question, I think fragmentation evaluation is possible with the help of a VFS.
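The setup above can be sketched in Python's sqlite3 module (a minimal sketch: the table and the random 48-bit rowids are from the post, but the row count and string length are scaled down here so it runs in seconds rather than an hour; INSERT OR IGNORE is my addition, to survive the rare rowid collision):

```python
import random
import sqlite3

def build_fragmented_db(path=":memory:", rows=10_000, payload_len=500):
    """Insert rows whose rowids are random 48-bit integers, as in the
    test above; insertion order and rowid order are then uncorrelated,
    which fragments the main B-tree."""
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE TestTable ("
                "Id INTEGER PRIMARY KEY AUTOINCREMENT, Text TEXT)")
    payload = "1234567890" * (payload_len // 10)
    with con:
        for _ in range(rows):
            # Python-side equivalent of abs(random() % (1 << 48));
            # OR IGNORE in case two random rowids collide
            con.execute("INSERT OR IGNORE INTO TestTable (Id, Text) "
                        "VALUES (?, ?)",
                        (random.getrandbits(48), payload))
    return con

con = build_fragmented_db()
n = con.execute("SELECT count(*) FROM TestTable").fetchone()[0]
print(n)
```

With an on-disk path instead of :memory: and the row count raised back toward 1M, this should reproduce the slow-scan behaviour described above (after evicting the OS cache between runs).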
I'd keep a running sum of the absolute difference between consecutive read offsets for the xRead operation. In this case, if some xRead request reads 1024 bytes at offset 4096 and the next xRead reads something at 5120 (4096 + 1024), this adds 0 to the sum; but if the next read is at 8192, this adds 3072 (= 8192 - (4096 + 1024)). With this implemented, you would be able to see how this value changes for some of your SELECTs and maybe evaluate it on a per-record basis. If it rises above some threshold value, ok, perform VACUUM.

Back to the tests: can someone perform similar tests on linux? I heard it has a different cache system, and I wonder whether it can show better performance in these artificial tests.

Thanks in advance,

Max Vlasov
maxerist.net

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
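The xRead gap metric described above can be sketched as follows (a minimal sketch: a real implementation would accumulate this inside a VFS shim's xRead; here the function just takes the sequence of (offset, nbytes) pairs such a shim would observe):

```python
def seek_distance(reads):
    """Sum of absolute gaps between the end of one read and the start
    of the next, over a sequence of (offset, nbytes) pairs.
    0 means perfectly sequential I/O; large values suggest the pages
    being read are scattered across the file (fragmentation)."""
    total = 0
    prev_end = None
    for offset, nbytes in reads:
        if prev_end is not None:
            total += abs(offset - prev_end)
        prev_end = offset + nbytes
    return total

# The examples from the post:
# 1024 bytes at 4096, then a read at 5120 (4096 + 1024) -> adds 0
print(seek_distance([(4096, 1024), (5120, 1024)]))   # 0
# ...but a read at 8192 after (4096, 1024) adds 3072
print(seek_distance([(4096, 1024), (8192, 1024)]))   # 3072
```

Dividing the total by the number of records scanned would give the per-record figure suggested above, which could then be compared against a VACUUM threshold.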