> Is it ok for cache to behave like this or some optimization is possible to > fix this?
For this particular case I believe you can do some optimization by making your own implementation of cache. Also I believe such "strange" behavior of cache is pretty much explainable. Remember that standard implementation of cache replaces pages on LRU basis, i.e. if cache is too large then new page replaces the oldest page, the one which access time is smallest. So let's say for execution of this query SELECT Count(*) FROM WikiData WHERE Id < 9812 it needs to read pages 1 through 2000 of the database in exactly this order. For execution of this query SELECT Count(*) FROM WikiData WHERE Id < 10000 it needs to read pages 1 through 2001 in exactly this order. But when it wants to load 2001st page it sees that cache is too large, so it replaces 1st page with 2001st. Then when you want to execute this query once more it need to start reading from 1st page. It's not in cache and cache is too large so it replaces the oldest page - frankly it's the 2nd one. After 1st page it needs to read 2nd which is not in cache and it replaces the oldest page - 3rd one, and so on. As you see simple LRU algorithm in this particular case forces SQLite to read all pages again although they were in memory a tiny moment ago. And the problem can be solved only by temporary (e.g. for the time of statement execution) allowing the cache to grow above set limit. Pavel On Wed, Sep 22, 2010 at 10:36 AM, Max Vlasov <max.vla...@gmail.com> wrote: > Hi, > > playing with my admin about cache size (there was a question today related > to the cache size), noticed a strange thing with cache. It's Windows, > initially it was v3.6.10, but the same is for 3.7.2 > > I sometimes mentioned that I can track vfs requests for every select in my > admin. In this case it was interesting to look at the numbers related to the > cache size. > > The default PRAGA page_size=2000; so the threshold level is about 2000*1024 > = 2,048,000 bytes. > A select for a test table filled with wikipedia abstract article was > > SELECT Count(*) FROM WikiData WHERE Id < ? > > This query doesn't produce tiny result in the front-end, but at the same > time uses cache and tried to keep the pages accessed in it. > > And the number of ? that corresponds to the cache_size threshold is about > 9813. > > The things I noticed: > - If the select accesses the pages total size of whose is lower than the > cache size, everything is ok. I mean, trying to select any subset of the > data leads to only 16 bytes reading. > SELECT Count(*) FROM WikiData WHERE Id < 9812 => VFS 2,045,968 > SELECT Count(*) FROM WikiData WHERE Id < 1000 => VFS 16 > SELECT Count(*) FROM WikiData WHERE Id > 5000 and id < 7000 => VFS 16 > these are expected results > > - But when there was a request that filled all the cache and the following > request is a subset of the first one, the latter unfortunately doesn't take > advantage of the pages already in cache > SELECT Count(*) FROM WikiData WHERE Id < 10000 => VFS 2,082,838 > SELECT Count(*) FROM WikiData WHERE Id < 9800 => VFS 2,023,440 > > - Also when there are two consecutive requests that probably larger than the > cache size, they will do the full reading (or almost full reading) > SELECT Count(*) FROM WikiData WHERE Id < 10000 => VFS 2,082,838 > SELECT Count(*) FROM WikiData WHERE Id < 10001 => VFS 2,062,352 > SELECT Count(*) FROM WikiData WHERE Id < 9999 => VFS 2,061,328 > > Is it ok for cache to behave like this or some optimization is possible to > fix this? > > I don't know if it's possible to do similar tests with other front-ends. If > someone really interested, I can upload my tool, but it's just not > completely ready for mainstream. > > Thanks, > > Max > _______________________________________________ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users