On 21/09/2012 7:54 AM, Clemens Ladisch wrote:
> John Bachir wrote:
>> i've read other posts on this list that say that we can't guess what sqlite
>> will do with cache.
> It uses a simple LRU algorithm to determine which pages to kick out of
> the page cache first (so at least it's somewhat deterministic).

>> however, could i be relatively confident that most of the time, it
>> will prioritize keeping the index in memory before it starts keeping
>> the data?
> The page cache does not know what is in the pages.

> Let's look at a simple example: assume the index has two pages, X and Y,
> which each point to records in three data pages:
>    X -> A,B,C; Y -> D,E,F
>
> The order in which the pages would be used is this:
>    X A X B X C Y D Y E Y F
>
> For LRU, the last usage matters, so the LRU list will look like this:
>    A B X C D E Y F
>
> So the data pages _will_ crowd out the index pages (especially since
> there are far fewer index pages than data pages).
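(As a sanity check, that ordering is easy to reproduce with a throwaway simulation. Python here purely for illustration; the page labels and trace are the ones from the example above.)

```python
from collections import OrderedDict

def lru_order(accesses):
    """Replay an access trace; return pages from least to most recently used."""
    lru = OrderedDict()
    for page in accesses:
        lru.pop(page, None)   # forget the page's old position, if any
        lru[page] = True      # re-insert at the most-recently-used end
    return list(lru)

# Index pages X, Y each point to three data pages: X -> A,B,C ; Y -> D,E,F
trace = ["X", "A", "X", "B", "X", "C", "Y", "D", "Y", "E", "Y", "F"]
print(lru_order(trace))  # → ['A', 'B', 'X', 'C', 'D', 'E', 'Y', 'F']
```

The index pages X and Y indeed end up buried mid-list, behind data pages that were touched only once.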

>> ideally it would always keep the entire index in memory and never
>> cache the data.
>>
>> if i can't more or less depend on this, then sqlite probably won't
>> work for my application.
> You could write your own page cache implementation that wraps the
> original one but never throws out certain pages ...
> This might help: http://www.sqlite.org/capi3ref.html#sqlite3_pcache
> "By implementing a custom page cache using this API, an application can
> better control the amount of memory consumed by SQLite, the way in
> which that memory is allocated and released, and the policies used to
> determine exactly which parts of a database file are cached and for
> how long."

AFAICT, a pluggable cache only needs to worry about memory, with all I/O handled by sqlite3, so it shouldn't be too hard to cook up a minimal version. I'm a bit doubtful about the "exactly which parts of a file" claim, since the API doesn't tell you anything about the pages it asks you to cache. However, the "clock" algorithm would probably do what you want without needing to know which pages actually belong to an index: it prefers to evict pages that have been touched the fewest times (with decay), rather than those that have gone the longest since their last touch.

Advantages:
- Popular pages are hard to evict, but become unpopular if left untouched too long
- Simpler code (a for loop and a simple counter at each cache slot, vs. some sorted data structure for LRU)
- Lower runtime overhead (amortized constant cost per unpin vs. logarithmic cost for LRU)

To implement Clock: arrange the cache slots in a circle, and keep a remembered position, the "hand." Whenever the system needs to allocate a new page, the "hand" sweeps around the circle looking for an unpinned page with zero touch count, and evicts the first such page; the touch count increments whenever the page is unpinned, and decrements whenever the "hand" passes it. Any page unpinned at least once per clock cycle will remain in cache, with memory pressure making clock cycles shorter.
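To make that concrete, here's a rough sketch of the idea (Python for brevity, not a real sqlite3_pcache implementation; class and method names are my own invention):

```python
class ClockCache:
    """Clock ("second-chance") eviction over a fixed ring of slots.

    Unpinning a page increments its touch count; the sweeping hand
    decrements counts as it passes, and evicts the first unpinned
    slot it finds with a count of zero.
    """
    def __init__(self, nslots):
        self.pages  = [None] * nslots    # page key held in each slot
        self.count  = [0] * nslots       # touch count per slot
        self.pinned = [False] * nslots
        self.hand   = 0                  # the remembered sweep position

    def unpin(self, slot):
        self.pinned[slot] = False
        self.count[slot] += 1            # a touch: harder to evict

    def _find_victim(self):
        while True:
            i = self.hand
            self.hand = (self.hand + 1) % len(self.pages)
            if self.pages[i] is None:
                return i                 # free slot, nothing to evict
            if not self.pinned[i]:
                if self.count[i] == 0:
                    return i             # unpinned and unpopular: evict
                self.count[i] -= 1       # decay as the hand passes

    def fetch(self, key):
        """Return (slot, hit) for key, pinning it; evict if the ring is full."""
        if key in self.pages:
            slot, hit = self.pages.index(key), True
        else:
            slot, hit = self._find_victim(), False
            self.pages[slot] = key
        self.pinned[slot] = True
        return slot, hit
```

With a two-slot cache, fetching and unpinning A twice but B only once, then fetching a new page C, evicts B and keeps the more-touched A, even though B was used more recently.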

Note that, while any given eviction can require looking at multiple pages, it usually averages out to only a few pages per eviction. The worst case would be if an adversary touched N-1 popular pages once for each unpopular page it fetches: the unpopular page would be evicted every time, but clock would have to sweep the whole pool to discover this. Even then, though, you pay cost N to evict a page once every N unpins.

Ryan

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users