On 21/09/2012 7:54 AM, Clemens Ladisch wrote:
> John Bachir wrote:
>> i've read other posts on this list that say that we can't guess what sqlite
>> will do with cache.
> It uses a simple LRU algorithm to determine which pages to kick out of
> the page cache first (so at least it's somewhat deterministic).

>> however, could i be relatively confident that most of the time, it
>> will prioritize keeping the index in memory before it starts keeping
>> the data?
> The page cache does not know what is in the pages.

> Let's look at a simple example: assume the index has two pages, X and Y,
> which each point to records in three data pages:
>    X -> A,B,C; Y -> D,E,F
>
> The order in which the pages would be used is this:
>    X A X B X C Y D Y E Y F
>
> For LRU, the last usage matters, so the LRU list will look like this:
>    A B X C D E Y F
>
> So the data pages _will_ crowd out the index pages (especially since
> there are far fewer index pages than data pages).
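(As a sanity check, that ordering is easy to reproduce with a throwaway simulation. Python here purely for illustration; the page labels and trace are the ones from the example above.)

```python
from collections import OrderedDict

def lru_order(accesses):
    """Replay an access trace; return pages from least to most recently used."""
    lru = OrderedDict()
    for page in accesses:
        lru.pop(page, None)   # forget the page's old position, if any
        lru[page] = True      # re-insert at the most-recently-used end
    return list(lru)

# Index pages X, Y each point to three data pages: X -> A,B,C ; Y -> D,E,F
trace = ["X", "A", "X", "B", "X", "C", "Y", "D", "Y", "E", "Y", "F"]
print(lru_order(trace))  # → ['A', 'B', 'X', 'C', 'D', 'E', 'Y', 'F']
```

The index pages X and Y indeed end up buried mid-list, behind data pages that were touched only once.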

>> ideally it would always keep the entire index in memory and never
>> cache the data.
>>
>> if i can't more or less depend on this, then sqlite probably won't
>> work for my application.
> You could write your own page cache implementation that wraps the
> original one but never throws out certain pages ...
> This might help: http://www.sqlite.org/capi3ref.html#sqlite3_pcache
> "By implementing a custom page cache using this API, an application can
> better control the amount of memory consumed by SQLite, the way in
> which that memory is allocated and released, and the policies used to
> determine exactly which parts of a database file are cached and for
> how long."

AFAICT, a pluggable cache only needs to worry about memory, with all I/O handled by sqlite3, so it shouldn't be too hard to cook up a minimal version. I'm a bit doubtful about the "exactly which parts of a file" claim, since the API doesn't tell you anything about the pages it asks you to cache. However, the "clock" algorithm would probably do what you want without needing to know which pages actually belong to an index: it prefers to evict pages that have been touched the fewest times (with decay), rather than those that have gone the longest since their last touch.

Advantages:
- Popular pages are hard to evict, but become unpopular if left untouched too long
- Simpler code (a for loop and a simple counter at each cache slot, vs. some sorted data structure for LRU)
- Lower runtime overhead (amortized constant cost per unpin vs. logarithmic cost for LRU)

To implement Clock: arrange the cache slots in a circle, and keep a remembered position, the "hand." Whenever the system needs to allocate a new page, the "hand" sweeps around the circle looking for an unpinned page with zero touch count, and evicts the first such page; the touch count increments whenever the page is unpinned, and decrements whenever the "hand" passes it. Any page unpinned at least once per clock cycle will remain in cache, with memory pressure making clock cycles shorter.
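To make that concrete, here's a rough sketch of the idea (Python for brevity, not a real sqlite3_pcache implementation; class and method names are my own invention):

```python
class ClockCache:
    """Clock ("second-chance") eviction over a fixed ring of slots.

    Unpinning a page increments its touch count; the sweeping hand
    decrements counts as it passes, and evicts the first unpinned
    slot it finds with a count of zero.
    """
    def __init__(self, nslots):
        self.pages  = [None] * nslots    # page key held in each slot
        self.count  = [0] * nslots       # touch count per slot
        self.pinned = [False] * nslots
        self.hand   = 0                  # the remembered sweep position

    def unpin(self, slot):
        self.pinned[slot] = False
        self.count[slot] += 1            # a touch: harder to evict

    def _find_victim(self):
        while True:
            i = self.hand
            self.hand = (self.hand + 1) % len(self.pages)
            if self.pages[i] is None:
                return i                 # free slot, nothing to evict
            if not self.pinned[i]:
                if self.count[i] == 0:
                    return i             # unpinned and unpopular: evict
                self.count[i] -= 1       # decay as the hand passes

    def fetch(self, key):
        """Return (slot, hit) for key, pinning it; evict if the ring is full."""
        if key in self.pages:
            slot, hit = self.pages.index(key), True
        else:
            slot, hit = self._find_victim(), False
            self.pages[slot] = key
        self.pinned[slot] = True
        return slot, hit
```

With a two-slot cache, fetching and unpinning A twice but B only once, then fetching a new page C, evicts B and keeps the more-touched A, even though B was used more recently.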

Note that, while any given eviction can require looking at multiple pages, it usually averages out to only a few pages per eviction. The worst case would be if an adversary touched N-1 popular pages once for each unpopular page it fetches: the unpopular page would be evicted every time, but clock would have to sweep the whole pool to discover this. Even then, though, you pay cost N to evict a page once every N unpins.

Ryan

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users