I don't think ULL (unrolled linked list) buys you as much extra cache
space as you think, and it certainly isn't anywhere near as fast as a
plain LL for inserts and deletes. Consider the following problems that
still remain:
1. If you need to point to an element (common with linked lists) it gets
really messy. Sorted inserts and deletes will change the memory location
of elements, as well as their offset in the node, so you can't point to
them by memory location or node + offset. Instead, you have to point to
the list, plus something static and uniquely identifiable about the
element. Many times this is going to involve additional space once per
element. Assuming you care about word alignment (which you should if
you're talking memory locality), the additional space needs to be the
size of a pointer. This gives a net savings of one pointer per element
for a double LL, and ZERO net savings for a single LL, unless each
element already stored something uniquely identifying.
2. Unless you never need to add or remove elements in the middle, you
need wasted space to avoid very expensive operations. You still have to
store the data, and if the data is a pointer to other data, the whole
business of cache locality is largely academic anyway, since you've got
plenty of chances for cache misses when you look at the other data.
Say the data stored is 16 bytes (small, but useful) and pointers are 4
bytes. A double LL node is then 24 bytes, so the data uses 66% of the
space; unless you can keep the wasted space per ULL node below 33% you
gain nothing at all over a straight double LL with good locality. If
you're replacing a single LL (20 bytes per node, 80% data), you have to
keep wasted space below 20% to see any gain. As the data gets larger,
this break-even percentage gets smaller and smaller.
3. If data is arbitrarily removed at approximately the same rate that
data is added, the wasted space will naturally gravitate towards 50%,
which is no gain in footprint for most cases. Even for a single LL
with pointer-sized data this is zero gain.
4. You CAN reduce wasted space by balancing to consolidate adjacent
nodes when possible, but this can quickly become expensive, with lots of
memory copy operations just to avoid a little wasted space in the cache.
If we value that space as the fraction of a cache miss it might avoid
times the time cost of a cache miss, it is unclear whether there is any
real gain here.
5. Insert and delete operations will require, on average, a copy of half
the elements in a node (see the sketch after this list). Compared to the
two-operation insert/delete of a single LL, this is not cheap. Again,
weigh it against the value of the cache you are trying to save by
avoiding the pointers, and this doesn't look like a wise spend to me.
6. Linked lists are designed for cases where inserts and removals should
be fast and simple, but ULL ignores this. Deletes are especially
problematic: since you can't hold a stable pointer to an element (see
point 1), they require a full scan to the element being deleted,
followed by a memory copy and possibly node balancing. In a double
linked list, if you have a pointer to the node you want to delete, the
delete costs only 2 operations.
7. ULL will outperform LL when traversing nodes to find a sorted
insertion point, but will likely lose that gain when it actually does
the insert. In any case, a btree will outperform either when finding a
sorted insertion point, still gets the super-fast insert, and can get
appropriate locality by pooling.
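
To make the memmove cost in point 5 concrete, here is a minimal sketch
of a ULL node and a sorted insert. This is my own illustration, with
hypothetical names and a toy int payload, not anyone's production code:

    #include <string.h>

    #define ULL_CAP 16       /* elements per node; unused slots are the
                             ** wasted space discussed in points 2 and 3 */

    typedef struct UllNode UllNode;
    struct UllNode {
      UllNode *next;         /* one next pointer amortized over ULL_CAP */
      int count;             /* slots in use */
      int vals[ULL_CAP];     /* element payload */
    };

    /* Sorted insert into a node known to have room (a full node forces
    ** a split or a spill into a neighbor first).  The memmove shifts
    ** count-pos elements -- half the node on average -- versus the two
    ** pointer writes of a plain LL insert. */
    static void ullNodeInsert(UllNode *n, int v){
      int pos = 0;
      while( pos<n->count && n->vals[pos]<v ) pos++;
      memmove(&n->vals[pos+1], &n->vals[pos],
              (n->count-pos)*sizeof(n->vals[0]));
      n->vals[pos] = v;
      n->count++;
    }

Note that the memmove also changes the node + offset of every shifted
element, which is exactly why the stable element pointers in point 1 are
so hard to get.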

I guess my point is, avoiding cache misses is a fine goal, but I don't
believe ULL is the tool to use. A pool allocation scheme that uses the
same pool size as a ULL node will likely have the same or better memory
footprint, faster insertion, faster deletion, and the same cache
locality. In practice, pools could safely be much larger, because you
don't have to balance block size against the additional cost of inserts
and deletes. This means pool allocation could experience even better
cache locality than ULL, due to larger allocation blocks. Additionally,
the allocation cost per node in these pools is unbelievably low (a
couple of operations), and if you have a separate pool per list, access
to the pools can be controlled by the same critical sections that
already control the linked lists, making it automatically thread-safe.
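
For concreteness, here is the sort of pool I have in mind; a minimal
sketch under my own assumptions (fixed-size double LL nodes, a free list
threaded through recycled slots, and hypothetical names throughout):

    #include <stdlib.h>

    typedef struct ListNode ListNode;
    struct ListNode {
      ListNode *prev;
      ListNode *next;
      void *data;
    };

    #define POOL_BLOCK_SIZE 64     /* nodes per block; tune for cache */

    typedef struct PoolBlock PoolBlock;
    struct PoolBlock {
      PoolBlock *next;             /* chain of allocation blocks */
      ListNode slots[POOL_BLOCK_SIZE];
    };

    typedef struct Pool Pool;
    struct Pool {
      PoolBlock *blocks;           /* newest block first */
      int used;                    /* slots handed out from newest block */
      ListNode *freeList;          /* recycled nodes, threaded via ->next */
    };

    /* A couple of operations in the common case: pop the free list or
    ** bump a counter.  malloc only runs when a block fills up. */
    static ListNode *poolAlloc(Pool *p){
      ListNode *n = p->freeList;
      if( n ){
        p->freeList = n->next;
        return n;
      }
      if( p->blocks==0 || p->used==POOL_BLOCK_SIZE ){
        PoolBlock *b = malloc(sizeof(*b));
        if( b==0 ) return 0;
        b->next = p->blocks;
        p->blocks = b;
        p->used = 0;
      }
      return &p->blocks->slots[p->used++];
    }

    /* Two operations: push onto the free list. */
    static void poolFree(Pool *p, ListNode *n){
      n->next = p->freeList;
      p->freeList = n;
    }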

Pool allocation works without all the added complexity of ULL, and
rather than hidden costs (memcpy for insert and delete), there are
hidden savings (nearly zero per-node allocation cost unless you hit the
end of a pool block).
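
To show what that buys: using the Pool and ListNode sketched above, and
assuming sentinel head/tail nodes so prev and next are never NULL, a
delete stays at a handful of operations with no memcpy anywhere:

    /* Unlink n from a double LL and recycle it into its list's pool. */
    static void listDelete(Pool *p, ListNode *n){
      n->prev->next = n->next;   /* two pointer writes to unlink... */
      n->next->prev = n->prev;
      poolFree(p, n);            /* ...and two more to recycle */
    }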

Anyway, if someone wants to tackle the problem of improving cache
locality, and I get a vote on how it's done, I vote pools. They're easy
(could probably be done in < 100 lines of new code), uncomplicated
(logically the LL works the same as it always has), flexible (can
convert to btree later), and give a clear benefit (as opposed to gains
that look questionable once wasted space, element pointers, and memcpy
are accounted for).

John

-----Original Message-----
From: sqlite-users-boun...@sqlite.org
[mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Nicolas Williams
Sent: Tuesday, October 27, 2009 6:29 PM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] Idea for improving page cache

On Tue, Oct 27, 2009 at 04:28:11PM -0400, John Crenshaw wrote:
> "advantage" kind of depends. ULL is more specialized. You gain some
> benefit, but also lose some as well. For example, consider what is
> involved in doing a sorted insert into an ULL. On the other hand, you
> can get all of the same locality benefit with a pool allocation
> scheme. You don't reduce the pointer overhead this way, but that isn't
> really much of an issue.

I don't know that you get the same locality benefit with pools: with a
ULL there's fewer prev/next pointers to take up valuable cache space.

The need to "pullup"/"breakup" a ULL at times is annoying, and to
minimize it you probably have to waste some space in order to amortize
the cost.  As you say, there's a trade-off.  Many optimizations result in
more complex, error-prone, brittle software.

Wasting space is not much of a problem for cache locality, and if you
keep the wasted space well under 50% you're ahead of plain lists in
terms of memory footprint.  So, ignoring code complexity, ULL seems like
a likely win-win performance-wise.  Even code complexity-wise, ULLs
allow random access to be much faster than with plain linked lists,
which means you're more likely to use random access, which means you
probably win code complexity-wise too.  Of course, once you start
leaving room for additions in each chunk, ULLs start resembling B-trees,
which, I think, is your point: might as well go whole-hog.

> ULL requires less time to walk all nodes, but since you can't afford
> to keep ULL sorted, there aren't a lot of scenarios where that saves

I don't think it's true that you can't afford to keep it sorted, but you
probably have to waste space if you want it sorted and allowing most
random inserts to not require spills.  Also, just because a chunk is
full, or even all chunks are full, does not mean you must move lots of
memory around: you can just insert a new chunk and move some entries
from the neighboring ones into it, thus creating some breathing space
for further additions (at the cost of space waste).

Nico
-- 
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users