I don't think ULL buys you as much extra cache space as you think, and it certainly isn't anywhere near as fast as LL. Consider the following problems that still remain:

1. If you need to point to an element (common with linked lists), it gets really messy. Sorted inserts and deletes will change the memory location of elements, as well as their offset in the node, so you can't point to them by memory location or by node + offset. Instead, you have to point to the list, plus something static and uniquely identifiable about the element. Many times this is going to involve additional space, once per element. Assuming you care about word alignment (which you should if you're talking memory locality and such), the additional space needs to be the size of a pointer. This gives a net savings of one pointer versus a double LL, and ZERO net savings versus a single LL, unless each element already stored something uniquely identifying.

2. Unless you never need to add or remove elements in the middle, you need wasted space to avoid very expensive operations. You still have to store the data, and if the data is a pointer to other data, the whole business of cache locality is largely academic anyway, since you've got plenty of chances for cache misses when you dereference the other data. Say the stored data is 16 bytes (small, but useful) and pointers are 4 bytes. A double LL node is then 24 bytes, of which the data is 66%, so unless you can keep the wasted space per ULL node below 33% you don't gain anything at all over a straight double LL with good locality. If you're replacing a single LL, you have to keep wasted space below 20% to see any gain. The larger the data, the smaller this threshold gets.

3. If data is arbitrarily removed at approximately the same rate that data is added, the wasted space will naturally gravitate towards 50%, which is no footprint gain in most cases. Even against a single LL with pointer-sized data this is zero gain.

4. You CAN improve wasted space by balancing to consolidate adjacent nodes when possible, but this quickly becomes expensive, with lots of memory copy operations just to avoid a little wasted space in the cache. If we measure the value of that space as the fraction of a cache miss it might avoid, times the time cost of a cache miss, it is unclear whether there is any real gain here.

5. Insert and delete operations will require, on average, a copy of half the elements in a node. Compared to the 2-operation insert/delete of a single LL, this is not cheap. Again, weigh it against the cache space you are trying to save by avoiding the pointers, and this doesn't look like a wise spend to me.

6. Linked lists are designed for cases where inserts and removals should be fast and simple, but ULL ignores this. Deletes are especially problematic, requiring a full scan to the element being deleted, followed by a memory copy and possibly node balancing. In a double linked list, if you have a pointer to the node you want to delete, the delete costs only 2 operations.

7. ULL will outperform LL when traversing nodes to find a sorted insertion point, but will likely lose that gain when it actually does the insert. In any case a btree will outperform either when finding a sorted insertion point, still gets the superfast insert, and can get appropriate locality by pooling.
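To make the cost in point 5 concrete, here is a minimal C sketch of the two inserts being compared. All names (`UllNode`, `ullNodeInsert`, `ListNode`, the capacity of 16) are illustrative assumptions, not taken from any real codebase. A sorted insert into a ULL node has to memmove everything after the insertion slot (on average half the node), while an insert after a known node in a plain linked list is two pointer writes.

```c
/* Illustrative sketch only: a sorted insert into one ULL node vs.
   an insert into an ordinary singly linked list. */
#include <string.h>
#include <stddef.h>

#define ULL_CAP 16            /* assumed fixed node capacity */

typedef struct UllNode UllNode;
struct UllNode {
    int count;                /* elements currently in use */
    int elems[ULL_CAP];       /* element storage, kept sorted */
    UllNode *next;
};

/* Sorted insert into a ULL node: find the slot, then shift every
   element after it one place right. On average this copies count/2
   elements -- the memmove cost described in point 5. Returns 0 when
   the node is full (caller would have to split or spill). */
static int ullNodeInsert(UllNode *n, int value){
    int i = 0;
    if (n->count == ULL_CAP) return 0;
    while (i < n->count && n->elems[i] < value) i++;
    memmove(&n->elems[i+1], &n->elems[i],
            (size_t)(n->count - i) * sizeof(int));
    n->elems[i] = value;
    n->count++;
    return 1;
}

/* Plain singly linked list: inserting after a known node is just
   two pointer writes, no element copying at all. */
typedef struct ListNode ListNode;
struct ListNode { int val; ListNode *next; };

static void listInsertAfter(ListNode *prev, ListNode *fresh, int value){
    fresh->val = value;
    fresh->next = prev->next;  /* write 1 */
    prev->next = fresh;        /* write 2 */
}
```

The point of the sketch is the asymmetry: the ULL path scales with node size, the list path is constant regardless of how the nodes were allocated (malloc or pool).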
I guess my point is, avoiding cache misses is a fine goal, but I don't believe ULL is the tool to use. A pool allocation scheme that uses the same pool size as a ULL node will likely have the same or better memory footprint, faster insertion, faster deletion, and the same cache locality. In practice, pools could safely be much larger, because you don't have to balance them against additional cost for inserts and deletes. This means pool allocation could see even better cache locality than ULL, due to larger allocation blocks. Additionally, the allocation cost per node in these pools is unbelievably low (a couple of operations), and if you have a separate pool per list, access to the pools can be controlled by the same critical sections that already control the linked lists, making it automatically threadsafe. Pool allocation works without all the added complexity of ULL, and rather than hidden costs (memcpy for insert and delete), there are hidden savings (nearly zero per-node allocation cost unless you hit the end of a pool).

Anyway, if someone wants to tackle the problem of improving cache locality, and I get a vote on how it's done, I vote pools. They're easy (could probably be done in < 100 lines of new code), uncomplicated (logically the LL works the same as it always has), flexible (can convert to btree), and give a clear benefit (as opposed to questionable gains once wasted space, pointers to nodes, and memcpy are accounted for).

John

-----Original Message-----
From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Nicolas Williams
Sent: Tuesday, October 27, 2009 6:29 PM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] Idea for improving page cache

On Tue, Oct 27, 2009 at 04:28:11PM -0400, John Crenshaw wrote:
> "advantage" kind of depends. ULL is more specialized. You gain some
> benefit, but also lose some as well. For example, consider what is
> involved in doing a sorted insert into an ULL.
> On the other hand, you can get all of the same locality benefit with
> a pool allocation scheme. You don't reduce the pointer overhead this
> way, but that isn't really much of an issue.

I don't know that you get the same locality benefit with pools: with a ULL there's fewer prev/next pointers to take up valuable cache space.

The need to "pullup"/"breakup" a ULL at times is annoying, and to minimize it you probably have to waste some space in order to amortize the cost. As you say, there's a trade-off. Many optimizations result in more complex, error-prone, brittle software.

Wasting space is not much of a problem for cache locality, and if you keep the wasted space well under 50% you're ahead of plain lists in terms of memory footprint. So, ignoring code complexity, ULL seems like a likely win-win performance-wise. Even code complexity-wise, ULLs allow random access to be much faster than with plain linked lists, which means you're more likely to use random access, which means you probably win code complexity-wise too.

Of course, once you start leaving room for additions in each chunk, ULLs start resembling B-trees, which, I think, is your point: might as well go whole-hog.

> ULL requires less time to walk all nodes, but since you can't afford
> to keep ULL sorted, there aren't a lot of scenarios where that saves

I don't think it's true that you can't afford to keep it sorted, but you probably have to waste space if you want it sorted while still allowing most random inserts to not require spills. Also, just because a chunk is full, or even all chunks are full, does not mean you must move lots of memory around: you can just insert a new chunk and move some entries from the neighboring ones into it, thus creating some breathing space for further additions (at the cost of space waste).
Nico
--
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users