<[EMAIL PROTECTED]> wrote:

> The problem (I suspect) is that you have an index on Table3.
> As you insert to records into Table3, those record go at the
> end, which is very efficient.  But the index entries have to
> be inserted in index order, which means they get scattered
> out all through the index.  This works fine as long as the
> index is small enough to fit in cache (either SQLite's private
> cache, or failing that your operating systems disk cache.)
> Once the index gets too big to fit in cache, you tend to
> start thrashing.
>
> The problem is a lack of locality of reference in the index.
> Each index insertion requires O(logN) disk reads and writes.
> This is no big deal as long as a "disk read and write" is
> serviced from cache but gets to be a very big deal when it
> actually involves real disk I/O.  You start to get real disk
> I/O when the index loses locality of reference and exceeds
> the size of your cache.  I do not know how to fix this and 
> still make the indices useful.

If you remember I had this problem with my app, the work around I found is to 
read the whole DB on startup (well, the one used for indexing my users) to help 
the OS cache it better (for 10M users it's about 600 megs, so it's still 
reasonable).
Now as a separate file for indexes/primary keys is not an option (even though, 
I still think it would be the easiest solution, that's how mysql does it, I 
don't know about postgresql), an alternative would be to change the layout in 
the file so that indexes/primary keys are stored together in the file and the 
rest of the rows somewhere else in the file (the file could grow by having an 
alternation of index/key data and other data).
At first this might seem slower, but in practice it won't (well, it might just 
be worth a try):
the area of the file that contains indexes/primary keys will be accessed all 
the time and thus cached by the OS while the rest of the data will be less 
cached and discarded from memory.
So even though it looks like 2 seeks and 2 reads (or writes) would be needed 
for every row, in practice, the operations on the index/key part will be from 
cache and will be faster (and when writing using transactions, the added seeks 
should not really increase the time by much).

Nicolas





Reply via email to