Very large indices could well be a problem when mmap'd, but in our case we used one file per index and the problem did not occur, even in the largest datasets.

The OSes used were Solaris, AIX, Linux and Windows, on both big and little endian machines. Not HP-UX, fortunately :-).

As you correctly surmised, the file descriptor was kept around for locking and for extending the file. Extending was done as little as possible, in blocks of pages which were appended to the free list for subsequent use. In implementing B-trees there was no need for multiple mmap'd regions.
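Roughly, the extension path looked something like this (an illustrative sketch only; the block size and names are made up, not our actual code):

    /* Sketch: extend a mmap'd index file by a block of pages through
       the retained file descriptor, then remap the whole file.
       EXTEND_PAGES and the names are assumptions for the example. */
    #include <sys/mman.h>
    #include <unistd.h>

    #define EXTEND_PAGES 64   /* grow in blocks of pages, as rarely as possible */

    static void *extend_index(int fd, void *old_map, size_t old_size,
                              size_t page_size, size_t *new_size)
    {
        *new_size = old_size + EXTEND_PAGES * page_size;

        if (ftruncate(fd, (off_t)*new_size) != 0)   /* grow the backing file */
            return MAP_FAILED;

        if (old_map != NULL)
            munmap(old_map, old_size);              /* drop the old mapping */

        /* map the whole file again; the new pages go onto the free list */
        return mmap(NULL, *new_size, PROT_READ | PROT_WRITE,
                    MAP_SHARED, fd, 0);
    }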

I would be wary of using this technique for large data volumes, but in smaller scale applications it certainly delivers good performance.

Christian Smith wrote:
On Wed, 22 Mar 2006, John Stanton wrote:


Our approach to byte order independence was fairly simple, and worked
well with a mmap'd index.  It involved keeping just the word pointers
in a local byte-ordered block if the machine were of a different
endianness.  The overhead was next to insignificant.  Our indices were
all byte order independent.
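
For illustration only, a sketch of that kind of pointer shadowing; the
node layout and names here are invented, not the real index format:

    /* Sketch: copy a node's word pointers into a local host-order
       block when the index file's endianness differs from the host's.
       Keys and data are still read straight from the mapping. */
    #include <stdint.h>
    #include <string.h>

    #define MAX_PTRS 256                 /* assumed fan-out, for the sketch */

    static uint32_t swap32(uint32_t v)
    {
        return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
               ((v << 8) & 0x00ff0000u) | (v << 24);
    }

    /* 'node_ptrs' points into the mmap'd file; only the pointer words
       are shadowed locally, the rest of the node is used in place. */
    static void load_ptrs(const uint32_t *node_ptrs, int nptrs,
                          int foreign_endian, uint32_t local[MAX_PTRS])
    {
        if (!foreign_endian) {
            memcpy(local, node_ptrs, nptrs * sizeof(uint32_t));
        } else {
            for (int i = 0; i < nptrs; i++)
                local[i] = swap32(node_ptrs[i]);
        }
    }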



The biggest problem I can envisage is that you'd have to implement lots
of small mmap'd regions, as you can't guarantee the file will fit in
the process memory map.  This has performance issues for the kernel, both
in tracking the number of regions and potentially in the management of
the regions (think MMU overhead, TLB shoot-down etc.)  Against IO, this
overhead is probably small, but it does add another level of complexity
to the source.

Also, on platforms such as HP-UX, the mmap cache and the regular block
cache are not synched, so having the same file mmap'ed and open as a file
descriptor can cause problems. Another reason why HP-UX sucks, I suppose.

Finally, locking is file descriptor based. You'd have to keep the file
descriptor around, both for the locking, and to increase the file size.
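
Right, that's the usual POSIX fcntl() record-locking idiom on the
retained descriptor; just a generic sketch, not any particular
implementation:

    /* Generic POSIX advisory locking sketch: the locks belong to the
       file descriptor, which is why it has to be kept open. */
    #include <fcntl.h>
    #include <string.h>

    static int lock_file_region(int fd, off_t start, off_t len, int exclusive)
    {
        struct flock fl;
        memset(&fl, 0, sizeof fl);
        fl.l_type   = exclusive ? F_WRLCK : F_RDLCK;
        fl.l_whence = SEEK_SET;
        fl.l_start  = start;
        fl.l_len    = len;                 /* 0 would mean "to end of file" */
        return fcntl(fd, F_SETLKW, &fl);   /* block until the lock is granted */
    }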



Avoiding buffer shadowing seemed to be one big win, and the other was
letting the OS VM management take control.



Which OS was this, BTW?



JS

Nathan Kurz wrote:

On Wed, Mar 22, 2006 at 10:41:23AM +1100, John Stanton wrote:


The mmap'd index was about three times faster than when it
used an LRU paged cache.


I looked fairly closely into the possibility of using mmap for the
SQLite btree backend, and realized that it would be quite difficult.
Because the SQLite file format is host byte-order independent, it's
almost impossible to use mmap without a separate cache.  If one were
to give up on the cross-platform portability, I think one could get a
significant speedup on large file access, but it would be necessary to
write/adapt the entire backend rather than just making small changes.
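
To make that concrete: SQLite stores its multi-byte integers big-endian
in the file, so reading straight out of a mapping on a little-endian
host means decoding fields on every access, where a private page cache
can convert once per page.  A generic sketch, not SQLite's actual code:

    /* Sketch: decoding a big-endian 32-bit field directly from an
       mmap'd page.  A cache-less mmap scheme pays this conversion on
       every field access on a little-endian host. */
    #include <stdint.h>

    static uint32_t get_be32(const unsigned char *p)
    {
        return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
               ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
    }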

Nathan Kurz
[EMAIL PROTECTED]



