John, Thanks for the useful info. Unfortunately it sounds as if this is more than I have time for right now.
Stephen On Wed, 2007-03-28 at 16:29 -0600, John Stanton wrote: > Perl would not do a good job. You need to use the Sqlite page > structures and they are defined in C terms. > > If you want to make such a program I can give you a template in simple > ANSI C. It builds to a different data structure from Sqlite but the > algorithms are there. It uses a quicksort as I recall and draws heavily > on Knuth's Sorting and Searching volume. > > If you build a B-Tree by insertions it is continually splitting nodes. > A B-Tree grows by extending the root which makes it somewhat self > balancing but keeps it busy. Enhanced B-Trees will merge nodes to > minimize splitting and checkerboarding and enhance balance. The > splitting is expensive and even a simple insertion can be fairly > expensive because it may change the interior nodes. > > To build one bottom-up you calculate how much space you need based on > the size and count of keys and how many levels you will have. Then you > sort the keys and start filling leaf nodes. As you fill a node you > insert an entry into its parent and as you fill a parent you insert into > ints parent and so on. Eventually you will have a fully populated tree > with the root less than full. You never read a node and only write one > when it is full so I/O activity is limited. As a buffer you have a > stack of nodes with a depth equal to the depth of the tree. > > You can add some optimization to the tree by making interior nodes > contiguous etc. > > By using more modern OS capabilities (POSIX) you could build it faster > by extending the Sqlite file by the size of the index, memory mapping > that area and using it as the buffer. When you are finished you unmap > the area and the index is complete. Using that method you perform no > writes and get a 20-50% speed improvement compared to using the write > API call. > > Stephen Toney wrote: > > I may work on such a program, if time permits. If successful I will > > share it. It would be in Perl using DBI::ODBC, so may not be amazingly > > fast. > > > > I am pretty good at C++ but have phased it out for most work, so I am > > still using the antique Sybase compiler, and I doubt the SQLite C++ > > library would work with that. Otherwise it would, of course, be a better > > choice for such a utility. Anyone ever tried that combination? > > > > John, could you clarify what you mean by "building it bottom-up"? I'm > > not sure how to build a b-tree any way but by insertions. > > > > Best regards, > > Stephen > > > > > > On Wed, 2007-03-28 at 11:46 -0600, John Stanton wrote: > > > >>I proposed such a program earlier in this discussion. I would envisage > >>a seperate program which strips out a list of keys from the database, > >>sorts it then allocates space in the DB file for the resulting index and > >>builds it bottom up. It would be an off-line process but fast and would > >>make raising indices on large databases time efficient. > >> > >>Based on our experience of building a B-Tree with such a program > >>compared to successive insertions a speed improvement in raising an > >>index of at least an order of magnitude could be expected. > >> > >>By making it an independent program it can be lean, mean and fast and > >>not touch the regular Sqlite library. > >> > >>Stephen Toney wrote: > >> > >>>On Wed, 2007-03-28 at 08:23 -0600, Dennis Cote wrote: > >>> > >>> > >>> > >>>>It might make sense to create a separate standalone utility program > >>>>(like sqlite3_analyzer) that reuses some the sqlite source to do bulk > >>>>inserts into a table in a database file as fast a possible with out > >>>>having to worry about locking or journaling etc. > >>> > >>> > >>>That would solve my problem too (thread: "CREATE INDEX performance" on > >>>indexing a 5.8-million record table). I'd love something like that! > >>> > >>> > >> > >> > >>----------------------------------------------------------------------------- > >>To unsubscribe, send email to [EMAIL PROTECTED] > >>----------------------------------------------------------------------------- > > > ----------------------------------------------------------------------------- > To unsubscribe, send email to [EMAIL PROTECTED] > ----------------------------------------------------------------------------- -- Stephen Toney Systems Planning [EMAIL PROTECTED] http://www.systemsplanning.com ----------------------------------------------------------------------------- To unsubscribe, send email to [EMAIL PROTECTED] -----------------------------------------------------------------------------