Perl would not do a good job. You need to work with the Sqlite page structures, and they are defined in C terms.

If you want to make such a program, I can give you a template in simple ANSI C. It builds a different data structure from Sqlite's, but the algorithms are there. It uses a quicksort, as I recall, and draws heavily on Knuth's Sorting and Searching volume.

If you build a B-Tree by insertions, it is continually splitting nodes. A B-Tree grows by extending the root, which makes it somewhat self-balancing but keeps it busy. Enhanced B-Trees will merge nodes to minimize splitting and checkerboarding and to improve balance. The splitting is expensive, and even a simple insertion can be fairly expensive because it may change the interior nodes.
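
To make the cost concrete, here is a minimal in-memory sketch of one node split, the operation an insert-built tree performs over and over. It is not Sqlite's page layout; the fixed ORDER and integer keys are just placeholders.

#include <stdio.h>
#include <string.h>

#define ORDER 5                 /* max keys per node, kept tiny for illustration */

typedef struct {
    int keys[ORDER];
    int nkeys;
} Node;

/* Split a full node: the right half moves to a new node and the median key
 * is handed back so the caller can insert it into the parent, which may in
 * turn overflow and split, all the way up to the root. */
static int split(Node *full, Node *right)
{
    int mid = full->nkeys / 2;
    int median = full->keys[mid];

    right->nkeys = full->nkeys - mid - 1;
    memcpy(right->keys, &full->keys[mid + 1], right->nkeys * sizeof(int));
    full->nkeys = mid;
    return median;
}

int main(void)
{
    Node a = { {3, 7, 11, 19, 23}, 5 }, b = { {0}, 0 };
    int median = split(&a, &b);
    printf("median %d goes to the parent; left has %d keys, right has %d\n",
           median, a.nkeys, b.nkeys);
    return 0;
}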

To build one bottom-up you calculate how much space you need, based on the size and count of the keys and how many levels you will have. Then you sort the keys and start filling leaf nodes. As you fill a leaf you insert an entry into its parent, as you fill a parent you insert into its parent, and so on. Eventually you have a fully populated tree with the root less than full. You never read a node and only write one when it is full, so I/O activity is limited. As a buffer you need only a stack of nodes with a depth equal to the depth of the tree.
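
A rough sketch of that fill-and-promote loop, assuming pre-sorted integer keys, a made-up fixed fanout, and printf standing in for the page write. It is not Sqlite's page format, just the shape of the algorithm.

#include <stdio.h>

#define FANOUT    4            /* keys per node, kept tiny for illustration */
#define MAXLEVELS 16           /* depth of the node stack */

typedef struct {
    int keys[FANOUT];
    int nkeys;
} Node;

static Node stack[MAXLEVELS];  /* one in-progress node per level: 0 = leaves */
static int  nlevels = 1;

/* "Write" a full node; in a real builder this is the only disk write. */
static void flush_node(int level)
{
    printf("level %d node:", level);
    for (int i = 0; i < stack[level].nkeys; i++)
        printf(" %d", stack[level].keys[i]);
    printf("\n");
    stack[level].nkeys = 0;
}

/* Add a key at the given level; when the node fills, flush it and promote
 * a separator key into the parent level, cascading upward as needed. */
static void add_key(int level, int key)
{
    Node *n = &stack[level];
    n->keys[n->nkeys++] = key;
    if (n->nkeys == FANOUT) {
        flush_node(level);
        if (level + 1 >= nlevels) nlevels = level + 2;
        add_key(level + 1, key);   /* last key of the node serves as separator */
    }
}

int main(void)
{
    int keys[] = {1,2,3,5,8,9,11,12,14,17,19,20,23,25,27,31,33,34};
    int nkeys = (int)(sizeof keys / sizeof keys[0]);

    /* Keys are assumed already sorted; just stream them into the leaves. */
    for (int i = 0; i < nkeys; i++)
        add_key(0, keys[i]);

    /* Flush the partially filled nodes left on the stack; the topmost one
     * is the less-than-full root. */
    for (int level = 0; level < nlevels; level++)
        if (stack[level].nkeys > 0)
            flush_node(level);

    return 0;
}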

You can add some optimization to the tree by making interior nodes contiguous etc.

By using more modern OS capabilities (POSIX) you could build it faster: extend the Sqlite file by the size of the index, memory-map that area, and use it as the buffer. When you are finished you unmap the area and the index is complete. With that method you perform no explicit writes and get a 20-50% speed improvement compared to using the write API call.
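
The mechanics would look something like this, assuming POSIX ftruncate/mmap and placeholder values for the file name and the index size (which you would have calculated beforehand):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    const char *path = "test.db";        /* placeholder file name */
    size_t index_bytes = 1 << 20;        /* assumed: index size computed beforehand */

    int fd = open(path, O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }
    off_t old_size = st.st_size;

    /* Extend the file by the size of the new index. */
    if (ftruncate(fd, old_size + (off_t)index_bytes) < 0) {
        perror("ftruncate"); return 1;
    }

    /* mmap offsets must be page aligned, so round down and map a little extra. */
    long pagesize = sysconf(_SC_PAGESIZE);
    off_t map_off = (old_size / pagesize) * pagesize;
    size_t map_len = (size_t)(old_size - map_off) + index_bytes;

    void *map = mmap(NULL, map_len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, map_off);
    if (map == MAP_FAILED) { perror("mmap"); return 1; }

    /* buf is the start of the newly added region; build the index pages here. */
    char *buf = (char *)map + (old_size - map_off);
    memset(buf, 0, index_bytes);         /* stand-in for the real index build */

    /* Unmap when finished; the OS writes the dirty pages back, so the
     * builder itself never issues a write() call. */
    if (munmap(map, map_len) < 0) { perror("munmap"); return 1; }
    close(fd);
    return 0;
}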

Stephen Toney wrote:
I may work on such a program, if time permits. If successful I will
share it. It would be in Perl using DBI::ODBC, so may not be amazingly
fast.

I am pretty good at C++ but have phased it out for most work, so I am
still using the antique Sybase compiler, and I doubt the SQLite C++
library would work with that. Otherwise it would, of course, be a better
choice for such a utility. Anyone ever tried that combination?

John, could you clarify what you mean by "building it bottom-up"? I'm
not sure how to build a b-tree any way but by insertions.
Best regards,
Stephen


On Wed, 2007-03-28 at 11:46 -0600, John Stanton wrote:

I proposed such a program earlier in this discussion. I would envisage a separate program which strips out a list of keys from the database, sorts it, then allocates space in the DB file for the resulting index and builds it bottom-up. It would be an off-line process, but a fast one, and it would make raising indices on large databases time-efficient.

Based on our experience of building a B-Tree with such a program, compared to successive insertions a speed improvement of at least an order of magnitude in raising an index could be expected.

By making it an independent program it can be lean, mean, and fast, and it need not touch the regular Sqlite library.

Stephen Toney wrote:

On Wed, 2007-03-28 at 08:23 -0600, Dennis Cote wrote:



It might make sense to create a separate standalone utility program (like sqlite3_analyzer) that reuses some of the sqlite source to do bulk inserts into a table in a database file as fast as possible, without having to worry about locking or journaling, etc.


That would solve my problem too (thread: "CREATE INDEX performance" on
indexing a 5.8-million record table). I'd love something like that!



