Perl would not do a good job. You need to work with the Sqlite page structures, and they are defined in C terms.

If you want to make such a program, I can give you a template in simple ANSI C. It builds a different data structure from Sqlite's, but the algorithms are there. It uses a quicksort, as I recall, and draws heavily on Knuth's Sorting and Searching volume.

If you build a B-Tree by insertions, it is continually splitting nodes. A B-Tree grows by extending the root, which makes it somewhat self-balancing but keeps it busy. Enhanced B-Trees will merge nodes to minimize splitting and checkerboarding and to improve balance. The splitting is expensive, and even a simple insertion can be fairly expensive because it may change the interior nodes.
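
To make the cost concrete, here is a minimal in-memory sketch of one node split, the operation an insert-built tree performs over and over. It is not Sqlite's page layout; the fixed ORDER and integer keys are just placeholders.

#include <stdio.h>
#include <string.h>

#define ORDER 5                 /* max keys per node, kept tiny for illustration */

typedef struct {
    int keys[ORDER];
    int nkeys;
} Node;

/* Split a full node: the right half moves to a new node and the median key
 * is handed back so the caller can insert it into the parent, which may in
 * turn overflow and split, all the way up to the root. */
static int split(Node *full, Node *right)
{
    int mid = full->nkeys / 2;
    int median = full->keys[mid];

    right->nkeys = full->nkeys - mid - 1;
    memcpy(right->keys, &full->keys[mid + 1], right->nkeys * sizeof(int));
    full->nkeys = mid;
    return median;
}

int main(void)
{
    Node a = { {3, 7, 11, 19, 23}, 5 }, b = { {0}, 0 };
    int median = split(&a, &b);
    printf("median %d goes to the parent; left has %d keys, right has %d\n",
           median, a.nkeys, b.nkeys);
    return 0;
}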

To build one bottom-up you calculate how much space you need, based on the size and count of the keys and how many levels you will have. Then you sort the keys and start filling leaf nodes. As you fill a leaf you insert an entry into its parent, as you fill a parent you insert into its parent, and so on. Eventually you have a fully populated tree with the root less than full. You never read a node and only write one when it is full, so I/O activity is limited. As a buffer you need only a stack of nodes with a depth equal to the depth of the tree.
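
A rough sketch of that fill-and-promote loop, assuming pre-sorted integer keys, a made-up fixed fanout, and printf standing in for the page write. It is not Sqlite's page format, just the shape of the algorithm.

#include <stdio.h>

#define FANOUT    4            /* keys per node, kept tiny for illustration */
#define MAXLEVELS 16           /* depth of the node stack */

typedef struct {
    int keys[FANOUT];
    int nkeys;
} Node;

static Node stack[MAXLEVELS];  /* one in-progress node per level: 0 = leaves */
static int  nlevels = 1;

/* "Write" a full node; in a real builder this is the only disk write. */
static void flush_node(int level)
{
    printf("level %d node:", level);
    for (int i = 0; i < stack[level].nkeys; i++)
        printf(" %d", stack[level].keys[i]);
    printf("\n");
    stack[level].nkeys = 0;
}

/* Add a key at the given level; when the node fills, flush it and promote
 * a separator key into the parent level, cascading upward as needed. */
static void add_key(int level, int key)
{
    Node *n = &stack[level];
    n->keys[n->nkeys++] = key;
    if (n->nkeys == FANOUT) {
        flush_node(level);
        if (level + 1 >= nlevels) nlevels = level + 2;
        add_key(level + 1, key);   /* last key of the node serves as separator */
    }
}

int main(void)
{
    int keys[] = {1,2,3,5,8,9,11,12,14,17,19,20,23,25,27,31,33,34};
    int nkeys = (int)(sizeof keys / sizeof keys[0]);

    /* Keys are assumed already sorted; just stream them into the leaves. */
    for (int i = 0; i < nkeys; i++)
        add_key(0, keys[i]);

    /* Flush the partially filled nodes left on the stack; the topmost one
     * is the less-than-full root. */
    for (int level = 0; level < nlevels; level++)
        if (stack[level].nkeys > 0)
            flush_node(level);

    return 0;
}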

You can add some optimization to the tree by making interior nodes contiguous etc.

By using more modern OS capabilities (POSIX) you could build it faster: extend the Sqlite file by the size of the index, memory-map that area, and use it as the buffer. When you are finished you unmap the area and the index is complete. With that method you perform no explicit writes and get a 20-50% speed improvement compared to using the write API call.
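
The mechanics would look something like this, assuming POSIX ftruncate/mmap and placeholder values for the file name and the index size (which you would have calculated beforehand):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    const char *path = "test.db";        /* placeholder file name */
    size_t index_bytes = 1 << 20;        /* assumed: index size computed beforehand */

    int fd = open(path, O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }
    off_t old_size = st.st_size;

    /* Extend the file by the size of the new index. */
    if (ftruncate(fd, old_size + (off_t)index_bytes) < 0) {
        perror("ftruncate"); return 1;
    }

    /* mmap offsets must be page aligned, so round down and map a little extra. */
    long pagesize = sysconf(_SC_PAGESIZE);
    off_t map_off = (old_size / pagesize) * pagesize;
    size_t map_len = (size_t)(old_size - map_off) + index_bytes;

    void *map = mmap(NULL, map_len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, map_off);
    if (map == MAP_FAILED) { perror("mmap"); return 1; }

    /* buf is the start of the newly added region; build the index pages here. */
    char *buf = (char *)map + (old_size - map_off);
    memset(buf, 0, index_bytes);         /* stand-in for the real index build */

    /* Unmap when finished; the OS writes the dirty pages back, so the
     * builder itself never issues a write() call. */
    if (munmap(map, map_len) < 0) { perror("munmap"); return 1; }
    close(fd);
    return 0;
}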

Stephen Toney wrote:
I may work on such a program, if time permits. If successful I will
share it. It would be in Perl using DBI::ODBC, so may not be amazingly
fast.

I am pretty good at C++ but have phased it out for most work, so I am
still using the antique Sybase compiler, and I doubt the SQLite C++
library would work with that. Otherwise it would, of course, be a better
choice for such a utility. Anyone ever tried that combination?

John, could you clarify what you mean by "building it bottom-up"? I'm
not sure how to build a b-tree any way but by insertions.
Best regards,
Stephen


On Wed, 2007-03-28 at 11:46 -0600, John Stanton wrote:

I proposed such a program earlier in this discussion. I would envisage a separate program which strips out a list of keys from the database, sorts it, then allocates space in the DB file for the resulting index and builds it bottom-up. It would be an off-line process, but a fast one, and it would make raising indices on large databases time-efficient.

Based on our experience of building a B-Tree with such a program, compared to successive insertions a speed improvement of at least an order of magnitude in raising an index could be expected.

By making it an independent program it can be lean, mean, and fast, and it need not touch the regular Sqlite library.

Stephen Toney wrote:

On Wed, 2007-03-28 at 08:23 -0600, Dennis Cote wrote:



It might make sense to create a separate standalone utility program (like sqlite3_analyzer) that reuses some of the sqlite source to do bulk inserts into a table in a database file as fast as possible, without having to worry about locking or journaling, etc.


That would solve my problem too (thread: "CREATE INDEX performance" on
indexing a 5.8-million record table). I'd love something like that!



