John,

Thanks for the useful info. Unfortunately it sounds as if this is more
than I have time for right now. 

Stephen


On Wed, 2007-03-28 at 16:29 -0600, John Stanton wrote:
> Perl would not do a good job.  You need to use the Sqlite page 
> structures and they are defined in C terms.
> 
> If you want to make such a program I can give you a template in simple 
> ANSI C.  It builds to a different data structure from Sqlite but the 
> algorithms are there.  It uses a quicksort as I recall and draws heavily 
> on Knuth's Sorting and Searching volume.
> 
> If you build a B-Tree by insertions it is continually splitting nodes. 
> A B-Tree grows by extending the root which makes it somewhat self 
> balancing but keeps it busy.  Enhanced B-Trees will merge nodes to 
> minimize splitting and checkerboarding and enhance balance.  The 
> splitting is expensive and even a simple insertion can be fairly 
> expensive because it may change the interior nodes.
> 
> To build one bottom-up you calculate how much space you need based on 
> the size and count of keys and how many levels you will have.  Then you 
> sort the keys and start filling leaf nodes.  As you fill a node you 
> insert an entry into its parent and as you fill a parent you insert into 
> ints parent and so on.  Eventually you will have a fully populated tree 
> with the root less than full.  You never read a node and only write one 
> when it is full so I/O activity is limited.  As a buffer you have a 
> stack of nodes with a depth equal to the depth of the tree.
> 
> You can add some optimization to the tree by making interior nodes 
> contiguous etc.
> 
> By using more modern OS capabilities (POSIX) you could build it faster 
> by extending the Sqlite file by the size of the index, memory mapping 
> that area and using it as the buffer.  When you are finished you unmap 
> the area and the index is complete.  Using that method you perform no 
> writes and get a 20-50% speed improvement compared to using the write 
> API call.
> 
> Stephen Toney wrote:
> > I may work on such a program, if time permits. If successful I will
> > share it. It would be in Perl using DBI::ODBC, so may not be amazingly
> > fast.
> > 
> > I am pretty good at C++ but have phased it out for most work, so I am
> > still using the antique Sybase compiler, and I doubt the SQLite C++
> > library would work with that. Otherwise it would, of course, be a better
> > choice for such a utility. Anyone ever tried that combination?
> > 
> > John, could you clarify what you mean by "building it bottom-up"? I'm
> > not sure how to build a b-tree any way but by insertions. 
> > 
> > Best regards,
> > Stephen
> > 
> > 
> > On Wed, 2007-03-28 at 11:46 -0600, John Stanton wrote:
> > 
> >>I proposed such a program earlier in this discussion.  I would envisage 
> >>a seperate program which strips out a list of keys from the database, 
> >>sorts it then allocates space in the DB file for the resulting index and 
> >>builds it bottom up.  It would be an off-line process but fast and would 
> >>make raising indices on large databases time efficient.
> >>
> >>Based on our experience of building a B-Tree with such a program 
> >>compared to successive insertions a speed improvement in raising an 
> >>index of at least an order of magnitude could be expected.
> >>
> >>By making it an independent program it can be lean, mean and fast and 
> >>not touch the regular Sqlite library.
> >>
> >>Stephen Toney wrote:
> >>
> >>>On Wed, 2007-03-28 at 08:23 -0600, Dennis Cote wrote:
> >>>
> >>>
> >>>
> >>>>It might make sense to create a separate standalone utility program 
> >>>>(like sqlite3_analyzer) that reuses some the sqlite  source to do bulk 
> >>>>inserts into a table in a database file as fast a possible with out 
> >>>>having to worry about locking or journaling etc.
> >>>
> >>>
> >>>That would solve my problem too (thread: "CREATE INDEX performance" on
> >>>indexing a 5.8-million record table). I'd love something like that!
> >>>
> >>>
> >>
> >>
> >>-----------------------------------------------------------------------------
> >>To unsubscribe, send email to [EMAIL PROTECTED]
> >>-----------------------------------------------------------------------------
> 
> 
> -----------------------------------------------------------------------------
> To unsubscribe, send email to [EMAIL PROTECTED]
> -----------------------------------------------------------------------------
-- 

Stephen Toney
Systems Planning
[EMAIL PROTECTED]
http://www.systemsplanning.com


-----------------------------------------------------------------------------
To unsubscribe, send email to [EMAIL PROTECTED]
-----------------------------------------------------------------------------

Reply via email to