You obviously have the set of UIDs at the time of the loop, so how about
building one big select ... from LEAVES where leafID IN (comma_separated_list_of_uids)?
That makes it one single query (or you can break it down into blocks of 50,
or 100, etc.).
It saves the overhead of generating and running the query over and over again
(essentially giving you loop unrolling).
Then you can just loop over the records in the one single dataset returned.
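Something like this, roughly -- I'm assuming theTwig[] holds integer
childLeafIDs and that a fixed-size buffer is big enough for a couple of
hundred IDs; adapt to your real types:

  #include <stdio.h>
  #include <sqlite3.h>

  /* Hypothetical twig record -- stands in for whatever theTwig[] really is. */
  typedef struct { int childLeafID; } Twig;

  /* Fetch the leaves for all twigs with one select instead of one per twig. */
  static int fetch_leaves(sqlite3 *db, const Twig *twig, int numTwigs)
  {
      char sql[8192];   /* plenty for ~200 integer IDs; size it properly in real code */
      int  len = snprintf(sql, sizeof(sql), "select * from LEAVES where leafID IN (");

      for (int z = 0; z < numTwigs; z++)
          len += snprintf(sql + len, sizeof(sql) - len, "%s%d",
                          z ? "," : "", twig[z].childLeafID);
      snprintf(sql + len, sizeof(sql) - len, ")");

      char **result = 0;
      int nRow = 0, nCol = 0;
      char *err = 0;

      if (sqlite3_get_table(db, sql, &result, &nRow, &nCol, &err) != SQLITE_OK) {
          fprintf(stderr, "query failed: %s\n", err);
          sqlite3_free(err);
          return -1;
      }

      /* Row r, column c of the data is result[(r + 1) * nCol + c]
         (row 0 of the array holds the column names). */
      sqlite3_free_table(result);
      return nRow;
  }

The blocks-of-50 variant is just the same loop split up, re-issuing the
query every 50 IDs.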
Mark Gilbert wrote:
> Folks.
>
> Looking for some advice from hardened SQLiters...
>
> - Our application uses an SQLite 3.4.1 database with 8 tables. One of
> the tables may contain tens or maybe hundreds of thousands of records
> with about 30 fields.
>
> - The tables form a linked tree type hierarchy where one table is the
> trunk, another is the boughs and then branches, twigs and finally
> very many leaves. UIDs in the tables allow us to find the children
> of a record in the next table.
>
> In one operation we have to construct a summary of a portion of the
> tree. This basically means picking the stem, then recursively
> finding its branches, their twigs and so on.
>
> For each twig we have to find all the leaves. The Leaves table has
> maybe 15000 records and we have a query where we search the Leaves
> table once for each twig we find. In our part of the tree there
> might be 200 twigs, and so we end up searching the leaves DB 200
> times to build up the list of leaves attached to the twigs.
>
> Unlike a real tree, our leaves may be attached to more than one twig,
> so we need to keep the tables linked but abstracted, hence the
> separate tables.
>
> The problem we have is performance......
>
> forgetting the main part of the tree, let's jump to leaves:
>
> - We have queried all the way down to the twigs quite quickly, and we
> have found 200 twigs. Each twig is connected to a few leaves.
>
> in pseudocode:
> for (z = 0; z < numTwigs; z++)   // there are 200
> {
>     sqlite3_get_table("select * from LEAVES where leafID = theTwig[z].childLeafID");
>     // we end up performing this query 200 times, once for each of the parent twigs
> }
>
> Each select query on our 15000-entry LEAVES table (whose records have
> about 30 fields each, most of which we want) may take 100 ms, so this
> ends up taking 20 seconds, which is not acceptable in our
> application.
>
> I am certain that we are missing some important SQL language or
> SQLite optimisations, and I wanted to ask for advice. We are
> already using BEGIN and COMMIT transactions around this code
> (although see later for my concerns about that, since maybe it's not
> working).
>
> Profiling the code, it spends a lot of the time reading into the SQLite pager.
>
> Other information:
> - the app is a highly multithreaded server with query connections
> spawning new threads per request. Each connection needs to query the
> database, and our approach thus far (in order to improve performance)
> is to open the database ONCE (on the first thread that needs it) call
> BEGIN and leave it open. We then have a lock which allows later
> threads to request the database lock (we pass the global DB handle
> back to the new thread). When it's finished, it releases it and another
> thread can get the handle. Once every minute we COMMIT to flush the
> database, back it up, then BEGIN another transaction.
>
> I am slightly concerned that our BEGIN, COMMIT and usage of the
> handle are made on different threads, and this may be somehow
> defeating the transaction or not taking full advantage of the page
> cache. Perhaps we should close the DB down properly and do the
> entire open process and close transaction locally in the new thread ?
> This is all a READ ONLY transaction, by the way.
>
> We have considered copying the entire database into a memory resident
> SQLite database at the start of this transaction (finding all the
> records ONCE takes less than a second, so we may be able to afford
> this drastic step) but I have read that memory resident databases may
> not be much faster.
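For what it's worth, if you do try the in-memory copy, one way to do it
on 3.4.1 is to open a :memory: database, ATTACH the file and copy the
table across. Note that CREATE TABLE ... AS SELECT does not carry indexes
over, so you'd have to recreate any you rely on or the copy may indeed be
no faster. A rough sketch -- the file name and the index are my guesses:

  #include <sqlite3.h>

  /* Copy the on-disk LEAVES table into a fresh in-memory database.
     'tree.db' and the leafID index are illustrative guesses. */
  static sqlite3 *load_leaves_into_memory(void)
  {
      sqlite3 *mem = 0;
      if (sqlite3_open(":memory:", &mem) != SQLITE_OK)
          return 0;

      sqlite3_exec(mem, "ATTACH DATABASE 'tree.db' AS disk", 0, 0, 0);
      sqlite3_exec(mem, "CREATE TABLE LEAVES AS SELECT * FROM disk.LEAVES", 0, 0, 0);
      /* CREATE TABLE ... AS SELECT drops indexes, so rebuild the one the lookups use. */
      sqlite3_exec(mem, "CREATE INDEX leaves_leafID ON LEAVES(leafID)", 0, 0, 0);
      sqlite3_exec(mem, "DETACH DATABASE disk", 0, 0, 0);

      return mem;   /* query this handle, sqlite3_close() it when done */
  }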
>
> We have even considered finding all the records ONCE into our own
> cache and then manually performing the searches using our own code
> (hoping it would be faster), but I can't help feeling that we should
> be able to do this faster just using SQLite.
>
> We have considered preparing the query first and binding the
> parameter, and I wondered if that might help? Would preparing:
> "select * from LEAVES where leafID = boundParam1"
> really make much difference compared to just running it directly?
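Preparing once and rebinding will at least save re-parsing the SQL on
every iteration, although the bigger win is still collapsing the loop
into one query. Roughly, assuming leafID is an integer column:

  #include <sqlite3.h>

  /* Prepare the statement once, then bind/step/reset per twig. */
  static void scan_leaves(sqlite3 *db, const int *leafIDs, int numTwigs)
  {
      sqlite3_stmt *stmt = 0;
      if (sqlite3_prepare_v2(db, "select * from LEAVES where leafID = ?",
                             -1, &stmt, 0) != SQLITE_OK)
          return;

      for (int z = 0; z < numTwigs; z++) {
          sqlite3_bind_int(stmt, 1, leafIDs[z]);
          while (sqlite3_step(stmt) == SQLITE_ROW) {
              /* pull the ~30 columns out with sqlite3_column_*(stmt, col) */
          }
          sqlite3_reset(stmt);   /* reuse the compiled statement for the next ID */
      }
      sqlite3_finalize(stmt);
  }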
>
> We are typically using a Quad 3.0GHz processor Macintosh so we have
> plenty of horsepower. I can't really believe that we should accept
> that this process will take as long as 20 seconds. I am convinced it
> can be massively optimised.
>
> There are so many options, each of which would require quite a lot of
> effort to prototype and test, so I wondered what wisdom any hardened
> SQLite folk could offer. What is the best approach to attack this
> problem?
>
> Thanks for any experience you can share..
>
> Mark
>
--
Bill King, Software Engineer
Trolltech, Brisbane Technology Park
26 Brandl St, Eight Mile Plains,
QLD, Australia, 4113
Tel + 61 7 3219 9906 (x137)
Fax + 61 7 3219 9938
mobile: 0423 532 733
_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users