Thank you very much for your detailed explanation!
I will comment out my calls to ANALYZE, and see how my software performs.
 
The reason why I added it in the first place is that users are allowed to 
create their own queries, and since not all of them are SQL experts, I wanted 
the engine to be as tolerant as possible. Perhaps it's a better idea to simply 
crave decent SQL!

 
> From: slav...@bigfraud.org
> Date: Tue, 8 Feb 2011 15:16:58 +0000
> To: sqlite-users@sqlite.org
> Subject: Re: [sqlite] Regarding "Manual Control Of Query Plans"
> 
> 
> On 8 Feb 2011, at 2:39pm, Sven L wrote:
> 
> > Is it reasonable to guess that sqlite_stat* are used only for NATURAL JOINs 
> > and other not-so-obvious joins?
> 
> No. Consider this:
> 
> SELECT * FROM myTable WHERE a=104 AND b=213
> 
> Suppose there are two indexes on myTable: one indexes only column a, and the 
> other indexes only column b. The query optimizer has to choose between them. 
> Which index would it be best to use ?
> 
> The answer depends on the chunkiness of each column. If only three different 
> values ever appear in column a, but column b can have a thousand different 
> values, then it will be more efficient to use the index on column b. This 
> will return fewer rows which have to be scanned one-by-one for values in a.
> 
> But you're a good programmer. Knowing that you were going to have SELECT 
> commands like that one above you chose to create an index on both columns 
> (either order, it doesn't matter). Since you have done this, the results that 
> ANALYZE gathers don't matter at all ! The query optimizer finds the good 
> index and never has to choose between two bad indexes because it has the one 
> perfect index which will always be best.
> 
> So generally speaking ANALYZE matters only if the query optimizer has two or 
> more equally bad options. If you do your job as a database designer well it 
> will never need to consider chunkiness. The above explanation is simplified 
> but gives you the general idea.
> 
> > In my software, the database sometimes grows up to 1-2 GB immediately due 
> > to the nature of the application. Hence, the guidelines for the ANALYZE 
> > command do not suffice for me. "Do it once every x months" they say. User 
> > chooses to import huge amounts of data on regular basis. He is also able to 
> > run custom queries, which is why I found the sqlite_stat-tables useful.
> 
> Doesn't matter. This new data won't change the relative chunkiness of the 
> values in the columns. Though the precise numbers change, the /relative/ 
> chunkiness probably won't, so the choice of indexes probably won't need to 
> change either. Even less likely that it'll change by enough to change which 
> index the query optimiser should pick.
> 
> Doing another ANALYZE only really matters if the character of your data 
> changes, which is usually when a column that used to have very big chunks 
> suddenly has tiny chunks. The only time this happens if when a business 
> changes what it does in a significant way: going from having five product 
> lines to 500, or from selling in three countries to selling in thirty 
> countries. After they've been putting in new data reflecting this new usage 
> for a few months, /then/ it might be useful to run ANALYZE again.
> 
> Even then, the only thing you're changing is what the query optimizer chooses 
> as the best index. It might make the wrong decision and take 12ms for a 
> SELECT instead of 4ms. Not really a huge problem: if a delay of 8ms is 
> mission-critical you're probably using hardware at the edge of its 
> capabilities.
> 
> Simon.
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
                                          
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to