Thank you very much for your detailed explanation! I will comment out my calls to ANALYZE, and see how my software performs. The reason why I added it in the first place is that users are allowed to create their own queries, and since not all of them are SQL experts, I wanted the engine to be as tolerant as possible. Perhaps it's a better idea to simply crave decent SQL!
> From: slav...@bigfraud.org > Date: Tue, 8 Feb 2011 15:16:58 +0000 > To: sqlite-users@sqlite.org > Subject: Re: [sqlite] Regarding "Manual Control Of Query Plans" > > > On 8 Feb 2011, at 2:39pm, Sven L wrote: > > > Is it reasonable to guess that sqlite_stat* are used only for NATURAL JOINs > > and other not-so-obvious joins? > > No. Consider this: > > SELECT * FROM myTable WHERE a=104 AND b=213 > > Suppose there are two indexes on myTable: one indexes only column a, and the > other indexes only column b. The query optimizer has to choose between them. > Which index would it be best to use ? > > The answer depends on the chunkiness of each column. If only three different > values ever appear in column a, but column b can have a thousand different > values, then it will be more efficient to use the index on column b. This > will return fewer rows which have to be scanned one-by-one for values in a. > > But you're a good programmer. Knowing that you were going to have SELECT > commands like that one above you chose to create an index on both columns > (either order, it doesn't matter). Since you have done this, the results that > ANALYZE gathers don't matter at all ! The query optimizer finds the good > index and never has to choose between two bad indexes because it has the one > perfect index which will always be best. > > So generally speaking ANALYZE matters only if the query optimizer has two or > more equally bad options. If you do your job as a database designer well it > will never need to consider chunkiness. The above explanation is simplified > but gives you the general idea. > > > In my software, the database sometimes grows up to 1-2 GB immediately due > > to the nature of the application. Hence, the guidelines for the ANALYZE > > command do not suffice for me. "Do it once every x months" they say. User > > chooses to import huge amounts of data on regular basis. He is also able to > > run custom queries, which is why I found the sqlite_stat-tables useful. > > Doesn't matter. This new data won't change the relative chunkiness of the > values in the columns. Though the precise numbers change, the /relative/ > chunkiness probably won't, so the choice of indexes probably won't need to > change either. Even less likely that it'll change by enough to change which > index the query optimiser should pick. > > Doing another ANALYZE only really matters if the character of your data > changes, which is usually when a column that used to have very big chunks > suddenly has tiny chunks. The only time this happens if when a business > changes what it does in a significant way: going from having five product > lines to 500, or from selling in three countries to selling in thirty > countries. After they've been putting in new data reflecting this new usage > for a few months, /then/ it might be useful to run ANALYZE again. > > Even then, the only thing you're changing is what the query optimizer chooses > as the best index. It might make the wrong decision and take 12ms for a > SELECT instead of 4ms. Not really a huge problem: if a delay of 8ms is > mission-critical you're probably using hardware at the edge of its > capabilities. > > Simon. > _______________________________________________ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users