On 21 Mar 2011, at 3:33am, Guilherme Batista wrote:

> I'll implement some changes in the sqlite query optimization for study
> purpose... before I start studying the code please help me understand the
> basic idea...
> 
> 1) The basic idea of optimization is to optmize and then execute. Where in
> the code the plan is built and where it is executed?

First, you need to know if you are optimizing.  There is a big difference 
between

"I could put some effort into the program and make it run faster."

and

"My program runs so slowly that the users find it annoying to use."

The first situation is not optimizing.  It is just hacking because you have 
time on your hands.  It is definitely worth knowing what 'premature 
optimization' is:

<http://startup-marketing.com/optimization-mistakes-that-kill-startups/>

Once you know that you're optimizing, you need to know what you're optimizing 
for.  Are you optimizing for speed ?  Or for filespace ?  Or for the speed of 
one operation of your program in particular ?  Remember that optimizing for one 
thing will probably reduce performance for something else.  So you may be able 
to make inputting data faster, but the cost will be that SELECTs take more 
time.  Or you may be able to speed up your SELECT commands, but that will 
increase the amount of disk space your database file needs.

And whatever your results are it will almost certainly make your programming or 
data structure harder to understand, which will be a problem when you come to 
fix some bugs in a year's time.  And whatever you do will take time which you 
might more usefully spend adding another feature to your program.

So yes: optimize.  But only when something needs it.  Not just because your 
program is not 'as fast as it could be'.

> 2) I understand that ANALYZE is used to gather statistics about the
> available indices and helps the query optimizer. So if if I've never ran the
> analyze... what statistics the optmizer will use? If I never run ANALYZE the
> statistics will never be updated?

You have it nearly right.  ANALYZE gathers statistics about your data, not your 
indexes.  If you never run ANALYZE the query optimizer will make a 
less-informed guess about how to do a search.  Often the optimizer will still 
guess correctly, meaning that there is no advantage to running ANALYZE.  
Sometimes it doesn't matter how it chooses to search because both ways take the 
same amount of time.  Here is an example of what ANALYZE will help with.

Suppose you have a TABLE employees with ten thousand rows and no indexes, and 
you execute

SELECT id,firstname,surname FROM employees WHERE firstname='Guilherme' AND 
age=46

The query optimizer has a choice.  It can first pick out all the Guilhermes, 
and then find those who are the right age, or it can first pick out all the 46 
year olds, then find any with the right firstname.  It has to do two 
operations, and it must do the first one on the entire list of members.  The 
only way to make things faster is if the second operation needs to worry about 
the smallest number of rows possible.  There are only two options: name first 
then age, or age first then name.

So the job of ANALYZE is to tell the optimizer how 'chunky' each column is.  It 
can say "Almost everyone who works for us is called 'Guilherme', so searching 
on that first isn't going to be much help.  On the other hand we only have 
three people who are 46 years old, so once you've got those searching by 
firstname is going to be very fast.".

But this is data-dependent: another company might only employ people in their 
46s, but may have only one person with that firstname.  There's no way to guess 
correctly from just the schema.  So ANALYZE does the job of collecting 
information on how 'chunky' each column is.  It's only useful if the chunkiness 
of the two columns is very different.  If you have about 10 people with that 
name, and about 10 people of that age, it doesn't matter which column the 
SELECT searches first.  Also, ANALYZE helps only if the programmer did not 
provide an index ideally suited to the search.  For instance, if either of 
these two indexes existed

CREATE INDEX empFNAge ON employees (firstname,age)
CREATE INDEX empAgeFN ON employees (age,firstname)

then the SELECT would just use that index and would find the right rows without 
having to do any searching at all.  This would be far faster than any of the 
options above.  But it would make the database take up more filespace.

But if you allow your users to make up their own searches, so you can't supply 
an index for every combination of things they might think up, ANALYZE can be 
excellent at speeding things up.

Simon.
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to