On 21 Mar 2011, at 3:33am, Guilherme Batista wrote: > I'll implement some changes in the sqlite query optimization for study > purpose... before I start studying the code please help me understand the > basic idea... > > 1) The basic idea of optimization is to optmize and then execute. Where in > the code the plan is built and where it is executed?
First, you need to know if you are optimizing. There is a big difference between "I could put some effort into the program and make it run faster." and "My program runs so slowly that the users find it annoying to use." The first situation is not optimizing. It is just hacking because you have time on your hands. It is definitely worth knowing what 'premature optimization' is: <http://startup-marketing.com/optimization-mistakes-that-kill-startups/> Once you know that you're optimizing, you need to know what you're optimizing for. Are you optimizing for speed ? Or for filespace ? Or for the speed of one operation of your program in particular ? Remember that optimizing for one thing will probably reduce performance for something else. So you may be able to make inputting data faster, but the cost will be that SELECTs take more time. Or you may be able to speed up your SELECT commands, but that will increase the amount of disk space your database file needs. And whatever your results are it will almost certainly make your programming or data structure harder to understand, which will be a problem when you come to fix some bugs in a year's time. And whatever you do will take time which you might more usefully spend adding another feature to your program. So yes: optimize. But only when something needs it. Not just because your program is not 'as fast as it could be'. > 2) I understand that ANALYZE is used to gather statistics about the > available indices and helps the query optimizer. So if if I've never ran the > analyze... what statistics the optmizer will use? If I never run ANALYZE the > statistics will never be updated? You have it nearly right. ANALYZE gathers statistics about your data, not your indexes. If you never run ANALYZE the query optimizer will make a less-informed guess about how to do a search. Often the optimizer will still guess correctly, meaning that there is no advantage to running ANALYZE. Sometimes it doesn't matter how it chooses to search because both ways take the same amount of time. Here is an example of what ANALYZE will help with. Suppose you have a TABLE employees with ten thousand rows and no indexes, and you execute SELECT id,firstname,surname FROM employees WHERE firstname='Guilherme' AND age=46 The query optimizer has a choice. It can first pick out all the Guilhermes, and then find those who are the right age, or it can first pick out all the 46 year olds, then find any with the right firstname. It has to do two operations, and it must do the first one on the entire list of members. The only way to make things faster is if the second operation needs to worry about the smallest number of rows possible. There are only two options: name first then age, or age first then name. So the job of ANALYZE is to tell the optimizer how 'chunky' each column is. It can say "Almost everyone who works for us is called 'Guilherme', so searching on that first isn't going to be much help. On the other hand we only have three people who are 46 years old, so once you've got those searching by firstname is going to be very fast.". But this is data-dependent: another company might only employ people in their 46s, but may have only one person with that firstname. There's no way to guess correctly from just the schema. So ANALYZE does the job of collecting information on how 'chunky' each column is. It's only useful if the chunkiness of the two columns is very different. If you have about 10 people with that name, and about 10 people of that age, it doesn't matter which column the SELECT searches first. Also, ANALYZE helps only if the programmer did not provide an index ideally suited to the search. For instance, if either of these two indexes existed CREATE INDEX empFNAge ON employees (firstname,age) CREATE INDEX empAgeFN ON employees (age,firstname) then the SELECT would just use that index and would find the right rows without having to do any searching at all. This would be far faster than any of the options above. But it would make the database take up more filespace. But if you allow your users to make up their own searches, so you can't supply an index for every combination of things they might think up, ANALYZE can be excellent at speeding things up. Simon. _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users