On 18 Jul 2010, at 2:56pm, Peng Yu wrote:

> On Sun, Jul 18, 2010 at 12:08 AM, Simon Slavin <slav...@bigfraud.org> wrote:
>> 
>> On 18 Jul 2010, at 5:46am, Peng Yu wrote:
>> 
>>> I'm wondering, whether I should
>>> create one index on multiple columns or create multiple indexes, each
>>> of which is on a single column. [snip]
>> 
>> We would have to see an example SELECT to be sure.  But even if you create 
>> one index on multiple columns, the order of the columns in the index 
>> matters.  If you had three columns you could create six indexes to get each 
>> possible combination.  Probably easier to explain to us your SELECT 
>> statements.
> 
> So, in general, what the index I should use depends on what the select
> statement I might use?

Exactly.  There is one index which is ideal for each SELECT statement.  If you 
use two different SELECT statements, you might create one index ideal for each 
one.  (Slight simplification but that's more or less it.)  There is no point in 
arbitrarily creating separate indices for every column: most of them will never 
be used and the ones which aren't used won't speed things up much.

> Currently, I need to select all the rows that have unique content in a
> subset of the columns and get such columns.
> 
> create table test (id integer primary key, value1 text, value2 text,
> value3 text);
> 
> insert into test (value1,value2,value3) values('x', 'x', 'x');
> insert into test (value1,value2,value3) values('x', 'x', 'y');
> insert into test (value1,value2,value3) values('x', 'y', 'x');
> insert into test (value1,value2,value3) values('x', 'y', 'y');
> insert into test (value1,value2,value3) values('y', 'x', 'x');
> insert into test (value1,value2,value3) values('y', 'x', 'y');
> insert into test (value1,value2,value3) values('y', 'y', 'x');
> insert into test (value1,value2,value3) values('y', 'y', 'y');
> 
> select * from test group by value1,value2;

Ask yourself which order would be best for SQL to be able to look at the 
records.  The answer is that they should be sorted so that all the records with 
the same value1 are together, and then within each of those groups, they should 
be sorted so that all the records with the same value2 are together.  So a good 
index would be CREATE INDEX v1v2 ON test (value1,value2);

sqlite> .timer ON
sqlite> select * from test group by value1,value2;
2|x|x|y
4|x|y|y
6|y|x|y
8|y|y|y
CPU Time: user 0.000201 sys 0.000075
sqlite> CREATE INDEX v1 ON test (value1);
CPU Time: user 0.000237 sys 0.000895
sqlite> select * from test group by value1,value2;
2|x|x|y
4|x|y|y
6|y|x|y
8|y|y|y
CPU Time: user 0.000194 sys 0.000084    <-- a slight improvement
sqlite> CREATE INDEX v1v2 ON test (value1,value2);
CPU Time: user 0.000286 sys 0.000722
sqlite> select * from test group by value1,value2;
2|x|x|y
4|x|y|y
6|y|x|y
8|y|y|y
CPU Time: user 0.000160 sys 0.000088    <-- a better improvement

This test data does not look like 'real' data because they contain equally 
'clumped' data for each value.  Also you have so few rows in your test table 
that the overhead for doing any SELECT statement is swallowing the differences 
between different SELECT statements.  Timings like this will give you a better 
guide if you check it out with read data.

> select * from test group by value1||value2;

For this command, you need sorting by a value which is not stored in the table. 
 No index will help here. If you do this sort of search a lot, consider 
creating a new column in your table which contains 'value1||value2'.

Simon.
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to