On 2015-07-27 08:09 PM, Simon Slavin wrote:
> On 27 Jul 2015, at 6:58pm, Sylvain Pointeau <sylvain.pointeau at gmail.com> 
> wrote:
>
>> create table TEST (
>> a TEXT NOT NULL,
>> a2 TEXT NOT NULL,
>> b TEXT NOT NULL,
>> c TEXT NOT NULL
>> );
>>
>> create index IDX_TEST_1 on TEST(a,a2,b,c);
>>
>> insert into TEST(a,a2,b,c) values ('123','1234','456','PP');
>> insert into TEST(a,a2,b,c) values ('234','2345','456','PP');
>> insert into TEST(a,a2,b,c) values ('456','4567','456','PP');
>> insert into TEST(a,a2,b,c) values ('0123','10456','678','DP');
>>
>> sqlite> explain query plan select * from TEST where (a = '123' or
>> a2='1234') and b = '456' and c='PP';
>>
>> 0|0|0|SCAN TABLE TEST
>>
>> whereas
>>
>> sqlite> explain query plan select * from TEST where a = '123' and b = '456'
>> and c='PP';
>>
>> 0|0|0|SEARCH TABLE TEST USING COVERING INDEX IDX_TEST_1 (a=?)
>>
>> how to make this query using an index? is this possible?
> That is a good job of investigation you have done there.  What you didn't 
> take into account is that the order of columns in an index matters.
>
> Your example has known precise values for b and c, and sometimes but not 
> always knows a and a2.  So b and c should be up front in your index:
>
> create index IDX_TEST_2 on TEST(b,c,a,a2);

Quite, but there is also another consideration: Cardinality.  Now it 
might be that this data extract is not representative of the whole table 
at all, but from the looks of it, those columns b and c offer severely 
low cardinality, perhaps in the order of more than 30% of the table in 
duplications.

If this is the case, an index on c or b will amount to pretty much a 
table-scan still. You really want to hit those high-cardinality columns 
in your leading indexer (which is why Richard rightly suggested the 
double index).

I'd almost think having the less-expensive set of these:
create index IDX_TEST_1 on TEST(a);
create index IDX_TEST_2 on TEST(a2);
would produce very near the same speeds for that query if the 
cardinality of columns b and c remain low for populated data.

If however the cardinality scales up pretty well, then Richard's 
suggestion is much better, and if the cardinality will be even higher in 
b and c than in the a's, then Simon's suggestion is best.

Tool for the job etc.

Cheers,
Ryan

Reply via email to