On 2015-07-27 08:09 PM, Simon Slavin wrote:
> On 27 Jul 2015, at 6:58pm, Sylvain Pointeau <sylvain.pointeau at gmail.com>
> wrote:
>
>> create table TEST (
>> a TEXT NOT NULL,
>> a2 TEXT NOT NULL,
>> b TEXT NOT NULL,
>> c TEXT NOT NULL
>> );
>>
>> create index IDX_TEST_1 on TEST(a,a2,b,c);
>>
>> insert into TEST(a,a2,b,c) values ('123','1234','456','PP');
>> insert into TEST(a,a2,b,c) values ('234','2345','456','PP');
>> insert into TEST(a,a2,b,c) values ('456','4567','456','PP');
>> insert into TEST(a,a2,b,c) values ('0123','10456','678','DP');
>>
>> sqlite> explain query plan select * from TEST where (a = '123' or
>> a2='1234') and b = '456' and c='PP';
>>
>> 0|0|0|SCAN TABLE TEST
>>
>> whereas
>>
>> sqlite> explain query plan select * from TEST where a = '123' and b = '456'
>> and c='PP';
>>
>> 0|0|0|SEARCH TABLE TEST USING COVERING INDEX IDX_TEST_1 (a=?)
>>
>> how to make this query using an index? is this possible?
> That is a good job of investigation you have done there. What you didn't
> take into account is that the order of columns in an index matters.
>
> Your example has known precise values for b and c, and sometimes but not
> always knows a and a2. So b and c should be up front in your index:
>
> create index IDX_TEST_2 on TEST(b,c,a,a2);
Quite, but there is also another consideration: Cardinality. Now it
might be that this data extract is not representative of the whole table
at all, but from the looks of it, those columns b and c offer severely
low cardinality, perhaps in the order of more than 30% of the table in
duplications.
If this is the case, an index on c or b will amount to pretty much a
table-scan still. You really want to hit those high-cardinality columns
in your leading indexer (which is why Richard rightly suggested the
double index).
I'd almost think having the less-expensive set of these:
create index IDX_TEST_1 on TEST(a);
create index IDX_TEST_2 on TEST(a2);
would produce very near the same speeds for that query if the
cardinality of columns b and c remain low for populated data.
If however the cardinality scales up pretty well, then Richard's
suggestion is much better, and if the cardinality will be even higher in
b and c than in the a's, then Simon's suggestion is best.
Tool for the job etc.
Cheers,
Ryan