[sqlite] index for OR clause

Sylvain Pointeau Mon, 27 Jul 2015 21:39:27 +0200

On Mon, Jul 27, 2015 at 8:27 PM, R.Smith <rsmith at rsweb.co.za> wrote:


>
>
> On 2015-07-27 08:09 PM, Simon Slavin wrote:
>
>> On 27 Jul 2015, at 6:58pm, Sylvain Pointeau <sylvain.pointeau at gmail.com>
>> wrote:
>>
>>  create table TEST (
>>> a TEXT NOT NULL,
>>> a2 TEXT NOT NULL,
>>> b TEXT NOT NULL,
>>> c TEXT NOT NULL
>>> );
>>>
>>> create index IDX_TEST_1 on TEST(a,a2,b,c);
>>>
>>> insert into TEST(a,a2,b,c) values ('123','1234','456','PP');
>>> insert into TEST(a,a2,b,c) values ('234','2345','456','PP');
>>> insert into TEST(a,a2,b,c) values ('456','4567','456','PP');
>>> insert into TEST(a,a2,b,c) values ('0123','10456','678','DP');
>>>
>>> sqlite> explain query plan select * from TEST where (a = '123' or
>>> a2='1234') and b = '456' and c='PP';
>>>
>>> 0|0|0|SCAN TABLE TEST
>>>
>>> whereas
>>>
>>> sqlite> explain query plan select * from TEST where a = '123' and b =
>>> '456'
>>> and c='PP';
>>>
>>> 0|0|0|SEARCH TABLE TEST USING COVERING INDEX IDX_TEST_1 (a=?)
>>>
>>> how to make this query using an index? is this possible?
>>>
>> That is a good job of investigation you have done there.  What you didn't
>> take into account is that the order of columns in an index matters.
>>
>> Your example has known precise values for b and c, and sometimes but not
>> always knows a and a2.  So b and c should be up front in your index:
>>
>> create index IDX_TEST_2 on TEST(b,c,a,a2);
>>
>
> Quite, but there is also another consideration: Cardinality.  Now it might
> be that this data extract is not representative of the whole table at all,
> but from the looks of it, those columns b and c offer severely low
> cardinality, perhaps in the order of more than 30% of the table in
> duplications.
>
> If this is the case, an index on c or b will amount to pretty much a
> table-scan still. You really want to hit those high-cardinality columns in
> your leading indexer (which is why Richard rightly suggested the double
> index).
>
> I'd almost think having the less-expensive set of these:
> create index IDX_TEST_1 on TEST(a);
> create index IDX_TEST_2 on TEST(a2);
> would produce very near the same speeds for that query if the cardinality
> of columns b and c remain low for populated data.
>
> If however the cardinality scales up pretty well, then Richard's
> suggestion is much better, and if the cardinality will be even higher in b
> and c than in the a's, then Simon's suggestion is best.
>

in my case, c cardinality was low, (a,b) and (a2,b) almost unique

I suppose that in my case an index on c and b would give good result :-)

[sqlite] index for OR clause

Reply via email to