On Mon, Jul 27, 2015 at 8:27 PM, R.Smith <rsmith at rsweb.co.za> wrote:
> > > On 2015-07-27 08:09 PM, Simon Slavin wrote: > >> On 27 Jul 2015, at 6:58pm, Sylvain Pointeau <sylvain.pointeau at gmail.com> >> wrote: >> >> create table TEST ( >>> a TEXT NOT NULL, >>> a2 TEXT NOT NULL, >>> b TEXT NOT NULL, >>> c TEXT NOT NULL >>> ); >>> >>> create index IDX_TEST_1 on TEST(a,a2,b,c); >>> >>> insert into TEST(a,a2,b,c) values ('123','1234','456','PP'); >>> insert into TEST(a,a2,b,c) values ('234','2345','456','PP'); >>> insert into TEST(a,a2,b,c) values ('456','4567','456','PP'); >>> insert into TEST(a,a2,b,c) values ('0123','10456','678','DP'); >>> >>> sqlite> explain query plan select * from TEST where (a = '123' or >>> a2='1234') and b = '456' and c='PP'; >>> >>> 0|0|0|SCAN TABLE TEST >>> >>> whereas >>> >>> sqlite> explain query plan select * from TEST where a = '123' and b = >>> '456' >>> and c='PP'; >>> >>> 0|0|0|SEARCH TABLE TEST USING COVERING INDEX IDX_TEST_1 (a=?) >>> >>> how to make this query using an index? is this possible? >>> >> That is a good job of investigation you have done there. What you didn't >> take into account is that the order of columns in an index matters. >> >> Your example has known precise values for b and c, and sometimes but not >> always knows a and a2. So b and c should be up front in your index: >> >> create index IDX_TEST_2 on TEST(b,c,a,a2); >> > > Quite, but there is also another consideration: Cardinality. Now it might > be that this data extract is not representative of the whole table at all, > but from the looks of it, those columns b and c offer severely low > cardinality, perhaps in the order of more than 30% of the table in > duplications. > > If this is the case, an index on c or b will amount to pretty much a > table-scan still. You really want to hit those high-cardinality columns in > your leading indexer (which is why Richard rightly suggested the double > index). > > I'd almost think having the less-expensive set of these: > create index IDX_TEST_1 on TEST(a); > create index IDX_TEST_2 on TEST(a2); > would produce very near the same speeds for that query if the cardinality > of columns b and c remain low for populated data. > > If however the cardinality scales up pretty well, then Richard's > suggestion is much better, and if the cardinality will be even higher in b > and c than in the a's, then Simon's suggestion is best. > in my case, c cardinality was low, (a,b) and (a2,b) almost unique I suppose that in my case an index on c and b would give good result :-)