Re: using multiple columns Index

Maksim Timonin Thu, 16 Dec 2021 06:09:27 -0800

Hi, Chrystophe!

Multifield index should perfectly work for cases with strict equality like:
(a == ? && b == ?) OR (a == ? && b > ?) . But for queries with range
queries for first field "a" ("a" > ? && b ...) you should not expect a
boost of performance. You're right, it's due to B+Tree implementation - we
store data in pairs (a, b). And then in the index storage the sequence will
be: (1, 1), (1, 2), (1, 3), (2, 1). We sort by A, and only in case of
equality field A, we check field B.


So for queries like (a > 0 && b < 10) there is not much help from the
condition on B for reducing data slice. Our implementation doesn't skip
some sub-trees for conditions, but it checks tree range sequentially.

> Worst, if the value of b is not present in the slice, it is responding as
if the b was not in the WHERE clause at all (it seems to run a full scan on
the sub-result)

I think it can depend on the amount of data you return. If there is no data
suitable for your condition, you will hang until the query finishes. But if
you have some, it will return the cursor earlier, after preparing the first
page for response, see SqlFieldsQuery.setPageSize().

Also, performance may depend on your index selectivity.


On Thu, Dec 16, 2021 at 4:23 PM Chrystophe Vergnaud <
[email protected]> wrote:

> Hello Stephen,
>
> I was created with SQL :
> CREATE INDEX IF NOT EXISTS "t_idx_1" ON MYSCHEMA."t" ("a", "b", "c");
>
> BR,
>
> Chrystophe Vergnaud
> Architect @ Cyblex Technologies
>
>
> Le jeu. 16 déc. 2021 à 13:51, Stephen Darlington <
> [email protected]> a écrit :
>
>> Can you show how you’ve defined your index(es)?
>>
>> > On 16 Dec 2021, at 12:27, Chrystophe Vergnaud <
>> [email protected]> wrote:
>> >
>> > Hello,
>> >
>> > I'm running an ignite 2.10 and I don't understand the behavior of the
>> multi-columns index.
>> >
>> > For instance, I have a table t(id, a,b,c, d, e, f, g)
>> > - id is a uuid and is the key
>> > - a is a TIMESTAMP
>> > - b is a  SMALLINT
>> > - c is a TINYINT
>> > - e, f, g are VARCHAR
>> >
>> > this table have around 200M lines
>> >
>> > I have to select data based on a, b, c in this order, so basically, I
>> have setup an index on (a,b,c)
>> >
>> > If I apply a select with a WHERE clause on "a>=x AND a < y", it works
>> perfectly, the response time is ok (using USE INDEX)
>> >
>> > If I add the b in the WHERE clause I expect to optimize the response
>> time but it is not the case. Worst, if the value of b is not present in the
>> slice, it is responding as if the b was not in the WHERE clause at all (it
>> seems to run a full scan on the sub-result)
>> >
>> > Do I miss something ? is it related to the implementation of the B+tree
>> ?
>> >
>> > Thanks in advance for your help.
>> >
>> > Best regards,
>> >
>> > Chrystophe Vergnaud
>> > Architect @ Cyblex Technologies
>>
>>
>>

Re: using multiple columns Index

Reply via email to