Hello!

I can see you try to use _key_PK as index. If your primary key is
composite, it won't work properly for you. I recommend creating an explicit
(category_id, customer_id) index.

Regards,
-- 
Ilya Kasnacheev


вт, 18 сент. 2018 г. в 17:47, eugene miretsky <eugene.miret...@gmail.com>:

> Hi Ilya,
>
> The different query result was my mistake - one of the categoy_ids was
> duplicate, so in the query that used join, it counted rows for that
> category twice. My apologies.
>
> However, we are still having an issue with query time, and the index not
> being applied to category_id. Would appreciate if you could take a look.
>
> Cheers,
> Eugene
>
> On Mon, Sep 17, 2018 at 9:15 AM Ilya Kasnacheev <ilya.kasnach...@gmail.com>
> wrote:
>
>> Hello!
>>
>> Why don't you diff the results of those two queries, tell us what the
>> difference is?
>>
>> Regards,
>> --
>> Ilya Kasnacheev
>>
>>
>> пн, 17 сент. 2018 г. в 16:08, eugene miretsky <eugene.miret...@gmail.com
>> >:
>>
>>> Hello,
>>>
>>> Just wanted to see if anybody had time to look into this.
>>>
>>> Cheers,
>>> Eugene
>>>
>>> On Wed, Sep 12, 2018 at 6:29 PM eugene miretsky <
>>> eugene.miret...@gmail.com> wrote:
>>>
>>>> Thanks!
>>>>
>>>> Tried joining with an inlined table instead of IN as per the second
>>>> suggestion, and it didn't quite work.
>>>>
>>>> Query1:
>>>>
>>>>    - Select COUNT(*) FROM( Select customer_id from GATABLE3  use
>>>>    Index( ) where category_id in (9005, 175930, 175930, 
>>>> 175940,175945,101450,
>>>>    6453) group by customer_id having SUM(product_views_app) > 2 OR
>>>>    SUM(product_clicks_app) > 1 )
>>>>    - exec time = 17s
>>>>    - *Result: 3105868*
>>>>    - Same exec time if using AFFINITY_KEY index or "_key_PK_hash or
>>>>    customer_id index
>>>>    - Using an index on category_id increases the query time 33s
>>>>
>>>> Query2:
>>>>
>>>>    - Select COUNT(*) FROM( Select customer_id from GATABLE3 ga  use
>>>>    index (PUBLIC."_key_PK") inner join table(category_id int = (9005, 
>>>> 175930,
>>>>    175930, 175940,175945,101450, 6453)) cats on cats.category_id =
>>>>    ga.category_id   group by customer_id having SUM(product_views_app) > 2 
>>>> OR
>>>>    SUM(product_clicks_app) > 1 )
>>>>    - exec time = 38s
>>>>    - *Result: 3113921*
>>>>    - Same exec time if using AFFINITY_KEY index or "_key_PK_hash or
>>>>    customer_id index or category_id index
>>>>    - Using an index on category_id doesnt change the run time
>>>>
>>>> Query plans are attached.
>>>>
>>>> 3 questions:
>>>>
>>>>    1. Why is the result differnt for the 2 queries - this is quite
>>>>    concerning.
>>>>    2. Why is the 2nd query taking longer
>>>>    3. Why  category_id index doesn't work in case of query 2.
>>>>
>>>>
>>>> On Wed, Sep 5, 2018 at 8:31 AM Ilya Kasnacheev <
>>>> ilya.kasnach...@gmail.com> wrote:
>>>>
>>>>> Hello!
>>>>>
>>>>> I don't think that we're able to use index with IN () clauses. Please
>>>>> convert it into OR clauses.
>>>>>
>>>>> Please see
>>>>> https://apacheignite-sql.readme.io/docs/performance-and-debugging#section-sql-performance-and-usability-considerations
>>>>>
>>>>> Regards,
>>>>> --
>>>>> Ilya Kasnacheev
>>>>>
>>>>>
>>>>> пн, 3 сент. 2018 г. в 12:46, Andrey Mashenkov <
>>>>> andrey.mashen...@gmail.com>:
>>>>>
>>>>>> Hi
>>>>>>
>>>>>> Actually, first query uses index on affinity key which looks more
>>>>>> efficient than index on category_id column.
>>>>>> The first query can process groups one by one and stream partial
>>>>>> results from map phase to reduce phase as it use sorted index lookup,
>>>>>> while second query should process full dataset on map phase before
>>>>>> pass it for reducing.
>>>>>>
>>>>>> Try to use composite index (customer_id, category_id).
>>>>>>
>>>>>> Also, SqlQueryFields.setCollocated(true) flag can help Ignite to
>>>>>> build more efficient plan when group by on collocated column is used.
>>>>>>
>>>>>> On Sun, Sep 2, 2018 at 2:02 AM eugene miretsky <
>>>>>> eugene.miret...@gmail.com> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> Schema:
>>>>>>>
>>>>>>>    -
>>>>>>>
>>>>>>>    PUBLIC.GATABLE2.CUSTOMER_ID
>>>>>>>
>>>>>>>    PUBLIC.GATABLE2.DT
>>>>>>>
>>>>>>>    PUBLIC.GATABLE2.CATEGORY_ID
>>>>>>>
>>>>>>>    PUBLIC.GATABLE2.VERTICAL_ID
>>>>>>>
>>>>>>>    PUBLIC.GATABLE2.SERVICE
>>>>>>>
>>>>>>>    PUBLIC.GATABLE2.PRODUCT_VIEWS_APP
>>>>>>>
>>>>>>>    PUBLIC.GATABLE2.PRODUCT_CLICKS_APP
>>>>>>>
>>>>>>>    PUBLIC.GATABLE2.PRODUCT_VIEWS_WEB
>>>>>>>
>>>>>>>    PUBLIC.GATABLE2.PRODUCT_CLICKS_WEB
>>>>>>>
>>>>>>>    PUBLIC.GATABLE2.PDP_SESSIONS_APP
>>>>>>>
>>>>>>>    PUBLIC.GATABLE2.PDP_SESSIONS_WEB
>>>>>>>    - pkey = customer_id,dt
>>>>>>>    - affinityKey = customer
>>>>>>>
>>>>>>> Query:
>>>>>>>
>>>>>>>    - select COUNT(*) FROM( Select customer_id from GATABLE2 where
>>>>>>>    category_id in (175925, 101450, 9005, 175930, 175930, 
>>>>>>> 175940,175945,101450,
>>>>>>>    6453) group by customer_id having SUM(product_views_app) > 2 OR
>>>>>>>    SUM(product_clicks_app) > 1 )
>>>>>>>
>>>>>>> The table has 600M rows.
>>>>>>> At first, the query took 1m, when we added an index on category_id
>>>>>>> the query started taking 3m.
>>>>>>>
>>>>>>> The SQL execution plan for both queries is attached.
>>>>>>>
>>>>>>> We are using a single x1.16xlarge insntace with query parallelism
>>>>>>> set to 32
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Eugene
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best regards,
>>>>>> Andrey V. Mashenkov
>>>>>>
>>>>>

Reply via email to