Hi,

I am seeing poor performance on queries which use IN in the WHERE clause
with multiple items supplied. I have searched for issues related to this and
I have seen replies suggesting that when you use IN, that indexes are not
used, but I guess this must have been fixed because I do see indexes being
used in the query plan (but the query is still slow). I've seen suggestions
to re-write IN as a JOIN but I've raised a similar issue where I am seeing
that Indexes aren't used on joins (unless the query is re-written in such a
way that makes it prohibitively slow).  
What I am observing is that on a table with roughly 300 million entries in
it, if I use the IN clause, the query takes a few seconds, where as if I
re-write it is a UNION with each query filtering on only one item, it
returns in milliseconds. The issue is the user could be filtering on lots of
items and on multiple attributes, so if I do this I will end up with a huge
number of unions.

Example 1 - using IN clause, filtering on two items. Returns slowly (about 8
seconds).


 SELECT product_name, location_name,

SUM(revenue)

FROM FactTableRevenue

WHERE date_key = 20200604

AND product_name IN ('Product1', 'Product2')

GROUP BY product_name, location_name

 Query plan:
SELECT

    __Z0.PRODUCT_NAME AS __C0_0,

    __Z0.LOCATION_NAME AS __C0_1,

    SUM(__Z0.REVENUE) AS __C0_2

FROM PUBLIC.FACTTABLEREVENUE __Z0

    /* PUBLIC.IX_REVENUE_DATE_PRODUCT_LOCATION: PRODUCT_NAME IN('Product1',
'Product2')

       AND DATE_KEY = 20200604

     */

WHERE (__Z0.PRODUCT_NAME IN('Product1', 'Product2'))

    AND (__Z0.DATE_KEY = 20200604)

GROUP BY __Z0.PRODUCT_NAME, __Z0.LOCATION_NAME

 
Example 2 - Re-written as a union. Returns very quickly (50 ms)

SELECT product_name, location_name,

SUM(revenue)

FROM FactTableRevenue

WHERE date_key = 20200604

AND product_name IN ('Product1')

GROUP BY product_name, location_name

 

UNION ALL

 

SELECT product_name, location_name,

SUM(revenue)

FROM FactTableRevenue

WHERE date_key = 20200604

AND product_name IN ('Product2')

GROUP BY product_name, location_name

 

SELECT

    __Z0.PRODUCT_NAME AS __C0_0,

    __Z0.LOCATION_NAME AS __C0_1,

    SUM(__Z0.REVENUE) AS __C0_2

FROM PUBLIC.FACTTABLEREVENUE __Z0

    /* PUBLIC.IX_REVENUE_DATE_PRODUCT_LOCATION: DATE_KEY = 20200604

        AND PRODUCT_NAME = 'Product1'

     */

WHERE (__Z0.DATE_KEY = 20200604)

    AND (__Z0.PRODUCT_NAME = 'Product1')

GROUP BY __Z0.PRODUCT_NAME, __Z0.LOCATION_NAME



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Reply via email to