Hi, I am seeing poor performance on queries which use IN in the WHERE clause with multiple items supplied. I have searched for issues related to this and I have seen replies suggesting that when you use IN, that indexes are not used, but I guess this must have been fixed because I do see indexes being used in the query plan (but the query is still slow). I've seen suggestions to re-write IN as a JOIN but I've raised a similar issue where I am seeing that Indexes aren't used on joins (unless the query is re-written in such a way that makes it prohibitively slow). What I am observing is that on a table with roughly 300 million entries in it, if I use the IN clause, the query takes a few seconds, where as if I re-write it is a UNION with each query filtering on only one item, it returns in milliseconds. The issue is the user could be filtering on lots of items and on multiple attributes, so if I do this I will end up with a huge number of unions.
Example 1 - using IN clause, filtering on two items. Returns slowly (about 8 seconds). SELECT product_name, location_name, SUM(revenue) FROM FactTableRevenue WHERE date_key = 20200604 AND product_name IN ('Product1', 'Product2') GROUP BY product_name, location_name Query plan: SELECT __Z0.PRODUCT_NAME AS __C0_0, __Z0.LOCATION_NAME AS __C0_1, SUM(__Z0.REVENUE) AS __C0_2 FROM PUBLIC.FACTTABLEREVENUE __Z0 /* PUBLIC.IX_REVENUE_DATE_PRODUCT_LOCATION: PRODUCT_NAME IN('Product1', 'Product2') AND DATE_KEY = 20200604 */ WHERE (__Z0.PRODUCT_NAME IN('Product1', 'Product2')) AND (__Z0.DATE_KEY = 20200604) GROUP BY __Z0.PRODUCT_NAME, __Z0.LOCATION_NAME Example 2 - Re-written as a union. Returns very quickly (50 ms) SELECT product_name, location_name, SUM(revenue) FROM FactTableRevenue WHERE date_key = 20200604 AND product_name IN ('Product1') GROUP BY product_name, location_name UNION ALL SELECT product_name, location_name, SUM(revenue) FROM FactTableRevenue WHERE date_key = 20200604 AND product_name IN ('Product2') GROUP BY product_name, location_name SELECT __Z0.PRODUCT_NAME AS __C0_0, __Z0.LOCATION_NAME AS __C0_1, SUM(__Z0.REVENUE) AS __C0_2 FROM PUBLIC.FACTTABLEREVENUE __Z0 /* PUBLIC.IX_REVENUE_DATE_PRODUCT_LOCATION: DATE_KEY = 20200604 AND PRODUCT_NAME = 'Product1' */ WHERE (__Z0.DATE_KEY = 20200604) AND (__Z0.PRODUCT_NAME = 'Product1') GROUP BY __Z0.PRODUCT_NAME, __Z0.LOCATION_NAME -- Sent from: http://apache-ignite-users.70518.x6.nabble.com/