Re: Slow SQL query uses only a single CPU

2018-09-19 Thread aealexsandrov
Hi, I think you can try to investigate the articles from the next wiki: https://cwiki.apache.org/confluence/display/IGNITE/Design+Documents Next blog contains the interesting information (possible some will be out of date): http://gridgain.blogspot.com It contains a lot of information about

Re: Slow SQL query uses only a single CPU

2018-08-22 Thread eugene miretsky
Thanks, I tried the composite key and provided the index tip in the query but the query plan and execution time stayed the same. So really I am back to where we started. Right now I suspect that I am not setting the affinity key properly - I intend the key to be customer_id, but maybe Ignite

Re: Slow SQL query uses only a single CPU

2018-08-22 Thread Andrey Mashenkov
1. /* PUBLIC.AFFINITY_KEY */ means index on affinity column is used. Full index will be scanned against date condition. As I wrote you can create composite index to speedup index scan. 2. "group sorted" means index is used for grouping. Looks like H2 have optimization for this and grouping can

Re: Slow SQL query uses only a single CPU

2018-08-22 Thread eugene miretsky
Just as a reference, bellow are 2 execution plans with and without the index on a very similar table. Adding the index remove /* PUBLIC.AFFINITY_KEY */ and /* group sorted */. 1) Does PUBLIC.AFFINITY_KEY mean that DT is the affinity key. We are setting customer_id as an affinity key. Is there a

Re: Slow SQL query uses only a single CPU

2018-08-22 Thread eugene miretsky
Thanks Andrey, We are using the Ignite notebook, any idea if there is a way to provide these flags and hints directly from SQL? >From your description, it seems like the query is executed in the following order 1) Group by customer_id 2) For each group, perform the filtering on date using the

Re: Slow SQL query uses only a single CPU

2018-08-22 Thread Andrey Mashenkov
Eugene, 1. Note that queryParallelism splits indices and Ignite work similar way as if index data resides on several nodes. These index part can be looked up in parallel threads. 2. It is not a simple query as you data distributed among partitions and is not collocated and aggregate function are

Re: Slow SQL query uses only a single CPU

2018-08-22 Thread eugene miretsky
Thanks Andrey, Right now we are testing with only one big node, so the reduce step should not take any time. 1) We already set parallelism to 32, and I can still see only 1 core working. Anything else could be preventing multiple cores from working on the job? 2) Why would the reduce phase need

Re: Slow SQL query uses only a single CPU

2018-08-22 Thread Andrey Mashenkov
Hi, 1. Possible there are too much data should be looked for the query. With single node and parallelism=1 query will always run in single thread. You can try to add more nodes or increase query parallelism to utilize more CPU cores. 2. Index on date field may be not effective as reduce phase

Re: Slow SQL query uses only a single CPU

2018-08-21 Thread eugene miretsky
Here is the result of EXPLAIN for the afermantioned query: SELECT __Z0.CUSTOMER_ID AS __C0_0, SUM(__Z0.EC1_BKNT_TOTAL_PRODUCT_VIEWS_APP) AS __C0_1, MAX(__Z0.EC1_HNK_TOTAL_PRODUCT_CLICKS_APP) AS __C0_2 FROM PUBLIC.GAL3EC1 __Z0 /* PUBLIC.DT_IDX: DT > '2018-05-12' */ WHERE __Z0.DT > '2018-05-12'

Slow SQL query uses only a single CPU

2018-08-21 Thread eugene miretsky
Hi, We have a cache called GAL3EC1, it has 1. A composite pKey consisting of customer_id and date 2. An Index on the date column 3. 300 sparse columns We are running a single EC2 4x8xlarge node. The following query takes 8min to finish Select COUNT (*) FROM (SELECT customer_id FROM