Thanks Andrey, Right now we are testing with only one big node, so the reduce step should not take any time.
1) We already set parallelism to 32, and I can still see only 1 core working. Anything else could be preventing multiple cores from working on the job? 2) Why would the reduce phase need to look at all the data? It seems like a fairly simple query 3) We are already collocating data by customer_id (though as I mentioned, right now there is only one node) 4) We already using collocation and tried using an index, and other advice? Is there a way to check what Ignite is actually doing? How are indexs used (by Ignite or H2)? Cheers, Eugene On Wed, Aug 22, 2018 at 3:54 AM, Andrey Mashenkov < andrey.mashen...@gmail.com> wrote: > Hi, > > 1. Possible there are too much data should be looked for the query. With > single node and parallelism=1 query will always run in single thread. > You can try to add more nodes or increase query parallelism to utilize > more CPU cores. > > 2. Index on date field may be not effective as reduce phase should look > all the data for further grouping. > Try add index on customer_id or use collocation in customer_id (usually > more preferable way). > > Also it is possible the bottleneck is the reduce phase. > Is it possible to collocate data by group by column (customer_id)? This > collocation will allow you use collocated flag [1] and Ignite will use more > optimal plan. > > 4. The main techniques is trying to reduce amount to data to be looked up > on every phase with using data collocation and indices > Ignite provide 2 plans for distributed queries: map and reduce. You can > analyse and check these queries separately to understand how much data are > processed on map phase and on reduce. > Map query process node local data (until distributed joins on), while > reduce fetch data from remote node that may costs. . > > > On Wed, Aug 22, 2018 at 6:07 AM eugene miretsky <eugene.miret...@gmail.com> > wrote: > >> Here is the result of EXPLAIN for the afermantioned query: >> >> SELECT >> __Z0.CUSTOMER_ID AS __C0_0, >> SUM(__Z0.EC1_BKNT_TOTAL_PRODUCT_VIEWS_APP) AS __C0_1, >> MAX(__Z0.EC1_HNK_TOTAL_PRODUCT_CLICKS_APP) AS __C0_2 >> FROM PUBLIC.GAL3EC1 __Z0 >> /* PUBLIC.DT_IDX: DT > '2018-05-12' */ >> WHERE __Z0.DT > '2018-05-12' >> GROUP BY __Z0.CUSTOMER_ID >> SELECT >> COUNT(*) >> FROM ( >> SELECT >> __C0_0 AS CUSTOMER_ID >> FROM PUBLIC.__T0 >> GROUP BY __C0_0 >> HAVING (SUM(__C0_1) > 2) >> AND (MAX(__C0_2) < 1) >> ) _0__Z1 >> >> >> >> On Tue, Aug 21, 2018 at 8:18 PM, eugene miretsky < >> eugene.miret...@gmail.com> wrote: >> >>> Hi, >>> >>> We have a cache called GAL3EC1, it has >>> >>> 1. A composite pKey consisting of customer_id and date >>> 2. An Index on the date column >>> 3. 300 sparse columns >>> >>> We are running a single EC2 4x8xlarge node. >>> >>> The following query takes 8min to finish >>> Select COUNT (*) FROM (SELECT customer_id FROM GAl3ec1 where dt > >>> '2018-05-12' GROUP BY customer_id having >>> SUM(ec1_bknt_total_product_views_app) >>> > 2 AND MAX(ec1_hnk_total_product_clicks_app) < 1) >>> >>> I have a few questions: >>> >>> 1. 'top' command shows %100 cpu utilization (i.e only one of the 32 >>> CPUs is used). How can I get the query to use all 32 CPUs? I have tried >>> setting Query Parallelism to 32, but it didn't help, >>> 2. Adding the index on date column seems to have slowed down the >>> query. The 8min time from above was without the index, with the index the >>> query doesn't finish (I gave up after 30min). A similar query on a >>> smaller date range showed a 10x slow down with the index. Why? >>> 3. Our loads from Spark are very slow as well, and also seem to not >>> use the system resource properly, can that be related? >>> 4. What are some good tools and techniques to troubleshoot these >>> problems in Ignite? >>> >>> >>> All the relevant info is attached (configs, cache stats, node stats, >>> etc.). >>> >>> Cheers, >>> Eugene >>> >>> >>> >>> >>> >>> >>> >> > > -- > Best regards, > Andrey V. Mashenkov >