Hi,
I think you can try to investigate the articles from the next wiki:
https://cwiki.apache.org/confluence/display/IGNITE/Design+Documents
Next blog contains the interesting information (possible some will be out of
date):
http://gridgain.blogspot.com
It contains a lot of information about
Thanks,
I tried the composite key and provided the index tip in the query but the
query plan and execution time stayed the same. So really I am back to where
we started.
Right now I suspect that I am not setting the affinity key properly - I
intend the key to be customer_id, but maybe Ignite
1. /* PUBLIC.AFFINITY_KEY */ means index on affinity column is used. Full
index will be scanned against date condition.
As I wrote you can create composite index to speedup index scan.
2. "group sorted" means index is used for grouping. Looks like H2 have
optimization for this and grouping can
Just as a reference, bellow are 2 execution plans with and without the
index on a very similar table.
Adding the index remove /* PUBLIC.AFFINITY_KEY */ and /* group sorted */.
1) Does PUBLIC.AFFINITY_KEY mean that DT is the affinity key. We are
setting customer_id as an affinity key. Is there a
Thanks Andrey,
We are using the Ignite notebook, any idea if there is a way to provide
these flags and hints directly from SQL?
>From your description, it seems like the query is executed in the following
order
1) Group by customer_id
2) For each group, perform the filtering on date using the
Eugene,
1. Note that queryParallelism splits indices and Ignite work similar way as
if index data resides on several nodes. These index part can be looked up
in parallel threads.
2. It is not a simple query as you data distributed among partitions and is
not collocated and aggregate function are
Thanks Andrey,
Right now we are testing with only one big node, so the reduce step should
not take any time.
1) We already set parallelism to 32, and I can still see only 1 core
working. Anything else could be preventing multiple cores from working on
the job?
2) Why would the reduce phase need
Hi,
1. Possible there are too much data should be looked for the query. With
single node and parallelism=1 query will always run in single thread.
You can try to add more nodes or increase query parallelism to utilize
more CPU cores.
2. Index on date field may be not effective as reduce phase
Here is the result of EXPLAIN for the afermantioned query:
SELECT
__Z0.CUSTOMER_ID AS __C0_0,
SUM(__Z0.EC1_BKNT_TOTAL_PRODUCT_VIEWS_APP) AS __C0_1,
MAX(__Z0.EC1_HNK_TOTAL_PRODUCT_CLICKS_APP) AS __C0_2
FROM PUBLIC.GAL3EC1 __Z0
/* PUBLIC.DT_IDX: DT > '2018-05-12' */
WHERE __Z0.DT > '2018-05-12'
Hi,
We have a cache called GAL3EC1, it has
1. A composite pKey consisting of customer_id and date
2. An Index on the date column
3. 300 sparse columns
We are running a single EC2 4x8xlarge node.
The following query takes 8min to finish
Select COUNT (*) FROM (SELECT customer_id FROM
10 matches
Mail list logo