Hi,

Unfortunately, I don't fully understand the root of the problem. It looks like the execution time depends linearly on the row count:

10k rows ~ 0.1s
65k rows ~ 0.65s
650k rows ~ 7s
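To make the linearity claim easy to check, here is a minimal sketch that divides each timing quoted above by its row count. The numbers are the rounded figures from this thread; the helper name `micros_per_row` is made up for illustration and is not part of any Ignite tooling.

```python
# Timings quoted in this thread: (rows returned, seconds elapsed).
timings = [(10_000, 0.1), (65_000, 0.65), (650_000, 7.0)]


def micros_per_row(rows, seconds):
    """Average cost per returned row, in microseconds."""
    return seconds / rows * 1e6


per_row = [micros_per_row(rows, seconds) for rows, seconds in timings]

for (rows, seconds), cost in zip(timings, per_row):
    print(f"{rows:>7} rows  {seconds:>5.2f} s  {cost:6.2f} us/row")

# Ratio between the most and least expensive per-row cost.
# A value close to 1 means the time grows roughly linearly with row count.
spread = max(per_row) / min(per_row)
print(f"spread: {spread:.2f}")
```

The per-row cost stays at roughly 10 microseconds across all three measurements, which suggests the join time is dominated by the number of rows produced rather than by a fixed overhead.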
> Is Ignite doing the join and filtering at each data node and then sending
> the 650K total rows to the reduce before aggregation?

Which aggregation do you mean? Please provide the query plan and data schema for details.

On 24.04.2021 3:24, William.L wrote:
Hi,

I am trying to understand why my colocated join between two tables/caches is taking so long compared to the individual table filters.

----TABLE1
Returns 10000 count -- 0.13s

----TABLE2
Returns 65000 count -- 0.643s

---- JOIN TABLE1 and TABLE2
Returns 650K count -- 7s

Both analysis_input and analysis_output have an index on (cohort_id, user_id, timestamp). The affinity key is user_id.

How do I analyze the performance further? Here's the explain, which does not tell me much:

Is Ignite doing the join and filtering at each data node and then sending the 650K total rows to the reducer before aggregation? If so, is it possible for Ignite to do some of the aggregation at the data node first and then send the first-level aggregation results to the reducer?

--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
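The "aggregate at the data node first, then reduce" idea raised in the question can be sketched in plain Python. This is only a conceptual illustration and does not show how Ignite executes the query: the actual query and plan are not included in this thread, so the aggregate (a joined-row count per cohort_id), the two-node layout, and the sample rows are all assumptions made for the example.

```python
from collections import defaultdict


def join_on_node(inputs, outputs):
    """Join the rows held by one node on (cohort_id, user_id)."""
    by_key = defaultdict(list)
    for row in inputs:
        by_key[(row["cohort_id"], row["user_id"])].append(row)
    for out in outputs:
        for inp in by_key.get((out["cohort_id"], out["user_id"]), []):
            yield inp, out


def map_side_count(inputs, outputs):
    """Partial aggregate computed on one node: joined rows per cohort."""
    counts = defaultdict(int)
    for inp, _ in join_on_node(inputs, outputs):
        counts[inp["cohort_id"]] += 1
    return dict(counts)


def reduce_counts(partials):
    """Merge the per-node partial counts into the final result."""
    total = defaultdict(int)
    for partial in partials:
        for cohort, n in partial.items():
            total[cohort] += n
    return dict(total)


# Hypothetical data: two nodes, rows colocated by user_id.
node_a = (
    [{"cohort_id": 1, "user_id": 10}],
    [{"cohort_id": 1, "user_id": 10}] * 3,
)
node_b = (
    [{"cohort_id": 1, "user_id": 11}, {"cohort_id": 2, "user_id": 12}],
    [{"cohort_id": 1, "user_id": 11}] * 2 + [{"cohort_id": 2, "user_id": 12}],
)

partials = [map_side_count(i, o) for i, o in (node_a, node_b)]
result = reduce_counts(partials)
print(partials)  # what each node would send to the reducer
print(result)    # final per-cohort counts
```

In this toy layout the nodes produce six joined rows in total but send only three partial counts to the reducer. Whether a real query benefits in the same way depends on the aggregate and on the plan the engine actually picks, which is why the query plan and schema matter here.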
--
Taras Ledkov
Mail-To: tled...@gridgain.com