Hi,

Unfortunately I don't understand the root of the problem totally.
Looks like the performance depends linear on rows count:
10k ~ 0.1s
65k ~ 0.65s
650k ~ 7s
I see the linear dependency on rows count...

> Is Ignite doing the join and filtering at each data node and then sending
> the 650K total rows to the reduce before aggregation?

Which aggregation do you mean?
Please provide the query plan and data schema for details.

On 24.04.2021 3:24, William.L wrote:
Hi,

I am trying to understand why my colocated join between two tables/caches
are taking so long compare to the individual table filters.

----TABLE1

Returns 10000 count -- 0.13s

----TABLE2

Returns 65000 count -- 0.643s


---- JOIN TABLE1 and TABLE2

Returns 650K count -- 7s

Both analysis_input and analysis_output has index on (cohort_id, user_id,
timestamp). The affinity key is user_id. How do I analyze the performance
further?

Here's the explain which does not tell me much:



Is Ignite doing the join and filtering at each data node and then sending
the 650K total rows to the reduce before aggregation? If so, is it possible
for Ignite to do the some aggregation at the data node first and then send
the first level aggregation results to the reducer?






--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

--
Taras Ledkov
Mail-To: tled...@gridgain.com

Reply via email to