Good afternoon,
I am facing a strange performance issue when doing SQL queries on my
cluster.
Here is the scenario (I cannot use real config etc because this source code
is protected):
I have 3 caches (a subset of a STAR schema).
CACHE1 is the fact table: EVENTS. It is a partitioned cache. It has an
affinityKey on USER ID.
CACHE2 is the user table: USERS. It is a partitioned cache. It has an
affinityKey on USER ID as well.
CACHE3 is the document table: DOCS. It is a replicated cache.
I also have the following config:
Every event from the event table has one USER ID and one DOCUMENT ID.
All the columns are indexed.
I run the query with setLocal(true) and setEnforceJoinOrder(true).
I would like to do the following:
1. SELECT *
2. FROM EVENTS
3. INNER JOIN USERS ON EVENTS.USER_ID = USERS.USER_ID
4. INNER JOIN DOCS ON EVENTS.DOC_ID = DOCS.DOC_ID
5. WHERE DOCS.COL1 = 'some filter'
6. AND USERS.COL2 = 'some other filter'
Here is what I observe:
When I run the query without line 5 (filter on documents), it is instant.
When I run the query with line 5 (both filters), it is 20X slower. (Even
though those filters are on indexed columns)- The EXPLAIN form the logs
indicates lots of scans.
If I run the query as so (syntax is not exact but the idea is there):
SELECT *
FROM (
SELECT USERS.*, DOCS.*
FROM EVENTS
INNER JOIN USERS ON EVENTS.USER_ID = USERS.USER_ID
INNER JOIN DOCS ON EVENTS.DOC_ID = DOCS.DOC_ID
WHERE DOCS.COL1 = 'some filter'
)
WHERE USERS.COL2 = 'some other filter'
I have the expected performances again.
It seems that the index on COL2 is being ignored when doing two joins and a
filter on each table.
What do you think about it ?
Thank yo very much !
Kind regards
Emmanuel
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/