I'm trying to understand the mechanics of collocated joins in Ignite (I'm
running 2.7). I have two queries that are only differentiated by the date
range that return in the same amount of time, although one pulls data for
thirty days and the other for only one.
SELECT S2CellCovering.coveringId, COUNT(*)
FROM S2CellCovering
INNER JOIN EventTheta
ON EventTheta.parentS2CellId = S2CellCovering.parentS2CellId
AND EventTheta.s2CellId BETWEEN S2CellCovering.minS2CellId AND
S2CellCovering.maxS2CellId
AND EventTheta.eventDate BETWEEN '2018-10-02' AND '2018-10-02'
AND EventTheta.eventHour BETWEEN -1 AND -1
WHERE S2CellCovering.coveringId = 166
GROUP BY S2CellCovering.coveringId;
COVERINGID 166
COUNT(*) 5629
COVERINGID 166
COUNT(*) 6407
COVERINGID 166
COUNT(*) 9030
COVERINGID 166
COUNT(*) 8965
4 rows selected (12.389 seconds)
SELECT S2CellCovering.coveringId, COUNT(*)
FROM S2CellCovering
INNER JOIN EventTheta
ON EventTheta.parentS2CellId = S2CellCovering.parentS2CellId
AND EventTheta.s2CellId BETWEEN S2CellCovering.minS2CellId AND
S2CellCovering.maxS2CellId
AND EventTheta.eventDate BETWEEN '2018-10-02' AND '2018-10-31'
AND EventTheta.eventHour BETWEEN -1 AND -1
WHERE S2CellCovering.coveringId = 166
GROUP BY S2CellCovering.coveringId;
COVERINGID 166
COUNT(*) 149690
COVERINGID 166
COUNT(*) 161104
COVERINGID 166
COUNT(*) 215174
COVERINGID 166
COUNT(*) 218183
4 rows selected (12.578 seconds)
When I run EXPLAIN it looks like my rows are all properly indexed, as far as
I can tell:
PLAN SELECT
__Z0.COVERINGID AS __C0_0,
COUNT(*) AS __C0_1
FROM PUBLIC.S2CELLCOVERING __Z0
/* PUBLIC.S2CELLCOVERING_COVERINGID_ASC_IDX: COVERINGID = 166 */
/* WHERE __Z0.COVERINGID = 166
*/
INNER JOIN PUBLIC.EVENTTHETA __Z1
/*
PUBLIC.EVENTTHETA_PARENTS2CELLID_ASC_S2CELLID_ASC_EVENTDATE_ASC_EVENTHOUR_ASC_IDX:
EVENTHOUR >= -1
AND EVENTHOUR <= -1
AND EVENTDATE >= DATE '2018-10-02'
AND EVENTDATE <= DATE '2018-10-31'
AND PARENTS2CELLID = __Z0.PARENTS2CELLID
AND S2CELLID >= __Z0.MINS2CELLID
AND S2CELLID <= __Z0.MAXS2CELLID
*/
ON 1=1
WHERE (__Z0.COVERINGID = 166)
AND (((__Z1.EVENTHOUR >= -1)
AND (__Z1.EVENTHOUR <= -1))
AND (((__Z1.EVENTDATE >= DATE '2018-10-02')
AND (__Z1.EVENTDATE <= DATE '2018-10-31'))
AND ((__Z1.PARENTS2CELLID = __Z0.PARENTS2CELLID)
AND ((__Z0.MINS2CELLID <= __Z1.S2CELLID)
AND (__Z0.MAXS2CELLID >= __Z1.S2CELLID)))))
GROUP BY __Z0.COVERINGID
/* group sorted */
PLAN SELECT
__C0_0 AS COVERINGID,
__C0_1 AS __C0_1
FROM PUBLIC.__T0
/* PUBLIC."merge_scan" */
The EVENTTHETA and S2CELLCOVERING keys should be collocated on the same
nodes. Here are my keys:
case class EventThetaKey(
@(AffinityKeyMapped@field)
@(QuerySqlField@field)(
orderedGroups = Array(
new (QuerySqlField.Group@field)(
name = "pk_public_eventtheta",
order = 4,
descending = true
)
)
) parentS2CellId: Long,
@(QuerySqlField@field)(
orderedGroups = Array(
new (QuerySqlField.Group@field)(
name = "pk_public_eventtheta",
order = 3,
descending = true
)
)
) s2CellId: Long,
@(QuerySqlField@field)(
orderedGroups = Array(
new (QuerySqlField.Group@field)(
name = "pk_public_eventtheta",
order = 2,
descending = true
)
)
) eventDate: Date,
@(QuerySqlField@field)(
orderedGroups = Array(
new (QuerySqlField.Group@field)(
name = "pk_public_eventtheta",
order = 1,
descending = true
)
)
) eventHour: Byte
)
case class S2CellCoveringKey(
@(AffinityKeyMapped@field)
@(QuerySqlField@field)(
orderedGroups = Array(
new (QuerySqlField.Group@field)(
name = "pk_public_s2cellcovering",
order = 3,
descending = true
)
)
) parentS2CellId: Long,
@(QuerySqlField@field)(
orderedGroups = Array(
new (QuerySqlField.Group@field)(
name = "pk_public_s2cellcovering",
order = 2,
descending = true
)
)
) s2CellId: Long,
@(QuerySqlField@field)(
index = true,
orderedGroups = Array(
new (QuerySqlField.Group@field)(
name = "pk_public_s2cellcovering",
order = 1,
descending = true
)
)
) coveringId: Long
)
Based on the fact that I am able to scan 30x more rows in roughly the same
amount of time, I gather that at one of my indices is set up properly, but
both queries are taking over 12s to run with plenty of compute available.
What am I missing here? I'm assuming that since the tables are joined on a
shared affinity key the join is collocated, but maybe I'm wrong.
On another note, is there a way to run this query in a compute task such
that I'm only returning results from the local node?
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/