Optimizing Collocated Join

kellan Sun, 16 Dec 2018 20:03:06 -0800

I'm trying to understand the mechanics of collocated joins in Ignite (I'm
running 2.7). I have two queries that are only differentiated by the date
range that return in the same amount of time, although one pulls data for
thirty days and the other for only one.



    SELECT S2CellCovering.coveringId, COUNT(*)
    FROM S2CellCovering
    INNER JOIN EventTheta
      ON EventTheta.parentS2CellId = S2CellCovering.parentS2CellId
      AND EventTheta.s2CellId BETWEEN S2CellCovering.minS2CellId AND
S2CellCovering.maxS2CellId
      AND EventTheta.eventDate BETWEEN '2018-10-02' AND '2018-10-02'
      AND EventTheta.eventHour BETWEEN -1 AND -1
    WHERE S2CellCovering.coveringId = 166
    GROUP BY S2CellCovering.coveringId;

    COVERINGID  166
    COUNT(*)    5629

    COVERINGID  166
    COUNT(*)    6407

    COVERINGID  166
    COUNT(*)    9030

    COVERINGID  166
    COUNT(*)    8965

    4 rows selected (12.389 seconds)


    SELECT S2CellCovering.coveringId, COUNT(*)
    FROM S2CellCovering
    INNER JOIN EventTheta
      ON EventTheta.parentS2CellId = S2CellCovering.parentS2CellId
      AND EventTheta.s2CellId BETWEEN S2CellCovering.minS2CellId AND
S2CellCovering.maxS2CellId
      AND EventTheta.eventDate BETWEEN '2018-10-02' AND '2018-10-31'
      AND EventTheta.eventHour BETWEEN -1 AND -1
    WHERE S2CellCovering.coveringId = 166
    GROUP BY S2CellCovering.coveringId;

    COVERINGID  166
    COUNT(*)    149690

    COVERINGID  166
    COUNT(*)    161104

    COVERINGID  166
    COUNT(*)    215174

    COVERINGID  166
    COUNT(*)    218183
    
    4 rows selected (12.578 seconds)


When I run EXPLAIN it looks like my rows are all properly indexed, as far as
I can tell:


PLAN  SELECT
    __Z0.COVERINGID AS __C0_0,
    COUNT(*) AS __C0_1
FROM PUBLIC.S2CELLCOVERING __Z0
    /* PUBLIC.S2CELLCOVERING_COVERINGID_ASC_IDX: COVERINGID = 166 */
    /* WHERE __Z0.COVERINGID = 166
    */
INNER JOIN PUBLIC.EVENTTHETA __Z1
    /*
PUBLIC.EVENTTHETA_PARENTS2CELLID_ASC_S2CELLID_ASC_EVENTDATE_ASC_EVENTHOUR_ASC_IDX:
EVENTHOUR >= -1
        AND EVENTHOUR <= -1
        AND EVENTDATE >= DATE '2018-10-02'
        AND EVENTDATE <= DATE '2018-10-31'
        AND PARENTS2CELLID = __Z0.PARENTS2CELLID
        AND S2CELLID >= __Z0.MINS2CELLID
        AND S2CELLID <= __Z0.MAXS2CELLID
     */
    ON 1=1
WHERE (__Z0.COVERINGID = 166)
    AND (((__Z1.EVENTHOUR >= -1)
    AND (__Z1.EVENTHOUR <= -1))
    AND (((__Z1.EVENTDATE >= DATE '2018-10-02')
    AND (__Z1.EVENTDATE <= DATE '2018-10-31'))
    AND ((__Z1.PARENTS2CELLID = __Z0.PARENTS2CELLID)
    AND ((__Z0.MINS2CELLID <= __Z1.S2CELLID)
    AND (__Z0.MAXS2CELLID >= __Z1.S2CELLID)))))
GROUP BY __Z0.COVERINGID
/* group sorted */

PLAN  SELECT
    __C0_0 AS COVERINGID,
    __C0_1 AS __C0_1
FROM PUBLIC.__T0
    /* PUBLIC."merge_scan" */


The EVENTTHETA and S2CELLCOVERING keys should be collocated on the same
nodes. Here are my keys:


    case class EventThetaKey(
        @(AffinityKeyMapped@field)
        @(QuerySqlField@field)(
            orderedGroups = Array(
                new (QuerySqlField.Group@field)(
                    name = "pk_public_eventtheta",
                    order = 4,
                    descending = true
                )
            )
        ) parentS2CellId: Long,
        @(QuerySqlField@field)(
            orderedGroups = Array(
                new (QuerySqlField.Group@field)(
                    name = "pk_public_eventtheta",
                    order = 3,
                    descending = true
                )
            )
        ) s2CellId: Long,
        @(QuerySqlField@field)(
            orderedGroups = Array(
                new (QuerySqlField.Group@field)(
                    name = "pk_public_eventtheta",
                    order = 2,
                    descending = true
                )
            )
        ) eventDate: Date,
        @(QuerySqlField@field)(
            orderedGroups = Array(
                new (QuerySqlField.Group@field)(
                    name = "pk_public_eventtheta",
                    order = 1,
                    descending = true
                )
            )
        ) eventHour: Byte
    )


    case class S2CellCoveringKey(
        @(AffinityKeyMapped@field)
        @(QuerySqlField@field)(
            orderedGroups = Array(
                new (QuerySqlField.Group@field)(
                    name = "pk_public_s2cellcovering",
                    order = 3,
                    descending = true
                )
            )
        ) parentS2CellId: Long,
        @(QuerySqlField@field)(
            orderedGroups = Array(
                new (QuerySqlField.Group@field)(
                    name = "pk_public_s2cellcovering",
                    order = 2,
                    descending = true
                )
            )
        ) s2CellId: Long,
        @(QuerySqlField@field)(
            index = true,
            orderedGroups = Array(
                new (QuerySqlField.Group@field)(
                    name = "pk_public_s2cellcovering",
                    order = 1,
                    descending = true
                )
            )
        ) coveringId: Long
    )


Based on the fact that I am able to scan 30x more rows in roughly the same
amount of time, I gather that at one of my indices is set up properly, but
both queries are taking over 12s to run with plenty of compute available.
What am I missing here? I'm assuming that since the tables are joined on a
shared affinity key the join is collocated, but maybe I'm wrong.

On another note, is there a way to run this query in a compute task such
that I'm only returning results from the local node?







--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Optimizing Collocated Join

Reply via email to