[sqlite] SELECT becomes very slow when converted to UPDATE

Jonathan Moules Sat, 23 Jun 2018 06:44:37 -0700

Hi List,

I'm trying to find all hashes that are unique to a specific id (my_id),and then use a UPDATE-Join to update another table with that number.

After much tweaking, I've simplified the table down to a basic temptable (actually created using a CREATE AS SELECT ... GROUP BY my_id, hash):


CREATE TEMP TABLE t_distinct_hashes_by_id (
    my_id            INTEGER,
    hash            TEXT
);

And indexes in both directions because I'm still trying to optimise (andsee what SQLite wants):


CREATE UNIQUE INDEX temp.idx__1 ON t_distinct_hashes_by_id (
    hash,
     my_id
);

CREATE UNIQUE INDEX temp.idx__2 ON t_distinct_hashes_by_id (
        my_id,
        hash
);

There are only 20 values for my_id, but several hundred thousand hashes.

-----

I can do a SELECT query which gets what I want and runs in about 0.5seconds:


SELECT
   *
        FROM
            temp.t_distinct_hashes_by_id d
        WHERE

hash NOT IN (SELECT hash FROM temp.t_distinct_hashes_by_idsub where 1 != sub.my_id and hash not NULL)

            AND
            1 = d.my_id

The EXPLAIN:

0 0 0 SEARCH TABLE t_distinct_hashes_by_id AS l USING COVERINGINDEX idx__2 (my_id=?)

0    0    0    EXECUTE LIST SUBQUERY 1
1    0    0    SCAN TABLE t_distinct_hashes_by_id AS sub

-----

So in theory, I'd hope that an UPDATE version using this select shouldtake around 20 * 0.5 sec = 10 seconds. But it's actually taking... wellI don't know how long, at least 10 minutes before I gave up waiting.This is the UPDATE:


UPDATE
    meta_table
SET
    distinct_hashes = (
        SELECT
            COUNT(hash) AS num
        FROM
            temp.t_distinct_hashes_by_id d
        WHERE

hash NOT IN (SELECT hash FROM temp.t_distinct_hashes_by_idsub where meta_table.my_id != sub.my_id and hash not NULL)

            AND
            -- This one has to be at the bottom for some reason.
            meta_table.my_id = d.my_id
    )
;

The query plan for this UPDATE includes two CORRELATED Subqueries, whichthe docs say are reassessed on every run - that seems like the problem.I get that it'd need to do that 20 times (once per my_id), but theslowdown seems considerably longer than that needs to warrant.


0    0    0    SCAN TABLE meta_table
0    0    0    EXECUTE CORRELATED SCALAR SUBQUERY 0

0 0 0 SEARCH TABLE t_distinct_hashes_by_id AS l USING COVERINGINDEX idx__2 (my_id=?)

0    0    0    EXECUTE CORRELATED LIST SUBQUERY 1
1    0    0    SCAN TABLE t_distinct_hashes_by_id AS sub

Does anyone have any thoughts on how I can speed this up (SQLite 3.15.0and confirmed in 3.24.0 (which uses about 3 times the disk IO / sec forthe same query/data))?

Thanks,
Jonathan



_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

[sqlite] SELECT becomes very slow when converted to UPDATE

Reply via email to