DISTINCT forces the query optimizer to create an intermediate table to hold the 
results and compare each row of the non-distinct result set with an 
automatically created index. It may also affect the query plan in a way that 
chooses inefficient indices, which is more likely if you have not run ANALYZE 
on the fully loaded database.

Using a 3 stage pipe instead you additionally have more CPUs (1 running the 
query, 1 or more sorting the results) working in paralell.

Try EXPLAIN QUERY PLAN to see what the query planner is doing.

-----Ursprüngliche Nachricht-----
Von: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im 
Auftrag von Kevin O'Gorman
Gesendet: Donnerstag, 02. Februar 2017 03:28
An: sqlite-users <sqlite-users@mailinglists.sqlite.org>
Betreff: [sqlite] "DISTINCT" makes a query take 37 times as long

I have a database of positions and moves in a strategic game, and I'm searching 
for unsolved positions that have been connected to an immediate ancestor.  I'm 
using Python 3.5.2, and the code looks like

#!/usr/bin/env python3
"""Output positions that are reachable but unsolved at census 18 or greater See 
page 76 of Qubic log

Last Modified: Tue Jan 31 12:13:07 PST 2017 """

import sqlite3          # https://docs.python.org/3.5/library/sqlite3.html

with sqlite3.connect("917.db") as conn:
    for row in conn.execute("""
                SELECT DISTINCT ppos
                FROM move JOIN pos ON mto = pnum
                WHERE pcensus = 18 and pmin < pmax
            """):
        print(row[0])

As written here, this query runs for 1193 minutes (just short of 20 hours).  If 
I remove the "DISTINCT" and instead pipe the result into the sort program that 
comes with Linux "sort --unique" the query and sort takes only 31 minutes.  The 
results are the same, and consist of 4.2 million rows.

This seems extreme.

--
word of the year: *kakistocracy*
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___________________________________________
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: h...@scigames.at

This communication (including any attachments) is intended for the use of the 
intended recipient(s) only and may contain information that is confidential, 
privileged or legally protected. Any unauthorized use or dissemination of this 
communication is strictly prohibited. If you have received this communication 
in error, please immediately notify the sender by return e-mail message and 
delete all copies of the original communication. Thank you for your cooperation.


_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to