Re: [sqlite] JOIN vs. INTERSECT speed and efficiency

Keith Medcalf Thu, 07 Sep 2017 14:42:10 -0700

>Other "am I right in thinking this" question: INTERSECT is only going
>to be viable when comparing full records, correct? If you're looking
>to filter table A by whether its primary key is also a primary key
>for table B, but ignoring the other fields in both, then INTERSECT
>becomes not an option, or at least starts making the query more
>complex/ugly... correct?

No.  The data must merely be the same shape (order, number of columns):

SELECT c1, c2, c3, c8 from t1
INTERSECT
SELECT x5, g4, q7, b3 from t2;

each of t1 and t2 may have fields named a1 a2 a4 ... a26 b1 b2 b3 ... b26 ... 
z1 z2 z3 ... z26 (for a total of 676 fields per table).

The comparison is of resulting matrices, not underlying tables.  Of course, if 
you do "SELECT * from table1;" the * is merely "syntactic sugar" (a short way 
of spelling) the complete, in declaration order, list of explicit columns.  
(So, if and only if the ROWID is an "explicit column" is it used in the 
intersection, otherwise it is not -- and concomitantly if you explicitly list 
the columns to intersect, then the rowid participates if and only if you have 
included it in the list of data to intersect.)

INTERSECT / EXCEPT / UNION are matrix operations.  The RHS (select statement 
before the operator) and LHS (select statement after the operator) provide the 
two matrices on which the operation is performed.  Matrixes do not have column 
names, merely ordinal positions (column 1, column 2, column 3 and so forth).  
Similarly these operations do not care about column names, merely that the 
order (number of columns in each matrix) is the same.  Comparisons are done by 
ordinal position of the item in the row.  The output is a matrix.  It only 
appears "row at a time" because of the primitive method of operation of (no 
matter how advanced) computers which can only perform operations as a serial 
sequence of steps.  For "user convenience" the output column names are set to 
the RHS column names.

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.

>-----Original Message-----
>From: sqlite-users [mailto:sqlite-users-
>boun...@mailinglists.sqlite.org] On Behalf Of David Raymond
>Sent: Thursday, 7 September, 2017 10:31
>To: SQLite mailing list
>Subject: Re: [sqlite] JOIN vs. INTERSECT speed and efficiency
>
>Although it may not translate as well to the more complex examples,
>would you also consider adding the IN operator to your tests? I found
>for example that "select v from t1 where v in t2;" did even better
>than the join or the intersect.
>
>
>
>
>-----Original Message-----
>From: sqlite-users [mailto:sqlite-users-
>boun...@mailinglists.sqlite.org] On Behalf Of R Smith
>Sent: Thursday, September 07, 2017 8:06 AM
>To: sqlite-users@mailinglists.sqlite.org
>Subject: Re: [sqlite] JOIN vs. INTERSECT speed and efficiency
>
>
>On 2017/09/07 3:43 AM, Keith Medcalf wrote:
>> Try the same test using 147 columns in each table.
>
>Exactly the plan for this weekend :)
>
>> 1 column is rather trivial.  Even a kindergarten kid could do it in
>no time using crayons and the wall.
>
>So? That is non-sequitur, I am sure given enough crayons, wall-space
>and
>time, a kindergarten kid can do it with 147 columns too. That says
>exactly nothing about the possible efficiencies of different methods.
>If
>however the 1-columness of the test gets somehow advantaged by being
>the
>PK (as Nico pointed out) or real world data such as TEXT entries sort
>slower than INTs, then it might affect it, so the 147 column tests
>will
>tell.
>
>
>> In other words except in very trivial cases (like having only one
>column that is not nullable) it will be very difficult to write a
>"correct" JOIN or correlated subquery that emulates an INTERSECT.
>
>Well I agree, but it is those trivial cases that are of interest
>here,
>and if there is a general JOIN optimization to be had. The INTERSECT
>test merely served as the catalyst to put us on the trail of the
>possible JOIN optimization, if there is even an optimization to be
>had
>(it might yet be a wild goose chase, which you seem to have your
>money
>on, so watch this space, I'll graciously accept your "told ya!" later
>after testing).
>
>
>Cheers,
>Ryan
>
>_______________________________________________
>sqlite-users mailing list
>sqlite-users@mailinglists.sqlite.org
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>_______________________________________________
>sqlite-users mailing list
>sqlite-users@mailinglists.sqlite.org
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] JOIN vs. INTERSECT speed and efficiency

Reply via email to