On Sat, 21 Mar 2015 19:01:16 +0100
"Mario M. Westphal" <mw at mwlabs.de> wrote:

> For now I have increased the threshold for IN clauses (instead of
> TEMP tables) and use WHERE IN (SELECT ? from TEMP) instead of a JOIN.

Because the two are conceptually different, it's not surprising they
run differently.  IN is an existence test: does the value appear in the
set?  JOIN is a product: produce every matching combination.  

For some queries, they reduce to the same thing.  For example, 

        FROM A JOIN B ON A.b = B.b

is the same as IN if B.b 

        1.  is unique, and 
        2.  no other columns from B are used.  

The query optimizer has to be sophisticated enough to recognize
those conditions, which is unlikely in the case of a temporary table.  

There is also the question of linear vs binary searches.  When you
supply a list of constants to IN, most if not all DBMSs search the list
sequentially.  When IN (or EXISTS) is supplied from an indexed column,
the search is often binary.  For a small number of elements, there's no
distinction.  For 1000 elements, it's 2 orders of magnitude: 1000 hops
versus 10.  

--jkl

Reply via email to