So it all comes back to what I wrote in the beginning:
"Which solution performs best depends on the data, the database
version, the available indexes, ..."
Tips:
- be aware that statements can be written in different ways
- test them on realistic data
- use explain to tune your statements
>
>From my experience I tend to avoid SELECT DISTINCT queries because it's
usually very slow with many rows ...
For my specific case the result is the same:
muridae=> select count(distinct s.specimen_id) from specimens sp INNER
JOIN sequences s on s.specimen_id = sp.id;
count
---
1431
(1 ro
The chapter on indexes in the manual (
http://www.postgresql.org/docs/8.2/static/indexes.html )should give
you a pretty good idea on the why.
IN and EXISTS are not the only possibilities, you can also use inner or
outer joins.
Which solution performs best depends on the data, the database version,
how about
select count(distinct s.specimen_id) from specimens sp INNER JOIN
sequences s
on s.specimen_id = sp.id;
>>> Julien <[EMAIL PROTECTED]> 2008-03-13 17:27 >>>
mmh no because it's a one to many relation (a specimen can have more
than one sequence) :
muridae=> select count(sp.id) from spec
mmh no because it's a one to many relation (a specimen can have more
than one sequence) :
muridae=> select count(sp.id) from specimens sp INNER JOIN sequences s
on s.specimen_id = sp.id;
count
---
1536
(1 row)
Time: 81.242 ms
muridae=> select count(sp.id) from specimens sp where sp.id in
I think that just
select count(sp.id) from specimens sp INNER JOIN sequences s on
s.specimen_id = sp.id;
should be enough
>>> Julien <[EMAIL PROTECTED]> 2008-03-13 17:10 >>>
If I understood well the query plan, the planner optimize the
IN(SELECT ...) version with a JOIN (line 19-20 of the first pa
If I understood well the query plan, the planner optimize the
IN(SELECT ...) version with a JOIN (line 19-20 of the first paste) :
-> Hash IN Join (cost=240.95..4011.20 rows=1436 width=4) (actual
time=93.971..201.908 rows=1431 loops=1)
Hash Cond: ("outer".id = "inner".specimen_id)
so I
Hello,
Does anyone has an idea why sometimes:
- select ... where ... in (select ...)
is faster than :
- select ... where ... exists(select ...)
and sometimes it's the opposite ?
I had such a situation, I've pasted the queries on:
http://rafb.net/p/KXNZ6892.html and http://rafb.net/p/jvo5DO38.html