Re: [SQL] in() VS exists()

2008-03-13 Thread Bart Degryse
So it all comes back to what I wrote in the beginning: "Which solution performs best depends on the data, the database version, the available indexes, ..." Tips: - be aware that statements can be written in different ways - test them on realistic data - use explain to tune your statements >

Re: [SQL] in() VS exists()

2008-03-13 Thread Julien
>From my experience I tend to avoid SELECT DISTINCT queries because it's usually very slow with many rows ... For my specific case the result is the same: muridae=> select count(distinct s.specimen_id) from specimens sp INNER JOIN sequences s on s.specimen_id = sp.id; count --- 1431 (1 ro

Re: [SQL] in() VS exists()

2008-03-13 Thread Bart Degryse
The chapter on indexes in the manual ( http://www.postgresql.org/docs/8.2/static/indexes.html )should give you a pretty good idea on the why. IN and EXISTS are not the only possibilities, you can also use inner or outer joins. Which solution performs best depends on the data, the database version,

Re: [SQL] in() VS exists()

2008-03-13 Thread Bart Degryse
how about select count(distinct s.specimen_id) from specimens sp INNER JOIN sequences s on s.specimen_id = sp.id; >>> Julien <[EMAIL PROTECTED]> 2008-03-13 17:27 >>> mmh no because it's a one to many relation (a specimen can have more than one sequence) : muridae=> select count(sp.id) from spec

Re: [SQL] in() VS exists()

2008-03-13 Thread Julien
mmh no because it's a one to many relation (a specimen can have more than one sequence) : muridae=> select count(sp.id) from specimens sp INNER JOIN sequences s on s.specimen_id = sp.id; count --- 1536 (1 row) Time: 81.242 ms muridae=> select count(sp.id) from specimens sp where sp.id in

Re: [SQL] in() VS exists()

2008-03-13 Thread Bart Degryse
I think that just select count(sp.id) from specimens sp INNER JOIN sequences s on s.specimen_id = sp.id; should be enough >>> Julien <[EMAIL PROTECTED]> 2008-03-13 17:10 >>> If I understood well the query plan, the planner optimize the IN(SELECT ...) version with a JOIN (line 19-20 of the first pa

Re: [SQL] in() VS exists()

2008-03-13 Thread Julien
If I understood well the query plan, the planner optimize the IN(SELECT ...) version with a JOIN (line 19-20 of the first paste) : -> Hash IN Join (cost=240.95..4011.20 rows=1436 width=4) (actual time=93.971..201.908 rows=1431 loops=1) Hash Cond: ("outer".id = "inner".specimen_id) so I

[SQL] in() VS exists()

2008-03-13 Thread Julien
Hello, Does anyone has an idea why sometimes: - select ... where ... in (select ...) is faster than : - select ... where ... exists(select ...) and sometimes it's the opposite ? I had such a situation, I've pasted the queries on: http://rafb.net/p/KXNZ6892.html and http://rafb.net/p/jvo5DO38.html