Re: [HACKERS] limit in subquery causes poor selectivity estimation

2011-09-05 Thread Robert Haas
On Fri, Sep 2, 2011 at 12:45 PM, Tom Lane wrote: > column values).  But GROUP BY or DISTINCT would entirely invalidate the > column frequency statistics, which makes me think that ignoring the > pg_statistic entry might be the thing to do.  Comments? There's a possible problem there in that you m

Re: [HACKERS] limit in subquery causes poor selectivity estimation

2011-09-04 Thread Tom Lane
I wrote: > On a longer-term basis, I'm looking into what we could do with > extracting stats from subqueries, but that doesn't seem like material > for a backpatch. I have a draft patch that I've been playing with > (attached). I've committed a heavily rewritten version of that patch. Git HEAD s

Re: [HACKERS] limit in subquery causes poor selectivity estimation

2011-09-02 Thread Tom Lane
Peter Eisentraut writes: > On lör, 2011-08-27 at 13:32 -0400, Tom Lane wrote: >> The larger problem is that if a subquery didn't get flattened, it's >> often because it's got LIMIT, or GROUP BY, or some similar clause that >> makes it highly suspect whether the statistics available for the table

Re: [HACKERS] limit in subquery causes poor selectivity estimation

2011-08-31 Thread Tom Lane
Robert Haas writes: > On Wed, Aug 31, 2011 at 6:22 AM, Peter Eisentraut wrote: >> I liked the old one better. ;-) > AFAICS, those plans are identical, except for a minor difference in > the cost of scanning test2. The point is that the estimate of the result size is worse in 8.4.8. I am not, h

Re: [HACKERS] limit in subquery causes poor selectivity estimation

2011-08-31 Thread Robert Haas
On Wed, Aug 31, 2011 at 6:22 AM, Peter Eisentraut wrote: > On lör, 2011-08-27 at 13:32 -0400, Tom Lane wrote: >> > EXPLAIN SELECT * FROM test1  WHERE sha1 in (SELECT sha1 FROM test2 >> LIMIT 200); >> >> > Here, however, it has apparently not passed this knowledge through >> the >> > LIMIT. >> >> T

Re: [HACKERS] limit in subquery causes poor selectivity estimation

2011-08-31 Thread Peter Eisentraut
On lör, 2011-08-27 at 13:32 -0400, Tom Lane wrote: > > EXPLAIN SELECT * FROM test1 WHERE sha1 in (SELECT sha1 FROM test2 > LIMIT 200); > > > Here, however, it has apparently not passed this knowledge through > the > > LIMIT. > > The LIMIT prevents the subquery from being flattened entirely, ie w

Re: [HACKERS] limit in subquery causes poor selectivity estimation

2011-08-29 Thread Robert Haas
On Sat, Aug 27, 2011 at 1:32 PM, Tom Lane wrote: > Peter Eisentraut writes: >> EXPLAIN SELECT * FROM test1  WHERE sha1 in (SELECT sha1 FROM test2); >>                               QUERY PLAN >> -- >>  Hash Semi Join  (cost=30.52

Re: [HACKERS] limit in subquery causes poor selectivity estimation

2011-08-27 Thread Tom Lane
Peter Eisentraut writes: > EXPLAIN SELECT * FROM test1 WHERE sha1 in (SELECT sha1 FROM test2); > QUERY PLAN > -- > Hash Semi Join (cost=30.52..61.27 rows=1000 width=27) >Hash Cond: (test1.sha1

[HACKERS] limit in subquery causes poor selectivity estimation

2011-08-27 Thread Peter Eisentraut
This is an artificial test case shrunk down from a much larger real query. CREATE TABLE test1 ( sha1 bytea PRIMARY KEY, something text ); CREATE TABLE test2 ( sha1 bytea PRIMARY KEY, blah text ); Fill those with 1000 random rows each, e.g., for i in $(seq 1 1000); do sha1=$(echo