Re: [HACKERS] <> join selectivity estimate question

2017-09-13 Thread Ashutosh Bapat
On Thu, Sep 14, 2017 at 4:30 AM, Tom Lane wrote: > Thomas Munro writes: >> On Wed, Sep 6, 2017 at 11:14 PM, Ashutosh Bapat >> wrote: >>> I added some "stable" tests to your patch taking inspiration from the >>> test SQL file. I think those will be stable across machines and runs. >>> Please let

Re: [HACKERS] <> join selectivity estimate question

2017-09-13 Thread Ashutosh Bapat
On Thu, Sep 14, 2017 at 4:19 AM, Thomas Munro wrote: > On Wed, Sep 6, 2017 at 11:14 PM, Ashutosh Bapat > wrote: >> On Fri, Jul 21, 2017 at 4:10 AM, Thomas Munro >> wrote: >>> That just leaves the question of whether we should try to handle the >>> empty RHS and single-value RHS cases using stati

Re: [HACKERS] <> join selectivity estimate question

2017-09-13 Thread Tom Lane
Thomas Munro writes: > On Wed, Sep 6, 2017 at 11:14 PM, Ashutosh Bapat > wrote: >> I added some "stable" tests to your patch taking inspiration from the >> test SQL file. I think those will be stable across machines and runs. >> Please let me know if those look good to you. > Hmm. But they show

Re: [HACKERS] <> join selectivity estimate question

2017-09-13 Thread Thomas Munro
On Wed, Sep 6, 2017 at 11:14 PM, Ashutosh Bapat wrote: > On Fri, Jul 21, 2017 at 4:10 AM, Thomas Munro > wrote: >> That just leaves the question of whether we should try to handle the >> empty RHS and single-value RHS cases using statistics. My intuition >> is that we shouldn't, but I'll be happ

Re: [HACKERS] <> join selectivity estimate question

2017-09-06 Thread Tom Lane
Simon Riggs writes: > Why isn't this an open item for PG10? Why should it be? This behavior has existed for a long time. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql

Re: [HACKERS] <> join selectivity estimate question

2017-09-06 Thread Simon Riggs
On 6 September 2017 at 04:14, Ashutosh Bapat wrote: > On Fri, Jul 21, 2017 at 4:10 AM, Thomas Munro > wrote: >> >> Thanks. Bearing all that in mind, I ran through a series of test >> scenarios and discovered that my handling for JOIN_ANTI was wrong: I >> thought that I had to deal with inverting

Re: [HACKERS] <> join selectivity estimate question

2017-09-06 Thread Ashutosh Bapat
On Fri, Jul 21, 2017 at 4:10 AM, Thomas Munro wrote: > > Thanks. Bearing all that in mind, I ran through a series of test > scenarios and discovered that my handling for JOIN_ANTI was wrong: I > thought that I had to deal with inverting the result, but I now see > that that's handled elsewhere (c

Re: [HACKERS] <> join selectivity estimate question

2017-07-20 Thread Thomas Munro
On Fri, Jul 21, 2017 at 8:21 AM, Tom Lane wrote: > Ashutosh Bapat writes: >> On Thu, Jul 20, 2017 at 5:30 PM, Thomas Munro >> wrote: >>> Does anyone know how to test a situation where the join is reversed >>> according to >>> get_join_variables, or "complicated cases where we can't tell for sur

Re: [HACKERS] <> join selectivity estimate question

2017-07-20 Thread Tom Lane
Ashutosh Bapat writes: > On Thu, Jul 20, 2017 at 5:30 PM, Thomas Munro > wrote: >> Does anyone know how to test a situation where the join is reversed >> according to >> get_join_variables, or "complicated cases where we can't tell for sure"? > explain select * from pg_class c right join pg_typ

Re: [HACKERS] <> join selectivity estimate question

2017-07-20 Thread Ashutosh Bapat
On Thu, Jul 20, 2017 at 5:30 PM, Thomas Munro wrote: > On Thu, Jul 20, 2017 at 11:47 PM, Ashutosh Bapat > wrote: >> On Thu, Jul 20, 2017 at 11:04 AM, Thomas Munro >> wrote: >>> On Fri, Jun 2, 2017 at 4:16 AM, Tom Lane wrote: I don't think it does really. The thing about a <> semijoin is t

Re: [HACKERS] <> join selectivity estimate question

2017-07-20 Thread Thomas Munro
On Thu, Jul 20, 2017 at 11:47 PM, Ashutosh Bapat wrote: > On Thu, Jul 20, 2017 at 11:04 AM, Thomas Munro > wrote: >> On Fri, Jun 2, 2017 at 4:16 AM, Tom Lane wrote: >>> I don't think it does really. The thing about a <> semijoin is that it >>> will succeed unless *every* join key value from the

Re: [HACKERS] <> join selectivity estimate question

2017-07-20 Thread Ashutosh Bapat
On Thu, Jul 20, 2017 at 11:04 AM, Thomas Munro wrote: > On Fri, Jun 2, 2017 at 4:16 AM, Tom Lane wrote: >> I don't think it does really. The thing about a <> semijoin is that it >> will succeed unless *every* join key value from the inner query is equal >> to the outer key value (or is null). T

Re: [HACKERS] <> join selectivity estimate question

2017-07-19 Thread Thomas Munro
On Fri, Jun 2, 2017 at 4:16 AM, Tom Lane wrote: > I don't think it does really. The thing about a <> semijoin is that it > will succeed unless *every* join key value from the inner query is equal > to the outer key value (or is null). That's something we should consider > to be of very low proba

Re: [HACKERS] <> join selectivity estimate question

2017-06-01 Thread Tom Lane
Dilip Kumar writes: > Actually, I was not proposing this patch instead I wanted to discuss > the approach. I was claiming that for > non-equal JOIN_SEMI selectivity estimation instead of calculating > selectivity in an existing way i.e > = 1- (selectivity of equal JOIN_SEMI) the better way would

Re: [HACKERS] <> join selectivity estimate question

2017-06-01 Thread Dilip Kumar
On Thu, Jun 1, 2017 at 8:24 PM, Robert Haas wrote: > On Wed, May 31, 2017 at 1:18 PM, Dilip Kumar wrote: >> + if (jointype = JOIN_SEMI) >> + { >> + sjinfo->jointype = JOIN_INNER; >> + } > > That is pretty obviously half-baked and completely untested. Actually, I w

Re: [HACKERS] <> join selectivity estimate question

2017-06-01 Thread Robert Haas
On Wed, May 31, 2017 at 1:18 PM, Dilip Kumar wrote: > + if (jointype = JOIN_SEMI) > + { > + sjinfo->jointype = JOIN_INNER; > + } That is pretty obviously half-baked and completely untested. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise P

Re: [HACKERS] <> join selectivity estimate question

2017-05-31 Thread Dilip Kumar
On Fri, Mar 17, 2017 at 6:49 PM, Thomas Munro wrote: > Right. If I temporarily hack neqjoinsel() thus: > > result = 1.0 - result; > + > + if (jointype == JOIN_SEMI) > + result = 1.0; > + > PG_RETURN_FLOAT8(result); > } I was looking into this problem. IMHO, t

Re: [HACKERS] <> join selectivity estimate question

2017-03-17 Thread Thomas Munro
On Sat, Mar 18, 2017 at 11:49 AM, Thomas Munro wrote: > On Sat, Mar 18, 2017 at 6:14 AM, Tom Lane wrote: >> After a bit more thought, it seems like the bug here is that "the >> fraction of the LHS that has a non-matching row" is not one minus >> "the fraction of the LHS that has a matching row".

Re: [HACKERS] <> join selectivity estimate question

2017-03-17 Thread Thomas Munro
On Sat, Mar 18, 2017 at 6:14 AM, Tom Lane wrote: > After a bit more thought, it seems like the bug here is that "the > fraction of the LHS that has a non-matching row" is not one minus > "the fraction of the LHS that has a matching row". In fact, in > this example, *all* LHS rows have both matchi

Re: [HACKERS] <> join selectivity estimate question

2017-03-17 Thread Tom Lane
Robert Haas writes: > On Fri, Mar 17, 2017 at 1:14 PM, Tom Lane wrote: >> It would not be too hard to convince me that neqjoinsel should >> simply return 1.0 for any semijoin/antijoin case, perhaps with >> some kind of discount for nullfrac. Whether or not there's an >> equal row, there's almost

Re: [HACKERS] <> join selectivity estimate question

2017-03-17 Thread Robert Haas
On Fri, Mar 17, 2017 at 1:14 PM, Tom Lane wrote: > After a bit more thought, it seems like the bug here is that "the > fraction of the LHS that has a non-matching row" is not one minus > "the fraction of the LHS that has a matching row". In fact, in > this example, *all* LHS rows have both matchi

Re: [HACKERS] <> join selectivity estimate question

2017-03-17 Thread Tom Lane
I wrote: > The problem here appears to be that we don't have any MCV list for > the "twothousand" column (because it has a perfectly flat distribution), > and the heuristic that eqjoinsel_semi is using for the no-MCVs case > is falling down badly. Oh ... wait. eqjoinsel_semi's charter is to "esti

Re: [HACKERS] <> join selectivity estimate question

2017-03-17 Thread Tom Lane
Robert Haas writes: > The relevant code is in neqsel(). It estimates the fraction of rows > that will be equal, and then does 1 - that number. Evidently, the > query planner thinks that l1.l_suppkey = l2.l_suppkey would almost > always be true, and therefore l1.l_suppkey <> l2.l_suppkey will alm

Re: [HACKERS] <> join selectivity estimate question

2017-03-17 Thread Robert Haas
On Fri, Mar 17, 2017 at 1:54 AM, Thomas Munro wrote: > SELECT * >FROM lineitem l1 > WHERE EXISTS (SELECT * > FROM lineitem l2 > WHERE l1.l_orderkey = l2.l_orderkey); > > -> estimates 59986012 rows, actual rows 59,986,052 (scale 10 TPCH) > > SELECT * >

[HACKERS] <> join selectivity estimate question

2017-03-16 Thread Thomas Munro
Hi hackers, While studying a regression reported[1] against my parallel hash join patch, I noticed that we can also reach a good and a bad plan in unpatched master. One of the causes seems to be the estimated selectivity of a semi-join with an extra <> filter qual. Here are some times I measured

Re: [HACKERS] join selectivity

2004-12-24 Thread strk
On Thu, Dec 23, 2004 at 10:01:33AM -0500, Tom Lane wrote: > [EMAIL PROTECTED] writes: > > Doing some tests I've found out that the returned value from the > > JOINSEL is applied to REL1.rows X REL2.rows, but REL1 and REL2 > > are not 'base' table, rather relations with a number of > > rows once ag

Re: [HACKERS] join selectivity

2004-12-24 Thread strk
On Thu, Dec 23, 2004 at 10:13:03AM -0500, Tom Lane wrote: > [EMAIL PROTECTED] writes: > > On Thu, Dec 23, 2004 at 10:01:33AM -0500, Tom Lane wrote: > >> Right. This amounts to assuming that the join conditions and the > >> restriction conditions are independent, which of course is bogus, > >> but

Re: [HACKERS] join selectivity

2004-12-24 Thread strk
On Thu, Dec 16, 2004 at 01:56:29PM -0500, Tom Lane wrote: > "Mark Cave-Ayland" <[EMAIL PROTECTED]> writes: > > ... But in the case of > > , if we're estimating the number of rows to return then > > that becomes harder > > I didn't say it was easy ;-). The existing selectivity functions can't >

Re: [HACKERS] join selectivity

2004-12-23 Thread Tom Lane
[EMAIL PROTECTED] writes: > So it should NOT depend on full number of rows either, is this right ? No, it's supposed to return a fraction. regards, tom lane ---(end of broadcast)--- TIP 8: explain analyze is your friend

Re: [HACKERS] join selectivity

2004-12-23 Thread Tom Lane
[EMAIL PROTECTED] writes: > On Thu, Dec 23, 2004 at 10:01:33AM -0500, Tom Lane wrote: >> Right. This amounts to assuming that the join conditions and the >> restriction conditions are independent, which of course is bogus, >> but we really don't have enough information to do better. > Doesn't JOI

Re: [HACKERS] join selectivity

2004-12-23 Thread Tom Lane
[EMAIL PROTECTED] writes: > Doing some tests I've found out that the returned value from the > JOINSEL is applied to REL1.rows X REL2.rows, but REL1 and REL2 > are not 'base' table, rather relations with a number of > rows once again estimated by other selectivity functions. Right. This amounts

Re: [HACKERS] join selectivity

2004-12-23 Thread strk
On Thu, Dec 16, 2004 at 03:12:21PM -0500, Greg Stark wrote: > > "Mark Cave-Ayland" <[EMAIL PROTECTED]> writes: > > > Well at the moment PostGIS has a RESTRICT function that takes an expression > > of the formwhere column is a column consisting of > > geometries and constant is a bounding box.

Re: [HACKERS] join selectivity

2004-12-16 Thread Greg Stark
"Mark Cave-Ayland" <[EMAIL PROTECTED]> writes: > Well at the moment PostGIS has a RESTRICT function that takes an expression > of the formwhere column is a column consisting of > geometries and constant is a bounding box. This is based upon histogram > statistics and works well. Are these fu

Re: [HACKERS] join selectivity

2004-12-16 Thread Tom Lane
"Mark Cave-Ayland" <[EMAIL PROTECTED]> writes: > ... But in the case of > , if we're estimating the number of rows to return then > that becomes harder I didn't say it was easy ;-). The existing selectivity functions can't do better than a rough guess in such cases, and I don't expect you can e

Re: [HACKERS] join selectivity

2004-12-16 Thread Mark Cave-Ayland
Hi Tom, > -Original Message- > From: Tom Lane [mailto:[EMAIL PROTECTED] > Sent: 16 December 2004 17:56 > To: Mark Cave-Ayland > Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; > [EMAIL PROTECTED] > Subject: Re: [HACKERS] join selectivity > > > "Mark Cave

Re: [HACKERS] join selectivity

2004-12-16 Thread Tom Lane
"Mark Cave-Ayland" <[EMAIL PROTECTED]> writes: > OK I think I've misunderstood something more fundamental than that; I > understood from what you said that the RESTRICT clause is used to evaluate > the cost of table1.geom && table2.geom against table2.geom && table1.geom > (i.e. it is used to help

Re: [HACKERS] join selectivity

2004-12-16 Thread Mark Cave-Ayland
> -Original Message- > From: Tom Lane [mailto:[EMAIL PROTECTED] > Sent: 16 December 2004 15:55 > To: Mark Cave-Ayland > Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; > [EMAIL PROTECTED] > Subject: Re: [HACKERS] join selectivity > > > "Mark Cave

Re: [HACKERS] join selectivity

2004-12-16 Thread Tom Lane
"Mark Cave-Ayland" <[EMAIL PROTECTED]> writes: > ...and with two indices RESTRICT is called four times. The part I find > confusing is why with one index that RESTRICT is called twice. [ shrug... ] clause_selectivity doesn't try to cache the result. > I was also thinking whether calling RESTRICT

Re: [HACKERS] join selectivity

2004-12-16 Thread Mark Cave-Ayland
Hi Tom, > -Original Message- > From: Tom Lane [mailto:[EMAIL PROTECTED] > Sent: 13 December 2004 17:16 > To: Mark Cave-Ayland > Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; > [EMAIL PROTECTED] > Subject: Re: [HACKERS] join selectivity > > > "Mark Cave

Re: [HACKERS] join selectivity

2004-12-14 Thread strk
On Mon, Dec 13, 2004 at 10:16:09AM -, Mark Cave-Ayland wrote: > > > -Original Message- > > From: strk [mailto:[EMAIL PROTECTED] > > Sent: 10 December 2004 15:35 > > To: Mark Cave-Ayland > > Cc: [EMAIL PROTECTED] > > Subject: join selectivity > > > > > > Taking a look at join selecti

Re: [HACKERS] join selectivity

2004-12-13 Thread strk
On Mon, Dec 13, 2004 at 12:16:05PM -0500, Tom Lane wrote: > "Mark Cave-Ayland" <[EMAIL PROTECTED]> writes: > > For a query like this: > > > > SELECT id FROM table1, table2 > > WHERE table1.geom && table2.geom; > > > > RESTRICT selectivity is invoked twice and > > JOIN selectivity is invoked once

Re: [HACKERS] join selectivity

2004-12-13 Thread Tom Lane
"Mark Cave-Ayland" <[EMAIL PROTECTED]> writes: > For a query like this: > > SELECT id FROM table1, table2 > WHERE table1.geom && table2.geom; > > RESTRICT selectivity is invoked twice and > JOIN selectivity is invoked once. Hm, are you testing in a context where both tables have indexes that ar

Re: [HACKERS] join selectivity

2004-12-13 Thread Mark Cave-Ayland
Hi strk, (cut) > > Taking a look at join selectivity... > > For a query like this: > > > > SELECT id FROM table1, table2 > > WHERE table1.geom && table2.geom; > > > > RESTRICT selectivity is invoked twice and > > JOIN selectivity is invoked once. > > The RESTRICT code is not able to find