Re: Removing useless DISTINCT clauses

2018-08-26 Thread David Rowley
On 25 August 2018 at 04:05, Finnerty, Jim wrote: > I'll explore a proper way to test that it's in the single-relation case, and > will post a separate thread for the 'remove unnecessary DISTINCT' > optimization. See: has_multiple_baserels() in allpaths.c -- David Rowley

Re: Removing useless DISTINCT clauses

2018-08-24 Thread Finnerty, Jim
I feel strongly that eliminating the entire DISTINCT or GROUP BY clause (when there are no aggs) is an important optimization, especially when the incremental cost to test for it is so tiny. I'm happy to submit that as a separate thread. My goal here was to move the original proposal along

Re: Removing useless DISTINCT clauses

2018-08-23 Thread David Rowley
On 24 August 2018 at 14:12, Stephen Frost wrote: > This is happening at the same time as some optimizations around GROUP > BY, so either there's something different about what's happening there > and I didn't appreciate it, or does that optimization suffer from a > similar issue? There are two

Re: Removing useless DISTINCT clauses

2018-08-23 Thread Stephen Frost
Greetings, * Tom Lane (t...@sss.pgh.pa.us) wrote: > Stephen Frost writes: > > * David Rowley (david.row...@2ndquadrant.com) wrote: > >> On 24 August 2018 at 11:34, Stephen Frost wrote: > >>> * David Rowley (david.row...@2ndquadrant.com) wrote: > My personal opinion of only being able to

Re: Removing useless DISTINCT clauses

2018-08-23 Thread Tom Lane
Stephen Frost writes: > * David Rowley (david.row...@2ndquadrant.com) wrote: >> On 24 August 2018 at 11:34, Stephen Frost wrote: >>> * David Rowley (david.row...@2ndquadrant.com) wrote: My personal opinion of only being able to completely remove the DISTINCT when there's a single item

Re: Removing useless DISTINCT clauses

2018-08-23 Thread Stephen Frost
Greetings, * David Rowley (david.row...@2ndquadrant.com) wrote: > On 24 August 2018 at 11:34, Stephen Frost wrote: > > * David Rowley (david.row...@2ndquadrant.com) wrote: > >> My personal opinion of only being able to completely remove the > >> DISTINCT when there's a single item in the rtable

Re: Removing useless DISTINCT clauses

2018-08-23 Thread Stephen Frost
Greetings, * Jim Finnerty (jfinn...@amazon.com) wrote: > I was thinking about this last night and I realized that putting the new > hasModifiedDistinct flag on the PlannerInfo struct eliminates the need to > deal with the serialization issues, and makes it simpler. > > Here's a new patch (v7)

Re: Removing useless DISTINCT clauses

2018-08-23 Thread David Rowley
On 24 August 2018 at 11:34, Stephen Frost wrote: > Greetings, > > * David Rowley (david.row...@2ndquadrant.com) wrote: >> My personal opinion of only being able to completely remove the >> DISTINCT when there's a single item in the rtable (or a single base >> table) is that it's just too poor to

Re: Removing useless DISTINCT clauses

2018-08-23 Thread Stephen Frost
Greetings, * David Rowley (david.row...@2ndquadrant.com) wrote: > My personal opinion of only being able to completely remove the > DISTINCT when there's a single item in the rtable (or a single base > table) is that it's just too poor to bother with. I think such a > solution is best left to

Re: Removing useless DISTINCT clauses

2018-08-23 Thread David Rowley
On 24 August 2018 at 03:29, Finnerty, Jim wrote: > Thank you Álvaro. Here's the distinct_opt_v7 patch. Determining if there's just 1 base relation by checking the length the rtable is not a good way. If you want to see why, try creating a primary key on a partitioned table. My personal opinion

Re: Removing useless DISTINCT clauses

2018-08-23 Thread Alvaro Herrera
On 2018-Aug-23, Jim Finnerty wrote: > distinct_opt_v7.patch > Are you posting this via postgresql-archive.org? Please don't do that. These posts of yours are mangled by postgresql-archive and do not contain the file you're

Re: Removing useless DISTINCT clauses

2018-08-22 Thread David Rowley
On 23 August 2018 at 08:11, Jim Finnerty wrote: > Here is an update to this thread, for potential inclusion in v12. I > couldn't get the most recent 'v5' patch to apply cleanly, so I recreated a > v6 patch on PG10.5 by hand, and made a few changes and improvements: Well, the patch I wrote was

Re: Removing useless DISTINCT clauses

2018-04-07 Thread David Rowley
On 8 April 2018 at 08:55, Tom Lane wrote: > regression=# create table t1 (a int,b int, c int, d int, primary key(a,b)); > CREATE TABLE > regression=# explain verbose select distinct * from t1 order by a,c,b; > server closed the connection unexpectedly > This probably

Re: Removing useless DISTINCT clauses

2018-04-07 Thread Tom Lane
I wrote: > My gripe about > this as it stands is mainly that it seems like it's going to do > a lot of catalog-lookup work twice over, at least in cases that > have both DISTINCT and GROUP BY --- or is that too small a set to > worry about? I convinced myself that that was a silly objection, and

Re: Removing useless DISTINCT clauses

2018-04-07 Thread Tom Lane
David Rowley writes: > It's certainly possible to do more here. I'm uncertain what needs to > be done in regards to cached plan invalidation, but we'd certainly > need to ensure cached plans are invalidated whenever the attnotnull is > changed. They already are;

Re: Removing useless DISTINCT clauses

2018-04-05 Thread David Rowley
On 6 April 2018 at 03:36, Melanie Plageman wrote: > The updated patch looks good to me. > I've changed the status to "ready for committer" Great. Thanks for reviewing this! -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7

Re: Removing useless DISTINCT clauses

2018-04-05 Thread Melanie Plageman
Hi David, The updated patch looks good to me. I've changed the status to "ready for committer" Thanks

Re: Removing useless DISTINCT clauses

2018-03-23 Thread David Rowley
On 24 March 2018 at 01:42, Jim Finnerty wrote: > Distinctness can also be preserved across joins, so if you have a 'snowflake > query' type join, where all the joins are to a unique key, then the > distinctness of the other side of the join is preserved. For example, a >

Re: Removing useless DISTINCT clauses

2018-03-23 Thread David Rowley
On 24 March 2018 at 05:55, Melanie Plageman wrote: > I was just wondering, since get_primary_key_attnos opens pg_constraint and > just skips attributes with other constraint types than primary, what would > be the reason we wouldn't just also open pg_attribute and check

Re: Removing useless DISTINCT clauses

2018-03-23 Thread David Rowley
On 22 March 2018 at 18:58, David Rowley wrote: > On 21 March 2018 at 16:29, Melanie Plageman wrote: >> The new status of this patch is: Waiting on Author > > Thanks for reviewing this. I've attached an updated patch. I'll set > back to

Re: Removing useless DISTINCT clauses

2018-03-23 Thread Melanie Plageman
On Thu, Mar 22, 2018 at 5:20 PM, David Rowley wrote: > The problem is that in order to properly invalidate cached plans we > must record the constraint OIDs which the plan depends on. We can do > that for PK and UNIQUE constraints, unfortunately, we can't do it for

Re: Removing useless DISTINCT clauses

2018-03-23 Thread Laurenz Albe
David Rowley wrote: > > Would it be very difficult to extend that to "if any unique constraints are > > contained in the DISTINCT clause"? > > Unfortunately, it's quite a bit more work to make that happen. It's > not just unique constraints that are required to make this work. We > also need to

Re: Removing useless DISTINCT clauses

2018-03-22 Thread David Rowley
On 22 March 2018 at 21:16, Laurenz Albe wrote: > Would it be very difficult to extend that to "if any unique constraints are > contained in the DISTINCT clause"? Unfortunately, it's quite a bit more work to make that happen. It's not just unique constraints that are

Re: Removing useless DISTINCT clauses

2018-03-22 Thread Laurenz Albe
Melanie Plageman wrote: > Contents & Purpose > == > > This patch removes any additional columns in the DISTINCT clause when the > table's primary key columns are entirely present in the DISTINCT clause. This > optimization works because the PK columns functionally determine the

Re: Removing useless DISTINCT clauses

2018-03-21 Thread David Rowley
On 21 March 2018 at 16:29, Melanie Plageman wrote: > For a small performance hit but an improvement in readability, the length > check > can be moved from the individual group by and distinct clause checks into the > helper function > > if

Re: Removing useless DISTINCT clauses

2018-03-20 Thread Melanie Plageman
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: tested, passed Documentation:tested, passed This is a review of the patch to remove useless DISTINCT

Re: [HACKERS] Removing useless DISTINCT clauses

2018-03-04 Thread David Rowley
On 10 January 2018 at 09:26, Tom Lane wrote: > 1. Once you don't have all the tlist items shown in DISTINCT, it really is > more like DISTINCT ON, seems like. I am not sure it's a good idea to set > hasDistinctOn, because that engages some planner behaviors we probably >

Re: [HACKERS] Removing useless DISTINCT clauses

2018-03-01 Thread David Rowley
On 2 March 2018 at 09:51, Andres Freund wrote: > This patch has been waiting on author since 2018-01-09, the next & last > CF has started. I'm inclined to mark this as returned with feedback. I'm planning on making the required changes at the weekend. -- David Rowley

Re: [HACKERS] Removing useless DISTINCT clauses

2018-01-09 Thread David Rowley
Hi Tom, Thanks for looking at this. On 10 January 2018 at 09:26, Tom Lane wrote: > This is a cute idea, but I'm troubled by a couple of points: > > 1. Once you don't have all the tlist items shown in DISTINCT, it really is > more like DISTINCT ON, seems like. I am not sure

Re: [HACKERS] Removing useless DISTINCT clauses

2018-01-09 Thread Tom Lane
David Rowley writes: > [ remove_useless_distinct_clauses_v2.patch ] This is a cute idea, but I'm troubled by a couple of points: 1. Once you don't have all the tlist items shown in DISTINCT, it really is more like DISTINCT ON, seems like. I am not sure it's a good

Re: [HACKERS] Removing useless DISTINCT clauses

2018-01-06 Thread David Rowley
Hi Jeff, Thanks for looking at the patch. On 6 January 2018 at 10:34, Jeff Janes wrote: > Couldn't the Unique node be removed entirely? If k is a primary key, you > can't have duplicates in need of removal. It probably could be, if there were no joins, but any join could

Re: [HACKERS] Removing useless DISTINCT clauses

2018-01-05 Thread Jeff Janes
On Mon, Nov 6, 2017 at 1:16 AM, David Rowley wrote: > In [1] we made a change to process the GROUP BY clause to remove any > group by items that are functionally dependent on some other GROUP BY > items. > > This really just checks if a table's PK columns are