subject:"Re\: \[HACKERS\] improving GROUP BY estimation"

Re: [HACKERS] improving GROUP BY estimation

2016-04-02 Thread Tom Lane

Dean Rasheed writes: > Here's an updated patch with references to both papers, and a more > detailed justification for the formula, along with the other changes > discussed. Note that although equation (2) in the Dell'Era paper looks > different from the Yao formula, it's actually the same. Looks

Re: [HACKERS] improving GROUP BY estimation

2016-04-02 Thread Dean Rasheed

On 31 March 2016 at 22:02, Tom Lane wrote: > I'm just concerned about what happens when the Dellera paper stops being > available. I don't mind including that URL as a backup to the written-out > argument I just suggested. > Here's an updated patch with references to both papers, and a more deta

Re: [HACKERS] improving GROUP BY estimation

2016-03-31 Thread Tom Lane

Alvaro Herrera writes: > Tom Lane wrote: >> See "Approximating block accesses in database organizations", S. B. Yao, >> Communications of the ACM, Volume 20 Issue 4, April 1977 Pages 260-261 > That sounds nice all right, but I'm not sure it's actually helpful, > because the article text is not av

Re: [HACKERS] improving GROUP BY estimation

2016-03-31 Thread Tom Lane

Dean Rasheed writes: > Yeah, that makes sense. In fact, if we only apply the adjustment when > reldistinct > 0 and rel->rows < rel->tuples, and rewrite the first > argument to pow() as (rel->tuples - rel->rows) / rel->tuples, then it > is guaranteed to be non-negative. If rel->rows >= rel->tuples

Re: [HACKERS] improving GROUP BY estimation

2016-03-31 Thread Tom Lane

Dean Rasheed writes: > On 31 March 2016 at 21:40, Tom Lane wrote: >> Let's use something like this: >> See "Approximating block accesses in database organizations", S. B. Yao, >> Communications of the ACM, Volume 20 Issue 4, April 1977 Pages 260-261 Actually, having now looked at both the Deller

Re: [HACKERS] improving GROUP BY estimation

2016-03-31 Thread Alvaro Herrera

Tom Lane wrote: > > The article text refers to this 1977 S. B. Yao paper "Approximating > > block accesses in database organizations" which doesn't appear to be > > available online, except behind ACM's paywall at > > http://dl.acm.org/citation.cfm?id=359475 > > Well, a CACM citation is perfectl

Re: [HACKERS] improving GROUP BY estimation

2016-03-31 Thread Dean Rasheed

On 31 March 2016 at 21:40, Tom Lane wrote: > Alvaro Herrera writes: >> Tom Lane wrote: >>> Another minor gripe is the use of a random URL as justification. This >>> code will still be around when that URL exists nowhere but the Wayback >>> Machine. Can't we find a more formal citation to use? >

Re: [HACKERS] improving GROUP BY estimation

2016-03-31 Thread Dean Rasheed

On 31 March 2016 at 20:18, Tom Lane wrote: > Also, I wonder if it'd be a good idea to provide a guard against division > by zero --- we know rel->tuples > 0 at this point, but I'm less sure that > reldistinct can't be zero. In the same vein, I'm worried about the first > argument of pow() being s

Re: [HACKERS] improving GROUP BY estimation

2016-03-31 Thread Tom Lane

Alvaro Herrera writes: > Tom Lane wrote: >> Another minor gripe is the use of a random URL as justification. This >> code will still be around when that URL exists nowhere but the Wayback >> Machine. Can't we find a more formal citation to use? > The article text refers to this 1977 S. B. Yao p

Re: [HACKERS] improving GROUP BY estimation

2016-03-31 Thread Alvaro Herrera

Tom Lane wrote: > Another minor gripe is the use of a random URL as justification. This > code will still be around when that URL exists nowhere but the Wayback > Machine. Can't we find a more formal citation to use? The article text refers to this 1977 S. B. Yao paper "Approximating block acce

Re: [HACKERS] improving GROUP BY estimation

2016-03-31 Thread Tom Lane

Dean Rasheed writes: > On 30 March 2016 at 14:03, Tomas Vondra wrote: >> Attached is v4 of the patch > Thanks, I think this is good to go, except that I think we need to use > pow() rather than powl() because AIUI powl() is new to C99, and so > won't necessarily be available on all supported pla

Re: [HACKERS] improving GROUP BY estimation

2016-03-31 Thread Dean Rasheed

On 30 March 2016 at 14:03, Tomas Vondra wrote: > Attached is v4 of the patch Thanks, I think this is good to go, except that I think we need to use pow() rather than powl() because AIUI powl() is new to C99, and so won't necessarily be available on all supported platforms. I don't think we need w

Re: [HACKERS] improving GROUP BY estimation

2016-03-30 Thread Tomas Vondra

Hi, On 03/22/2016 03:40 PM, Alexander Korotkov wrote: I think you should send a revision of patch including comments proposed by Deam Rasheed. I'm switching patch status to waiting on author in commitfest. Attached is v4 of the patch - the only difference w.r.t. v3 is that I've used the com

Re: [HACKERS] improving GROUP BY estimation

2016-03-29 Thread David Steele

Hi Thomas, On 3/22/16 10:40 AM, Alexander Korotkov wrote: I think you should send a revision of patch including comments proposed by Deam Rasheed. I'm switching patch status to waiting on author in commitfest. We're getting pretty close to the end of the CF. Do you know when you will have a

Re: [HACKERS] improving GROUP BY estimation

2016-03-22 Thread Alexander Korotkov

Hi, Tomas! On Mon, Mar 21, 2016 at 2:39 PM, Alexander Korotkov < a.korot...@postgrespro.ru> wrote: > On Fri, Mar 18, 2016 at 1:20 PM, Dean Rasheed > wrote: > >> Probably a better URL to give is >> http://www.adellera.it/investigations/distinct_balls/ which has a link >> to the PDF version of the

Re: [HACKERS] improving GROUP BY estimation

2016-03-21 Thread Alexander Korotkov

Hi, Dean! On Fri, Mar 18, 2016 at 1:20 PM, Dean Rasheed wrote: > Probably a better URL to give is > http://www.adellera.it/investigations/distinct_balls/ which has a link > to the PDF version of the paper and also some supporting material. > > However, while that paper is in general very clear,

Re: [HACKERS] improving GROUP BY estimation

2016-03-19 Thread Dean Rasheed

On 18 March 2016 at 00:37, Tomas Vondra wrote: >> On Sun, 2016-03-13 at 15:24 +, Dean Rasheed wrote: >>> I think that a better formula to use would be >>> >>> reldistinct *= (1 - powl(1 - rel-rows / rel->tuples, rel->tuples / >>> reldistinct) > > Attached is a v3 of the patch using this formul

Re: [HACKERS] improving GROUP BY estimation

2016-03-19 Thread Tomas Vondra

On 03/13/2016 11:09 PM, Tomas Vondra wrote: Hi, On Sun, 2016-03-13 at 15:24 +, Dean Rasheed wrote: On 4 March 2016 at 13:10, Tomas Vondra wrote: ... >>> I think that a better formula to use would be reldistinct *= (1 - powl(1 - rel-rows / rel->tuples, rel->tuples / reldistinct) Att

Re: [HACKERS] improving GROUP BY estimation

2016-03-13 Thread Tomas Vondra

Hi, On Sun, 2016-03-13 at 15:24 +, Dean Rasheed wrote: > On 4 March 2016 at 13:10, Tomas Vondra > wrote: > > > > The risk associated with over-estimation is that we may switch from > > HashAggregate to GroupAggregate, and generally select paths better > > suited for large number of rows. Not

Re: [HACKERS] improving GROUP BY estimation

2016-03-13 Thread Dean Rasheed

On 4 March 2016 at 13:10, Tomas Vondra wrote: > The risk associated with over-estimation is that we may switch from > HashAggregate to GroupAggregate, and generally select paths better > suited for large number of rows. Not great, because the plan may not be > optimal and thus run much slower - bu

Re: [HACKERS] improving GROUP BY estimation

2016-03-04 Thread Tomas Vondra

On Thu, 2016-03-03 at 11:42 -0800, Mark Dilger wrote: > > On Mar 3, 2016, at 11:27 AM, Alexander Korotkov > > wrote: > > > > On Thu, Mar 3, 2016 at 10:16 PM, Tomas Vondra > > wrote: > > So yes, each estimator works great for exactly the opposite cases. But > > notice that typically, the resul

Re: [HACKERS] improving GROUP BY estimation

2016-03-03 Thread Mark Dilger

> On Mar 3, 2016, at 11:27 AM, Alexander Korotkov > wrote: > > On Thu, Mar 3, 2016 at 10:16 PM, Tomas Vondra > wrote: > So yes, each estimator works great for exactly the opposite cases. But notice > that typically, the results of the new formula is much higher than the old > one, sometimes

Re: [HACKERS] improving GROUP BY estimation

2016-03-03 Thread Alexander Korotkov

On Thu, Mar 3, 2016 at 10:16 PM, Tomas Vondra wrote: > So yes, each estimator works great for exactly the opposite cases. But > notice that typically, the results of the new formula is much higher than > the old one, sometimes by two orders of magnitude (and it shouldn't be > difficult to constru

Re: [HACKERS] improving GROUP BY estimation

2016-03-03 Thread Tomas Vondra

Hi, On 03/03/2016 06:40 PM, Mark Dilger wrote: On Mar 3, 2016, at 8:35 AM, Tomas Vondra wrote: Hi, On 03/03/2016 12:53 PM, Alexander Korotkov wrote: Hi, Tomas! I've assigned to review this patch. I've checked version estimate-num-groups-v2.txt by Mark Dilger. It applies to head cleanly,

Re: [HACKERS] improving GROUP BY estimation

2016-03-03 Thread Alexander Korotkov

On Thu, Mar 3, 2016 at 7:35 PM, Tomas Vondra wrote: > On 03/03/2016 12:53 PM, Alexander Korotkov wrote: > >> I've assigned to review this patch. >> >> I've checked version estimate-num-groups-v2.txt by Mark Dilger. >> It applies to head cleanly, passes corrected regression tests. >> >> About corr

Re: [HACKERS] improving GROUP BY estimation

2016-03-03 Thread Mark Dilger

> On Mar 3, 2016, at 8:35 AM, Tomas Vondra wrote: > > Hi, > > On 03/03/2016 12:53 PM, Alexander Korotkov wrote: >> Hi, Tomas! >> >> I've assigned to review this patch. >> >> I've checked version estimate-num-groups-v2.txt by Mark Dilger. >> It applies to head cleanly, passes corrected regress

Re: [HACKERS] improving GROUP BY estimation

2016-03-03 Thread Tomas Vondra

Hi, On 03/03/2016 12:53 PM, Alexander Korotkov wrote: Hi, Tomas! I've assigned to review this patch. I've checked version estimate-num-groups-v2.txt by Mark Dilger. It applies to head cleanly, passes corrected regression tests. About correlated/uncorrelated cases. I think now optimizer mostly

Re: [HACKERS] improving GROUP BY estimation

2016-03-03 Thread Alexander Korotkov

Hi, Tomas! I've assigned to review this patch. I've checked version estimate-num-groups-v2.txt by Mark Dilger. It applies to head cleanly, passes corrected regression tests. About correlated/uncorrelated cases. I think now optimizer mostly assumes all distributions to be independent. I think we

Re: [HACKERS] improving GROUP BY estimation

2016-02-25 Thread Tomas Vondra

Hi, On 02/26/2016 04:32 AM, Mark Dilger wrote: On Feb 25, 2016, at 4:59 PM, Tomas Vondra wrote: ... The culprit here is that the two columns are not independent, but are (rather strongly) correlated due to the way you've generated the data. Yes, that was intentional. Your formula is exa

Re: [HACKERS] improving GROUP BY estimation

2016-02-25 Thread Mark Dilger

> On Feb 25, 2016, at 4:59 PM, Tomas Vondra > wrote: > > Hi, > > On 02/26/2016 12:16 AM, Mark Dilger wrote: >> >> It is not hard to write test cases where your patched version overestimates >> the number of rows by a very similar factor as the old code underestimates >> them. My very first t

Re: [HACKERS] improving GROUP BY estimation

2016-02-25 Thread Tomas Vondra

Hi, On 02/26/2016 12:16 AM, Mark Dilger wrote: It is not hard to write test cases where your patched version overestimates the number of rows by a very similar factor as the old code underestimates them. My very first test, which was not specifically designed to demonstrate this, happens to be

Re: [HACKERS] improving GROUP BY estimation

2016-02-25 Thread Mark Dilger

> On Feb 25, 2016, at 3:16 PM, Mark Dilger wrote: > > >> On Feb 23, 2016, at 5:12 PM, Tomas Vondra >> wrote: >> >> >> >> So much better. Clearly, there are cases where this will over-estimate the >> cardinality - for example when the values are somehow correlated. >> > > I applied your

Re: [HACKERS] improving GROUP BY estimation

2016-02-25 Thread Mark Dilger

> On Feb 23, 2016, at 5:12 PM, Tomas Vondra > wrote: > > > > So much better. Clearly, there are cases where this will over-estimate the > cardinality - for example when the values are somehow correlated. > I applied your patch, which caused a few regression tests to fail. Attached is a pa

Re: [HACKERS] improving GROUP BY estimation

Re: [HACKERS] improving GROUP BY estimation

Re: [HACKERS] improving GROUP BY estimation

Re: [HACKERS] improving GROUP BY estimation

Re: [HACKERS] improving GROUP BY estimation

Re: [HACKERS] improving GROUP BY estimation

Re: [HACKERS] improving GROUP BY estimation

Re: [HACKERS] improving GROUP BY estimation

Re: [HACKERS] improving GROUP BY estimation

Re: [HACKERS] improving GROUP BY estimation

Re: [HACKERS] improving GROUP BY estimation

Re: [HACKERS] improving GROUP BY estimation

Re: [HACKERS] improving GROUP BY estimation

Re: [HACKERS] improving GROUP BY estimation

Re: [HACKERS] improving GROUP BY estimation

Re: [HACKERS] improving GROUP BY estimation

Re: [HACKERS] improving GROUP BY estimation

Re: [HACKERS] improving GROUP BY estimation

Re: [HACKERS] improving GROUP BY estimation

Re: [HACKERS] improving GROUP BY estimation

Re: [HACKERS] improving GROUP BY estimation

Re: [HACKERS] improving GROUP BY estimation

Re: [HACKERS] improving GROUP BY estimation

Re: [HACKERS] improving GROUP BY estimation

Re: [HACKERS] improving GROUP BY estimation

Re: [HACKERS] improving GROUP BY estimation

Re: [HACKERS] improving GROUP BY estimation

Re: [HACKERS] improving GROUP BY estimation

Re: [HACKERS] improving GROUP BY estimation

Re: [HACKERS] improving GROUP BY estimation

Re: [HACKERS] improving GROUP BY estimation

Re: [HACKERS] improving GROUP BY estimation

Re: [HACKERS] improving GROUP BY estimation

33 matches

Site Navigation

Mail list logo

Footer information