Dean Rasheed writes:
> Here's an updated patch with references to both papers, and a more
> detailed justification for the formula, along with the other changes
> discussed. Note that although equation (2) in the Dell'Era paper looks
> different from the Yao formula,
On 31 March 2016 at 22:02, Tom Lane wrote:
> I'm just concerned about what happens when the Dellera paper stops being
> available. I don't mind including that URL as a backup to the written-out
> argument I just suggested.
>
Here's an updated patch with references to both
Alvaro Herrera writes:
> Tom Lane wrote:
>> See "Approximating block accesses in database organizations", S. B. Yao,
>> Communications of the ACM, Volume 20 Issue 4, April 1977 Pages 260-261
> That sounds nice all right, but I'm not sure it's actually helpful,
> because
Dean Rasheed writes:
> Yeah, that makes sense. In fact, if we only apply the adjustment when
> reldistinct > 0 and rel->rows < rel->tuples, and rewrite the first
> argument to pow() as (rel->tuples - rel->rows) / rel->tuples, then it
> is guaranteed to be non-negative.
Dean Rasheed writes:
> On 31 March 2016 at 21:40, Tom Lane wrote:
>> Let's use something like this:
>> See "Approximating block accesses in database organizations", S. B. Yao,
>> Communications of the ACM, Volume 20 Issue 4, April 1977 Pages 260-261
Tom Lane wrote:
> > The article text refers to this 1977 S. B. Yao paper "Approximating
> > block accesses in database organizations" which doesn't appear to be
> > available online, except behind ACM's paywall at
> > http://dl.acm.org/citation.cfm?id=359475
>
> Well, a CACM citation is
On 31 March 2016 at 21:40, Tom Lane wrote:
> Alvaro Herrera writes:
>> Tom Lane wrote:
>>> Another minor gripe is the use of a random URL as justification. This
>>> code will still be around when that URL exists nowhere but the Wayback
>>> Machine.
On 31 March 2016 at 20:18, Tom Lane wrote:
> Also, I wonder if it'd be a good idea to provide a guard against division
> by zero --- we know rel->tuples > 0 at this point, but I'm less sure that
> reldistinct can't be zero. In the same vein, I'm worried about the first
>
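Tom's division-by-zero concern is easy to see in isolation; a minimal sketch (hypothetical, not planner code) of the exponent computation:

```c
#include <math.h>

/* The exponent in the proposed formula is tuples / reldistinct.
 * With IEEE doubles, dividing by reldistinct == 0 yields +infinity
 * rather than trapping, silently propagating a nonsense estimate --
 * hence the suggestion to guard with reldistinct > 0 before
 * applying the adjustment. */
static int
exponent_is_sane(double reldistinct, double tuples)
{
    return reldistinct > 0 && isfinite(tuples / reldistinct);
}
```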
Alvaro Herrera writes:
> Tom Lane wrote:
>> Another minor gripe is the use of a random URL as justification. This
>> code will still be around when that URL exists nowhere but the Wayback
>> Machine. Can't we find a more formal citation to use?
> The article text
Tom Lane wrote:
> Another minor gripe is the use of a random URL as justification. This
> code will still be around when that URL exists nowhere but the Wayback
> Machine. Can't we find a more formal citation to use?
The article text refers to this 1977 S. B. Yao paper "Approximating
block
Dean Rasheed writes:
> On 30 March 2016 at 14:03, Tomas Vondra wrote:
>> Attached is v4 of the patch
> Thanks, I think this is good to go, except that I think we need to use
> pow() rather than powl() because AIUI powl() is new to C99, and
On 30 March 2016 at 14:03, Tomas Vondra wrote:
> Attached is v4 of the patch
Thanks, I think this is good to go, except that I think we need to use
pow() rather than powl() because AIUI powl() is new to C99, and so
won't necessarily be available on all supported
Hi,
On 03/22/2016 03:40 PM, Alexander Korotkov wrote:
I think you should send a revision of patch including comments proposed
by Dean Rasheed.
I'm switching patch status to waiting on author in commitfest.
Attached is v4 of the patch - the only difference w.r.t. v3 is that I've
used the
Hi Thomas,
On 3/22/16 10:40 AM, Alexander Korotkov wrote:
I think you should send a revision of patch including comments proposed
by Dean Rasheed.
I'm switching patch status to waiting on author in commitfest.
We're getting pretty close to the end of the CF. Do you know when you
will have
Hi, Tomas!
On Mon, Mar 21, 2016 at 2:39 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:
> On Fri, Mar 18, 2016 at 1:20 PM, Dean Rasheed
> wrote:
>
>> Probably a better URL to give is
>> http://www.adellera.it/investigations/distinct_balls/ which has a link
Hi, Dean!
On Fri, Mar 18, 2016 at 1:20 PM, Dean Rasheed
wrote:
> Probably a better URL to give is
> http://www.adellera.it/investigations/distinct_balls/ which has a link
> to the PDF version of the paper and also some supporting material.
>
> However, while that paper
On 18 March 2016 at 00:37, Tomas Vondra wrote:
>> On Sun, 2016-03-13 at 15:24 +, Dean Rasheed wrote:
>>> I think that a better formula to use would be
>>>
>>> reldistinct *= (1 - powl(1 - rel->rows / rel->tuples, rel->tuples /
>>> reldistinct));
>
> Attached is a v3
On 03/13/2016 11:09 PM, Tomas Vondra wrote:
Hi,
On Sun, 2016-03-13 at 15:24 +, Dean Rasheed wrote:
On 4 March 2016 at 13:10, Tomas Vondra
wrote:
...
>>>
I think that a better formula to use would be
reldistinct *= (1 - powl(1 - rel->rows / rel->tuples,
Hi,
On Sun, 2016-03-13 at 15:24 +, Dean Rasheed wrote:
> On 4 March 2016 at 13:10, Tomas Vondra
> wrote:
> >
> > The risk associated with over-estimation is that we may switch from
> > HashAggregate to GroupAggregate, and generally select paths better
> >
On 4 March 2016 at 13:10, Tomas Vondra wrote:
> The risk associated with over-estimation is that we may switch from
> HashAggregate to GroupAggregate, and generally select paths better
> suited for large number of rows. Not great, because the plan may not be
>
On Thu, 2016-03-03 at 11:42 -0800, Mark Dilger wrote:
> > On Mar 3, 2016, at 11:27 AM, Alexander Korotkov
> > wrote:
> >
> > On Thu, Mar 3, 2016 at 10:16 PM, Tomas Vondra
> > wrote:
> > So yes, each estimator works great for exactly the
> On Mar 3, 2016, at 11:27 AM, Alexander Korotkov
> wrote:
>
> On Thu, Mar 3, 2016 at 10:16 PM, Tomas Vondra
> wrote:
> So yes, each estimator works great for exactly the opposite cases. But notice
> that typically, the results of the
On Thu, Mar 3, 2016 at 10:16 PM, Tomas Vondra
wrote:
> So yes, each estimator works great for exactly the opposite cases. But
> notice that typically, the results of the new formula are much higher than
> the old one, sometimes by two orders of magnitude (and it
Hi,
On 03/03/2016 06:40 PM, Mark Dilger wrote:
On Mar 3, 2016, at 8:35 AM, Tomas Vondra wrote:
Hi,
On 03/03/2016 12:53 PM, Alexander Korotkov wrote:
Hi, Tomas!
I've assigned to review this patch.
I've checked version estimate-num-groups-v2.txt by Mark
On Thu, Mar 3, 2016 at 7:35 PM, Tomas Vondra
wrote:
> On 03/03/2016 12:53 PM, Alexander Korotkov wrote:
>
>> I've assigned to review this patch.
>>
>> I've checked version estimate-num-groups-v2.txt by Mark Dilger.
>> It applies to head cleanly, passes corrected
> On Mar 3, 2016, at 8:35 AM, Tomas Vondra wrote:
>
> Hi,
>
> On 03/03/2016 12:53 PM, Alexander Korotkov wrote:
>> Hi, Tomas!
>>
>> I've assigned to review this patch.
>>
>> I've checked version estimate-num-groups-v2.txt by Mark Dilger.
>> It applies to head
Hi,
On 03/03/2016 12:53 PM, Alexander Korotkov wrote:
Hi, Tomas!
I've assigned to review this patch.
I've checked version estimate-num-groups-v2.txt by Mark Dilger.
It applies to head cleanly, passes corrected regression tests.
About correlated/uncorrelated cases. I think now the optimizer
Hi, Tomas!
I've assigned to review this patch.
I've checked version estimate-num-groups-v2.txt by Mark Dilger.
It applies to head cleanly, passes corrected regression tests.
About correlated/uncorrelated cases. I think now the optimizer mostly
assumes all distributions to be independent.
I think we
Hi,
On 02/26/2016 04:32 AM, Mark Dilger wrote:
On Feb 25, 2016, at 4:59 PM, Tomas Vondra wrote:
...
The culprit here is that the two columns are not independent, but
are (rather strongly) correlated due to the way you've generated
the data.
Yes, that was
> On Feb 25, 2016, at 4:59 PM, Tomas Vondra
> wrote:
>
> Hi,
>
> On 02/26/2016 12:16 AM, Mark Dilger wrote:
>>
>> It is not hard to write test cases where your patched version overestimates
>> the number of rows by a very similar factor as the old code
Hi,
On 02/26/2016 12:16 AM, Mark Dilger wrote:
It is not hard to write test cases where your patched version overestimates
the number of rows by a very similar factor as the old code underestimates
them. My very first test, which was not specifically designed to demonstrate
this, happens to
> On Feb 25, 2016, at 3:16 PM, Mark Dilger wrote:
>
>
>> On Feb 23, 2016, at 5:12 PM, Tomas Vondra
>> wrote:
>>
>>
>>
>> So much better. Clearly, there are cases where this will over-estimate the
>> cardinality - for example when the
> On Feb 23, 2016, at 5:12 PM, Tomas Vondra
> wrote:
>
>
>
> So much better. Clearly, there are cases where this will over-estimate the
> cardinality - for example when the values are somehow correlated.
>
I applied your patch, which caused a few regression
Hi,
while working on using correlation statistics when estimating GROUP BY
(or generally whatever uses estimate_num_groups), I've noticed that we
apply the selectivity in a rather naive way.
After estimating the number of groups in the whole table - either by reading
pg_stats.n_distinct for a