Re: [PERFORM] query slows down with more accurate stats

2004-04-19 Thread Tom Lane
Manfred Koizar <[EMAIL PROTECTED]> writes: > Random sampling is more like "every possible sample is equally likely to > be collected", and two-stage sampling doesn't satisfy this condition. Okay, I finally see the point here: in the limit as the number of pages B goes to infinity, you'd expect the

Re: [PERFORM] query slows down with more accurate stats

2004-04-16 Thread Manfred Koizar
On Fri, 16 Apr 2004 10:34:49 -0400, Tom Lane <[EMAIL PROTECTED]> wrote: >> p = prod from{i = 0} to{n - 1} {{c(B - i)} over {cB - i}} > >So? You haven't proven that either sampling method fails to do the >same. On the contrary, I believe that above formula is more or less valid for both meth

Re: [PERFORM] query slows down with more accurate stats

2004-04-16 Thread Robert Treat
On Tue, 2004-04-13 at 15:18, Tom Lane wrote: > Robert Treat <[EMAIL PROTECTED]> writes: > Well, the first problem is why is ANALYZE's estimate of the total row > count so bad :-( ? I suspect you are running into the situation where > the initial pages of the table are thinly populated and ANALYZE

Re: [PERFORM] query slows down with more accurate stats

2004-04-16 Thread Tom Lane
Manfred Koizar <[EMAIL PROTECTED]> writes: > If the number of pages is B and the sample size is n, a perfect sampling > method collects a sample where all tuples come from different pages with > probability (in OpenOffice.org syntax): > p = prod from{i = 0} to{n - 1} {{c(B - i)} over {cB - i

Re: [PERFORM] query slows down with more accurate stats

2004-04-16 Thread Manfred Koizar
On Thu, 15 Apr 2004 20:18:49 -0400, Tom Lane <[EMAIL PROTECTED]> wrote: >> getting several tuples from the same page is more likely >> than with the old method. > >Hm, are you sure? Almost sure. Let's look at a corner case: What is the probability of getting a sample with no two tuples from the

Re: [PERFORM] query slows down with more accurate stats

2004-04-15 Thread Tom Lane
Manfred Koizar <[EMAIL PROTECTED]> writes: > My biggest concern at the moment is that the new sampling method > violates the contract of returning each possible sample with he same > probability: getting several tuples from the same page is more likely > than with the old method. Hm, are you sure

Re: [PERFORM] query slows down with more accurate stats

2004-04-15 Thread Manfred Koizar
[Just a quick note here; a more thorough discussion of my test results will be posted to -hackers] On Tue, 13 Apr 2004 15:18:42 -0400, Tom Lane <[EMAIL PROTECTED]> wrote: >Well, the first problem is why is ANALYZE's estimate of the total row >count so bad :-( ? I suspect you are running into the

Re: [PERFORM] query slows down with more accurate stats

2004-04-13 Thread Tom Lane
Robert Treat <[EMAIL PROTECTED]> writes: > live=# analyze cl; > ANALYZE > live=# select reltuples from pg_class where relname = 'cl'; > reltuples > --- > 53580 > (1 row) > live=# vacuum cl; > VACUUM > live=# select reltuples from pg_class where relname = 'cl'; > reltuples > -

[PERFORM] query slows down with more accurate stats

2004-04-13 Thread Robert Treat
In the process of optimizing some queries, I have found the following query seems to degrade in performance the more accurate I make the statistics on the table... whether by using increased alter table ... set statistics or by using vacuum.. SELECT count( cl.caller_id ), npanxx.