Manfred Koizar <[EMAIL PROTECTED]> writes:
> Random sampling is more like "every possible sample is equally likely to
> be collected", and two-stage sampling doesn't satisfy this condition.
Okay, I finally see the point here: in the limit as the number of pages
B goes to infinity, you'd expect the
On Fri, 16 Apr 2004 10:34:49 -0400, Tom Lane <[EMAIL PROTECTED]> wrote:
>> p = prod from{i = 0} to{n - 1} {{c(B - i)} over {cB - i}}
>
>So? You haven't proven that either sampling method fails to do the
>same.
On the contrary, I believe the above formula is more or less valid for
both methods.
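For what it's worth, that product is easy to evaluate numerically. Here is a
quick sketch in SQL; the figures (c = 50 tuples per page, n = 100 sample rows,
and B = 500/5000/50000 pages) are invented for illustration only. With n and c
fixed, every factor tends to 1 as B grows, so p climbs toward 1, which lines up
with the B-goes-to-infinity remark above.

-- Evaluate p = prod from{i=0} to{n-1} {{c(B - i)} over {cB - i}}
-- for c = 50, n = 100, and a few invented values of B.
SELECT b AS pages,
       exp(sum(ln((50 * (b - i))::numeric / (50 * b - i)))) AS p_all_distinct
FROM (VALUES (500), (5000), (50000)) AS bs(b),
     generate_series(0, 99) AS i
GROUP BY b
ORDER BY b;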
On Tue, 2004-04-13 at 15:18, Tom Lane wrote:
> Robert Treat <[EMAIL PROTECTED]> writes:
> Well, the first problem is why is ANALYZE's estimate of the total row
> count so bad :-( ? I suspect you are running into the situation where
> the initial pages of the table are thinly populated and ANALYZE
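(With invented numbers: if the pages ANALYZE happens to read hold about 20 live
tuples each while the table's 1000 pages really average 50, extrapolating that
density over the whole table would put reltuples near 20000 instead of roughly
50000, assuming the estimate really is a straight density-times-relpages
extrapolation.)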
Manfred Koizar <[EMAIL PROTECTED]> writes:
> If the number of pages is B and the sample size is n, a perfect sampling
> method collects a sample where all tuples come from different pages with
> probability (in OpenOffice.org syntax):
> p = prod from{i = 0} to{n - 1} {{c(B - i)} over {cB - i}}
On Thu, 15 Apr 2004 20:18:49 -0400, Tom Lane <[EMAIL PROTECTED]> wrote:
>> getting several tuples from the same page is more likely
>> than with the old method.
>
>Hm, are you sure?
Almost sure. Let's look at a corner case: What is the probability of
getting a sample with no two tuples from the same page?
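As a rough cross-check on that corner case, here is a Monte Carlo sketch in
SQL. It only models the "perfect" method, i.e. drawing n rows uniformly without
replacement, and measures how often the whole sample lands on distinct pages;
the layout (B = 100 pages, c = 50 tuples per page, n = 20, 200 trials) is made
up, and the estimate should track the closed-form product quoted above.

-- Invented layout: 100 pages x 50 tuples = 5000 tuples; page number = t / 50.
-- Per trial: shuffle all tuples, keep the first 20 (a uniform sample without
-- replacement), and test whether the 20 page numbers are all distinct.
SELECT avg(all_distinct::int) AS p_estimate
FROM (
    SELECT trial, count(DISTINCT t / 50) = 20 AS all_distinct
    FROM (
        SELECT trial, t,
               row_number() OVER (PARTITION BY trial ORDER BY random()) AS rn
        FROM generate_series(1, 200) AS trial,
             generate_series(0, 4999) AS t
    ) AS shuffled
    WHERE rn <= 20
    GROUP BY trial
) AS per_trial;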
Manfred Koizar <[EMAIL PROTECTED]> writes:
> My biggest concern at the moment is that the new sampling method
> violates the contract of returning each possible sample with he same
> probability: getting several tuples from the same page is more likely
> than with the old method.
Hm, are you sure?
[Just a quick note here; a more thorough discussion of my test results
will be posted to -hackers]
On Tue, 13 Apr 2004 15:18:42 -0400, Tom Lane <[EMAIL PROTECTED]> wrote:
>Well, the first problem is why is ANALYZE's estimate of the total row
>count so bad :-( ? I suspect you are running into the
Robert Treat <[EMAIL PROTECTED]> writes:
> live=# analyze cl;
> ANALYZE
> live=# select reltuples from pg_class where relname = 'cl';
>  reltuples
> -----------
>      53580
> (1 row)
> live=# vacuum cl;
> VACUUM
> live=# select reltuples from pg_class where relname = 'cl';
> reltuples
> -
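To put a number on how far off the estimate is, something along these lines
compares the stored estimate with an exact count and the implied per-page
densities; it assumes an exact count(*) on cl is still cheap enough to run and
that relpages is non-zero.

-- Compare ANALYZE's stored estimate with reality for table cl.
SELECT reltuples,
       (SELECT count(*) FROM cl) AS actual_rows,
       reltuples / relpages AS est_tuples_per_page,
       (SELECT count(*) FROM cl)::numeric / relpages AS actual_tuples_per_page
FROM pg_class
WHERE relname = 'cl';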
In the process of optimizing some queries, I have found that the following
query seems to degrade in performance the more accurate I make the
statistics on the table, whether by using an increased alter table ...
set statistics or by using vacuum.
SELECT
count( cl.caller_id ),
npanxx.
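(For reference, the statistics-target change mentioned above would look roughly
like the following; the column and the target value of 100 are only
illustrative.)

ALTER TABLE cl ALTER COLUMN caller_id SET STATISTICS 100;  -- illustrative column/target
ANALYZE cl;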