"Nathan Boley" <[EMAIL PROTECTED]> writes:
> Given that, wouldn't it be smarter to consider a value as an mcv
> candidate iff it has a density greater than 0.005, rather than having
> a count greater than 1.5*average?
Yeah, perhaps ... want to experiment with that? Though I'd be a bit
worried abo
> Assuming that the threshold
> for switching to an indexscan is somewhere around selectivity 0.005
> (I am not certain offhand, but it's in that general area), this cannot
> possibly require more than 200 MCV slots, and for most data
> distributions it'd be a whole lot less.
That's a really good p
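The density-based cutoff proposed above can be sketched as follows. This is an illustration only, assuming a plain frequency test; the function name and sample data are invented for this example, not PostgreSQL code:

```python
from collections import Counter

def mcv_candidates_by_density(sample, threshold=0.005):
    """Treat a value as an MCV candidate iff its density in the sample
    exceeds `threshold`, instead of requiring count > 1.5 * average.
    At most round(1 / threshold) values (200 here) can ever qualify."""
    counts = Counter(sample)
    n = len(sample)
    return {value: count / n for value, count in counts.items()
            if count / n > threshold}

# A 1000-row sample: one dominant value plus 900 distinct rare ones.
sample = ["common"] * 100 + ["rare_%d" % i for i in range(900)]
mcvs = mcv_candidates_by_density(sample)
# Only "common" (density 0.1) clears the 0.005 cutoff.
```

Since each candidate must exceed density 0.005, no more than 1/0.005 = 200 values can qualify, which is where the 200-slot bound in the quoted message comes from.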
"Tom Lane" <[EMAIL PROTECTED]> writes:
> Gregory Stark <[EMAIL PROTECTED]> writes:
>> It's possible that the second option I described -- teaching Append when to
>> use something other than sum() -- would only work in the cases where
>> constraint exclusion could be fixed though. In which case hav
Tom Lane wrote:
This gets back to the discussions at PGCon about needing to have a more
explicit representation of partitioning. Right now, for a
many-partition table we spend huge amounts of time deriving the expected
behavior from first principles, each time we make a plan. And even then
we
Gregory Stark <[EMAIL PROTECTED]> writes:
> It's possible that the second option I described -- teaching Append when to
> use something other than sum() -- would only work in the cases where
> constraint exclusion could be fixed though. In which case having fractional
> row counts might actually be
"Tom Lane" <[EMAIL PROTECTED]> writes:
> Gregory Stark <[EMAIL PROTECTED]> writes:
>> The screw case I've seen is when you have a large partitioned table where
>> constraint_exclusion fails to exclude the irrelevant partitions. You're going
>> to get 0 rows from all but the one partition which con
Gregory Stark <[EMAIL PROTECTED]> writes:
> The screw case I've seen is when you have a large partitioned table where
> constraint_exclusion fails to exclude the irrelevant partitions. You're going
> to get 0 rows from all but the one partition which contains the 1 row you're
> looking for. But sin
"Tom Lane" <[EMAIL PROTECTED]> writes:
> (In fact, I don't think the plan would change, in this case. The reason
> for the clamp to 1 row is to avoid foolish results for join situations.)
The screw case I've seen is when you have a large partitioned table where
constraint_exclusion fails to excl
"Nathan Boley" <[EMAIL PROTECTED]> writes:
>> (1) On what grounds do you assert the above?
> For a table with 100 non-mcv rows, the planner estimates a result
> set of cardinality 100/20 = 5, not 1.
The real problem in that situation is that you need another twenty slots
in the MCV li
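The uniform-share arithmetic quoted above (100 non-MCV rows spread over 20 distinct non-MCV values) can be sketched as follows; the function name is invented for illustration:

```python
def non_mcv_estimate(non_mcv_rows, non_mcv_ndistinct):
    # Uniform assumption: every non-MCV value is presumed to account
    # for an equal share of the rows not covered by the MCV list.
    return non_mcv_rows / non_mcv_ndistinct

# 100 non-MCV rows spread over 20 distinct non-MCV values:
est = non_mcv_estimate(100, 20)  # estimated 5 rows, never 1
```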
>> Why? What if ( somehow ) we knew that there was a 90% chance that
>> query would return an empty result set on a big table with 20 non-mcv
>> distinct values. Currently the planner would always choose a seq scan,
>> where an index scan might be better.
>
> (1) On what grounds do you assert the
"Nathan Boley" <[EMAIL PROTECTED]> writes:
>>> If we query on values that aren't in the table, the planner will
>>> always overestimate the expected number of returned rows because it (
>>> implicitly ) assumes that every query will return at least 1 record.
>>
>> That's intentional and should not
>> If we query on values that aren't in the table, the planner will
>> always overestimate the expected number of returned rows because it (
>> implicitly ) assumes that every query will return at least 1 record.
>
> That's intentional and should not be changed.
Why? What if ( somehow ) we knew t
"Nathan Boley" <[EMAIL PROTECTED]> writes:
> If we query on values that aren't in the table, the planner will
> always overestimate the expected number of returned rows because it (
> implicitly ) assumes that every query will return at least 1 record.
That's intentional and should not be changed.
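The at-least-one-row floor being defended here can be sketched as below. This is a simplified stand-in assuming a round-then-floor-at-one rule, not the planner's actual clamping code:

```python
def clamp_row_estimate(rows):
    """Round a row estimate and floor it at one row, so that join cost
    arithmetic never multiplies by zero. A simplified stand-in for the
    planner's clamping behaviour, not the actual PostgreSQL code."""
    rows = float(round(rows))
    return rows if rows >= 1.0 else 1.0

# A selectivity predicting 0.2 rows still plans as 1 row:
clamped = clamp_row_estimate(0.2)  # -> 1.0
```

Keeping the floor at one row is what avoids the "foolish results for join situations" mentioned earlier in the thread: a zero-row input would zero out the cost of everything joined above it.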
>> >> > One more problem with low ndistinct values is that the condition might
>> >> > very well
>> >> > hit no rows at all. But Idea 1 will largely overestimate the number of
>> >> > hits.
>>
>> That's a good point, but I don't see a clear solution. Maybe we could
>
> I think that MCVs are the so
On Tue, 2008-06-10 at 08:51 -0700, Nathan Boley wrote:
> >> > One more problem with low ndistinct values is that the condition might
> >> > very well
> >> > hit no rows at all. But Idea 1 will largely overestimate the number of
> >> > hits.
>
> That's a good point, but I don't see a clear solutio
>> > One more problem with low ndistinct values is that the condition might
>> > very well
>> > hit no rows at all. But Idea 1 will largely overestimate the number of
>> > hits.
That's a good point, but I don't see a clear solution. Maybe we could
look at past queries
and keep track of how often
"Zeugswetter Andreas OSB sIT" <[EMAIL PROTECTED]> writes:
> I think for low ndistinct values we will want to know the exact
> value + counts and not a bin. So I think we will want additional stats rows
> that represent "value 'a1' stats".
Isn't that what our most frequent values list does?
> Obviously we run into problems when
> a) we have a poor estimate for ndistinct - but then we have
> worse problems
> b) our length measure doesn't correspond well with ndistinct
> in an interval
One more problem with low ndistinct values is that the condition might very well
hit no rows at all.
> Your argument seems to consider only columns having a normal
> distribution.
My example was based upon normally distributed data because people
usually know what a normal distribution is, and such data is reasonably common.
> How badly does it fall apart for non-normal distributions?
This should work to the extent
On Sun, 2008-06-08 at 19:03 -0400, Tom Lane wrote:
> Your argument seems to consider only columns having a normal
> distribution. How badly does it fall apart for non-normal
> distributions? (For instance, Zipfian distributions seem to be pretty
> common in database work, from what I've seen.)
>
"Nathan Boley" <[EMAIL PROTECTED]> writes:
> ... There are two potential problems that I see with this approach:
> 1) It assumes that = is equivalent to <= and >=. This is certainly
> true for real numbers, but is it true for every equality relation that
> eqsel predicts for?
The cases that compu
Currently eqsel assumes that, except for the values stored as mcv's,
the number of times that a value appears in a table column is
independent of its value. Unfortunately, this often yields
inappropriate selectivity estimates and frequently leads to
inappropriate plans.
As an example, consider an
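The assumption described in this message can be illustrated with a small sketch. The function, its signature, and the figures are hypothetical, chosen only to show the shape of the estimate, and are not the actual eqsel implementation:

```python
def eqsel_sketch(value, mcv_freqs, n_rows, ndistinct):
    """Rough sketch of the estimate for `col = value` under the
    independence assumption described above; invented names and
    numbers, not PostgreSQL's eqsel."""
    if value in mcv_freqs:
        # Common values get their measured frequency.
        return mcv_freqs[value] * n_rows
    # Everything else: uniform share of the non-MCV rows, even for
    # values that do not appear in the table at all.
    non_mcv_rows = n_rows * (1.0 - sum(mcv_freqs.values()))
    return non_mcv_rows / (ndistinct - len(mcv_freqs))

# 1000-row table, one MCV holding 30% of the rows, 21 distinct values:
est = eqsel_sketch("absent", {"common": 0.3}, 1000, 21)  # ~35 rows
```

Note that the estimate for the absent value is the same as for any present non-MCV value, which is exactly the overestimation the thread is discussing.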