On 20.1.2011 11:05, Csaba Nagy wrote:
> Hi Tomas,
>
> On Wed, 2011-01-19 at 23:13 +0100, Tomas Vondra wrote:
>> No, the multi-column statistics do not require constant updating. There
>> are cases where a sampling is perfectly fine, although you may need a
>> bit larger sample. Generally if you can use a multi-dimensional histogram …
On 20.1.2011 09:10, Heikki Linnakangas wrote:
> It seems that the suggested multi-column selectivity estimator would be
> more sensitive to ndistinct of the individual columns. Is that correct?
> How is it biased? If we routinely under-estimate ndistinct of individual
> columns, for example, …
On 20.1.2011 03:36, Robert Haas wrote:
> On Wed, Jan 19, 2011 at 5:13 PM, Tomas Vondra wrote:
Regarding the crash scenario - if the commit fails, just throw away the
local estimator copy, it's not needed. I'm not sure how to take care of
the case when commit succeeds and the write of the merged estimator fails, but I think …
On 20.1.2011 03:06, Nathan Boley wrote:
>> And actually it does not depend on ndistinct for the columns only, it
>> depends on ndistinct estimates for the combination of columns. So
>> improving the ndistinct estimates for columns is just a necessary first
>> step (and only if it works reasonably …
Hi Tomas,
On Wed, 2011-01-19 at 23:13 +0100, Tomas Vondra wrote:
> No, the multi-column statistics do not require constant updating. There
> are cases where a sampling is perfectly fine, although you may need a
> bit larger sample. Generally if you can use a multi-dimensional
> histogram, you don't …
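[Editorial note: the multi-dimensional histogram point above can be made concrete with a hedged toy sketch. This is plain Python with invented names and bucket sizes, nothing like PostgreSQL's actual statistics structures: a 2-D histogram captures the correlation between two dependent columns, where multiplying per-column selectivities does not.]

```python
# Toy 2-D equi-width histogram selectivity estimate vs. the independence
# assumption, on two strongly dependent columns (b is fully determined by a).
import random

random.seed(0)
N = 10000
rows = [(a, a // 10) for a in (random.randrange(100) for _ in range(N))]

# 10 x 10 histogram: a-buckets of width 10, b-buckets of width 1.
hist = {}
for a, b in rows:
    key = (a // 10, b)
    hist[key] = hist.get(key, 0) + 1

# Selectivity of "a < 50 AND b < 5":
true_sel = sum(1 for a, b in rows if a < 50 and b < 5) / N
hist_sel = sum(c for (ab, bb), c in hist.items() if ab < 5 and bb < 5) / N
indep_sel = (sum(1 for a, _ in rows if a < 50) / N) * \
            (sum(1 for _, b in rows if b < 5) / N)
```

Because b depends on a, the independence product is off by roughly 2x (0.25 vs. a true 0.5) while the histogram captures the correlation; with predicates that do not align with bucket boundaries the histogram is only approximate, which is why building it from a (larger) sample can still be good enough.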
On 20.01.2011 04:36, Robert Haas wrote:
... Even better, the
code changes would be confined to ANALYZE rather than spread out all
over the system, which has positive implications for robustness and
likelihood of commit.
Keep in mind that the administrator can already override the ndistinct
estimate …
On Wed, Jan 19, 2011 at 6:35 PM, Florian Pflug wrote:
> On Jan20, 2011, at 02:41 , Nathan Boley wrote:
If you think about it, it's a bit ridiculous to look at the whole table
*just* to "estimate" ndistinct - if we go that far why don't we just
store the full distribution and be done with it? …
On Wed, Jan 19, 2011 at 9:35 PM, Florian Pflug wrote:
> Also, in my personal experience this isn't really the area we've got
> problems now. The problem cases for me always were queries with a rather
> large number of joins (7 to 10 or so) plus rather complex search
> conditions. The join order, …
On Wed, Jan 19, 2011 at 5:13 PM, Tomas Vondra wrote:
>>> Regarding the crash scenario - if the commit fails, just throw away the
>>> local estimator copy, it's not needed. I'm not sure how to take care of
>>> the case when commit succeeds and the write of the merged estimator
>>> fails, but I think …
On Jan20, 2011, at 02:41 , Nathan Boley wrote:
>>> If you think about it, it's a bit ridiculous to look at the whole table
>>> *just* to "estimate" ndistinct - if we go that far why don't we just
>>> store the full distribution and be done with it?
>>
>> - the best you could do is to average the individual probabilities
>> which gives ... well, 1/ndistinct …
>
> Not true. You're missing the goal of this effort - to get ndistinct
> estimate for combination of multiple columns. Which is usually
> pathological in case of dependent columns.
> Again, don't think about a single column (although even in that case
> there are known fail cases). Think about …
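[Editorial note: a tiny illustration of why ndistinct for a *combination* of columns is pathological for dependent columns. The data and the naive clamped-product guess below are invented for illustration; any estimator that sees only per-column ndistinct must guess somewhere between max(d_a, d_b) and d_a * d_b.]

```python
# Per-column ndistinct says nothing about combined ndistinct when the
# columns are dependent.
N = 100000
rows = [(i % 1000, (i % 1000) * 2) for i in range(N)]  # column b = 2 * a

d_a = len({a for a, _ in rows})    # 1000 distinct values in a
d_b = len({b for _, b in rows})    # 1000 distinct values in b
d_ab = len(set(rows))              # also 1000: b is a function of a

# A naive independence-style guess, clamped to the row count:
naive = min(N, d_a * d_b)          # 100000, a 100x overestimate
```

This is why improving per-column ndistinct estimates is only a necessary first step: the combined estimate needs statistics of its own.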
>> If you think about it, it's a bit ridiculous to look at the whole table
>> *just* to "estimate" ndistinct - if we go that far why don't we just
>> store the full distribution and be done with it?
>
> - the best you could do is to average the
> individual probabilities which gives ... well, 1/ndistinct …
On Jan19, 2011, at 23:44 , Nathan Boley wrote:
> If you think about it, it's a bit ridiculous to look at the whole table
> *just* to "estimate" ndistinct - if we go that far why don't we just
> store the full distribution and be done with it?
The crucial point that you're missing here is that ndistinct …
On 19.1.2011 23:44, Nathan Boley wrote:
> 1) The distribution of values in a table is rarely pathological, and
> usually follows one of several common patterns. ( IOW, we have good
> heuristics )
Not true. You're missing the goal of this effort - to get ndistinct
estimate for combination of multiple columns …
On Wed, Jan 19, 2011 at 2:13 PM, Tomas Vondra wrote:
> On 19.1.2011 20:25, Robert Haas wrote:
>> On Tue, Jan 18, 2011 at 5:16 PM, Tomas Vondra wrote:
>>> Yes, I was aware of this problem (amount of memory consumed with lots of
>>> updates), and I kind of hoped someone will come up with a reasonable solution. …
On 19.1.2011 20:46, Tom Lane wrote:
> Robert Haas writes:
>> ... I guess I'm just saying I'd think really, really hard
>> before abandoning the idea of periodic sampling. You have to get an
>> awful lot of benefit out of those cross-column stats to make it worth
>> paying a run-time cost for them. …
On 19.1.2011 20:25, Robert Haas wrote:
> On Tue, Jan 18, 2011 at 5:16 PM, Tomas Vondra wrote:
>> Yes, I was aware of this problem (amount of memory consumed with lots of
>> updates), and I kind of hoped someone will come up with a reasonable
>> solution.
>
> As far as I can see, periodically sampling some or all of the table is really the only …
Robert Haas writes:
> ... I guess I'm just saying I'd think really, really hard
> before abandoning the idea of periodic sampling. You have to get an
> awful lot of benefit out of those cross-column stats to make it worth
> paying a run-time cost for them.
I've been trying to not discourage Tomas …
On Tue, Jan 18, 2011 at 5:16 PM, Tomas Vondra wrote:
> Yes, I was aware of this problem (amount of memory consumed with lots of
> updates), and I kind of hoped someone will come up with a reasonable
> solution.
As far as I can see, periodically sampling some or all of the table is
really the only …
On 18.1.2011 18:12, Robert Haas wrote:
> On Tue, Jan 18, 2011 at 4:53 AM, wrote:
>> So the most important question is how to intercept the new/updated rows,
>> and where to store them. I think each backend should maintain its own
>> private list of new records and forward them only in case of commit. …
On Tue, Jan 18, 2011 at 12:32 PM, Jim Nasby wrote:
>> How's that different from what vacuum does on a TOAST table now?
>
> TOAST vacuum is currently an entirely separate vacuum. It might run at the
> same time as the main table vacuum, but it still has all the work that would
> be associated with …
On Jan 18, 2011, at 11:32 AM, Tom Lane wrote:
> Robert Haas writes:
>> On Tue, Jan 18, 2011 at 12:23 PM, Jim Nasby wrote:
>>> On Jan 17, 2011, at 8:11 PM, Robert Haas wrote:
>> On Mon, Jan 17, 2011 at 7:56 PM, Jim Nasby wrote:
- Forks are very possibly a more efficient way to deal with TOAST than having separate tables. …
On Jan 18, 2011, at 11:24 AM, Robert Haas wrote:
> On Tue, Jan 18, 2011 at 12:23 PM, Jim Nasby wrote:
>> On Jan 17, 2011, at 8:11 PM, Robert Haas wrote:
>>> On Mon, Jan 17, 2011 at 7:56 PM, Jim Nasby wrote:
- Forks are very possibly a more efficient way to deal with TOAST than
having separate tables. …
Robert Haas writes:
> On Tue, Jan 18, 2011 at 12:23 PM, Jim Nasby wrote:
>> On Jan 17, 2011, at 8:11 PM, Robert Haas wrote:
> On Mon, Jan 17, 2011 at 7:56 PM, Jim Nasby wrote:
>>> - Forks are very possibly a more efficient way to deal with TOAST than
>>> having separate tables. There's a fair amount of overhead we pay for the current setup.
On Tue, Jan 18, 2011 at 12:23 PM, Jim Nasby wrote:
> On Jan 17, 2011, at 8:11 PM, Robert Haas wrote:
>> On Mon, Jan 17, 2011 at 7:56 PM, Jim Nasby wrote:
>>> - Forks are very possibly a more efficient way to deal with TOAST than
>>> having separate tables. There's a fair amount of overhead we pay for the current setup.
On Jan 17, 2011, at 8:11 PM, Robert Haas wrote:
> On Mon, Jan 17, 2011 at 7:56 PM, Jim Nasby wrote:
>> - Forks are very possibly a more efficient way to deal with TOAST than
>> having separate tables. There's a fair amount of overhead we pay for the
>> current setup.
>
> That seems like an interesting idea, but I actually don't see why it would be an …
On Tue, Jan 18, 2011 at 4:53 AM, wrote:
> So the most important question is how to intercept the new/updated rows,
> and where to store them. I think each backend should maintain its own
> private list of new records and forward them only in case of commit. Does
> that sound reasonable?
At the …
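[Editorial note: one way the "private per-backend state, merged on commit" idea could look, sketched with a K-minimum-values distinct estimator. This is a hypothetical stand-in with invented names, not anything proposed verbatim in the thread: KMV sketches merge by keeping the K smallest hashes of their union, so a backend can discard its copy on abort and merge it cheaply on commit.]

```python
# Toy K-minimum-values (KMV) ndistinct sketch with a merge operation.
import hashlib

K = 256  # sketch size; larger K -> lower estimation error

def h(value):
    """Hash a value to a float in [0, 1)."""
    return int(hashlib.md5(str(value).encode()).hexdigest(), 16) / 16.0**32

class KMVSketch:
    def __init__(self):
        self.mins = []  # at most K smallest distinct hash values, sorted

    def add(self, value):
        # O(K log K) per row -- short, not efficient.
        self.mins = sorted(set(self.mins) | {h(value)})[:K]

    def merge(self, other):
        # Union of two sketches: keep the K smallest hashes of both.
        self.mins = sorted(set(self.mins) | set(other.mins))[:K]

    def ndistinct(self):
        if len(self.mins) < K:
            return len(self.mins)        # saw fewer than K distinct values
        return int((K - 1) / self.mins[-1])

# Two "backends" ingest overlapping sets of new rows...
b1, b2 = KMVSketch(), KMVSketch()
for v in range(5000):
    b1.add(v)
for v in range(2500, 7500):
    b2.add(v)

# ...and on commit each private sketch is merged into the shared one.
shared = KMVSketch()
shared.merge(b1)
shared.merge(b2)
```

The merged estimate should land near the true 7500 distinct values, and aborting a transaction costs nothing beyond dropping the private sketch.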
> On Jan 17, 2011, at 6:36 PM, Tomas Vondra wrote:
>> 1) Forks are 'per relation' but the distinct estimators are 'per
>> column' (or 'per group of columns') so I'm not sure whether the file
>> should contain all the estimators for the table, or if there should
>> be one fork for each estimator. …
On Mon, Jan 17, 2011 at 7:56 PM, Jim Nasby wrote:
> - Forks are very possibly a more efficient way to deal with TOAST than having
> separate tables. There's a fair amount of overhead we pay for the current
> setup.
That seems like an interesting idea, but I actually don't see why it
would be an …
On Jan 17, 2011, at 6:36 PM, Tomas Vondra wrote:
> 1) Forks are 'per relation' but the distinct estimators are 'per
> column' (or 'per group of columns') so I'm not sure whether the file
> should contain all the estimators for the table, or if there should
> be one fork for each estimator. …
On 9.1.2011 13:58, Jim Nasby wrote:
>> A resource fork? Not sure what you mean, could you describe it in more
>> detail?
>
> Ooops, resource forks are a filesystem thing; we call them relation forks.
> From src/backend/storage/smgr/README:
OK, I think using relation forks seems like a good …
Hi,
a short update regarding the ndistinct stream estimators - I've
implemented the estimators described in the papers I've mentioned in my
previous posts. If you want to try it, the sources are available at github,
at http://tvondra.github.com/pg_estimator (ignore the page, I have to
update it, just …
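[Editorial note: for readers who have not seen the papers, the flavor of estimator being benchmarked can be illustrated with a Flajolet-Martin style bitmap counter, a classic from that literature. This toy is the editor's own sketch, not the code in pg_estimator.]

```python
# Toy Flajolet-Martin probabilistic distinct counter with stochastic
# averaging: constant memory (M small bitmaps) regardless of stream length.
import hashlib

M = 64            # number of bitmaps averaged together
PHI = 0.77351     # Flajolet-Martin correction constant
bitmaps = [0] * M

def ingest(value):
    hv = int(hashlib.md5(str(value).encode()).hexdigest(), 16)
    bucket, y = hv % M, hv // M
    rho = (y & -y).bit_length() - 1   # position of lowest 1-bit (0-based)
    if rho >= 0:
        bitmaps[bucket] |= 1 << rho

def estimate():
    def lowest_zero(bm):
        r = 0
        while bm >> r & 1:
            r += 1
        return r
    r_avg = sum(lowest_zero(bm) for bm in bitmaps) / M
    return int(M / PHI * 2 ** r_avg)

for v in range(10000):
    ingest(v)
```

Note the property that matters for the thread: re-ingesting rows already seen changes nothing (the estimator is insensitive to duplicates), but there is no way to *remove* a value, which is exactly the "does not handle deletes" limitation discussed above.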
On Fri, 2011-01-07 at 12:32 +0100, t...@fuzzy.cz wrote:
> the problem is you will eventually need to drop the results and rebuild
> it, as the algorithms do not handle deletes (ok, Florian mentioned an
> algorithm L_0 described in one of the papers, but I'm not sure we can use
> it).
Yes, but even …
> On Fri, 2011-01-07 at 12:32 +0100, t...@fuzzy.cz wrote:
>> the problem is you will eventually need to drop the results and rebuild
>> it, as the algorithms do not handle deletes (ok, Florian mentioned an
>> algorithm L_0 described in one of the papers, but I'm not sure we can
>> use
>> it).
>
> Yes, but even …
> A resource fork? Not sure what you mean, could you describe it in more
> detail?
Ooops, resource forks are a filesystem thing; we call them relation forks.
From src/backend/storage/smgr/README:
Relation Forks
==============
Since 8.4, a single smgr relation can be comprised of multiple physical …
On 7.1.2011 20:56, Jim Nasby wrote:
> On Jan 7, 2011, at 5:32 AM, t...@fuzzy.cz wrote:
>> Another thing I'm not sure about is where to store those intermediate
>> stats (used to get the current estimate, updated incrementally). I was
>> thinking about pg_stats but I'm not sure it's the right place …
On Jan 7, 2011, at 5:32 AM, t...@fuzzy.cz wrote:
> Another thing I'm not sure about is where to store those intermediate
> stats (used to get the current estimate, updated incrementally). I was
> thinking about pg_stats but I'm not sure it's the right place - depending
> on the algorithm, this may …
> On Thu, 2010-12-30 at 21:02 -0500, Tom Lane wrote:
>> How is an incremental ANALYZE going to work at all?
>
> How about a kind of continuous analyze ?
>
> Instead of analyzing just once and then drop the intermediate results,
> keep them on disk for all tables and then piggyback the background
> writer …
On Thu, 2010-12-30 at 21:02 -0500, Tom Lane wrote:
> How is an incremental ANALYZE going to work at all?
How about a kind of continuous analyze ?
Instead of analyzing just once and then drop the intermediate results,
keep them on disk for all tables and then piggyback the background
writer …
On Dec 31, 2010, at 7:34 AM, Alvaro Herrera wrote:
> Excerpts from Tom Lane's message of Thu Dec 30 23:02:04 -0300 2010:
>> Alvaro Herrera writes:
>>> I was thinking that we could have two different ANALYZE modes, one
>>> "full" and one "incremental"; autovacuum could be modified to use one or
>>> the other depending on how many changes there are …
Excerpts from Tom Lane's message of Thu Dec 30 23:02:04 -0300 2010:
> Alvaro Herrera writes:
> > I was thinking that we could have two different ANALYZE modes, one
> > "full" and one "incremental"; autovacuum could be modified to use one or
> > the other depending on how many changes there are (of course, the user could request one or the other, too …
Alvaro Herrera writes:
> I was thinking that we could have two different ANALYZE modes, one
> "full" and one "incremental"; autovacuum could be modified to use one or
> the other depending on how many changes there are (of course, the user
> could request one or the other, too; not sure what should …
Excerpts from Tomas Vondra's message of Thu Dec 30 16:38:03 -0300 2010:
> > Since the need to regularly VACUUM tables hit by updates or deletes
> > won't go away any time soon, we could piggy-back the bit field
> > rebuilding onto VACUUM to avoid a second scan.
>
> Well, I guess it's a bit more …
On 30.12.2010 15:43, Florian Pflug wrote:
> On Dec27, 2010, at 23:49 , Kevin Grittner wrote:
>> Robert Haas wrote:
>>
>>> With respect to (b), I think I'd need to see a much more detailed
>>> design for how you intend to make this work. Off the top of my
>>> head there seems to be some pretty serious feasibility problems. …
On Dec27, 2010, at 23:49 , Kevin Grittner wrote:
> Robert Haas wrote:
>
>> With respect to (b), I think I'd need to see a much more detailed
>> design for how you intend to make this work. Off the top of my
>> head there seems to be some pretty serious feasibility problems.
>
> I had one random thought on that …
> Well, but that's not 7%, that's 7x! And the theorem says 'greater or equal'
> so this is actually the minimum - you can get a much bigger difference
> with lower probability. So you can easily get an estimate that is a few
> orders off.
FWIW, based on query performance, estimates which are up to …
> wrote:
>
>> So even with 10% of the table, there's a 10% probability to get an
>> estimate that's 7x overestimated or underestimated. With lower
>> probability the interval is much wider.
>
> Hmmm... Currently I generally feel I'm doing OK when the estimated
> rows for a step are in the right order of magnitude …
wrote:
> So even with 10% of the table, there's a 10% probability to get an
> estimate that's 7x overestimated or underestimated. With lower
> probability the interval is much wider.
Hmmm... Currently I generally feel I'm doing OK when the estimated
rows for a step are in the right order of magnitude …
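[Editorial note: the "right order of magnitude" worry is easy to reproduce with a hedged toy, not any estimator PostgreSQL actually uses. A 1% sample of a column where every value appears exactly twice is nearly indistinguishable from a sample of an almost-unique column, so both natural readings of the sample miss badly in opposite directions.]

```python
# Why sample-based ndistinct is hard: every value occurs exactly twice,
# but a 1% sample sees almost no duplicates.
import random

random.seed(1)
N = 100000
col = [i // 2 for i in range(N)]        # 50000 distinct values, each twice
true_d = len(set(col))

sample = random.sample(col, N // 100)   # 1% row sample, ~995 distinct values
d_sample = len(set(sample))

as_is = d_sample                        # "sample saw every duplicate": ~50x under
scaled = d_sample * 100                 # "duplicates scale up": ~2x over
```

This is essentially the intuition behind the Charikar and Chaudhuri lower bound discussed earlier in the thread: no function of such a sample can be accurate for every distribution.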
>
>> The simple truth is
>>
>> 1) sampling-based estimators are a dead-end
>
> The Charikar and Chaudhuri paper does not, in fact, say that it is
> impossible to improve sampling-based estimators as you claim it does. In
> fact, the authors offer several ways to improve sampling-based
> estimators.
On Tue, Dec 28, 2010 at 1:39 AM, Josh Berkus wrote:
> While I don't want to discourage you from working on stream-based
> estimators ... I'd love to see you implement a proof-of-concept for
> PostgreSQL, and test it ... the above is a non-argument. It requires us
> to accept that sample-based estimators …
> The simple truth is
>
> 1) sampling-based estimators are a dead-end
While I don't want to discourage you from working on stream-based
estimators ... I'd love to see you implement a proof-of-concept for
PostgreSQL, and test it ... the above is a non-argument. It requires us
to accept that sample-based estimators …
On 28.12.2010 00:04, Kevin Grittner wrote:
> Tom Lane wrote:
>
>> Well, first, those scans occur only once every few hundred million
>> transactions, which is not likely a suitable timescale for
>> maintaining statistics.
>
> I was assuming that the pass of the entire table was priming for the incremental updates …
On 27.12.2010 22:46, Robert Haas wrote:
> 2010/12/27 Tomas Vondra :
>> But even though these disadvantages, there really is no other
>> way to enhance the estimates. I don't think this should be a
>> default behavior - just as in case of cross-column stats this should
>> be optional when the current estimator does not work well.
Tom Lane wrote:
> Well, first, those scans occur only once every few hundred million
> transactions, which is not likely a suitable timescale for
> maintaining statistics.
I was assuming that the pass of the entire table was priming for the
incremental updates described at the start of this thread. …
"Kevin Grittner" writes:
> Robert Haas wrote:
>> With respect to (b), I think I'd need to see a much more detailed
>> design for how you intend to make this work. Off the top of my
>> head there seems to be some pretty serious feasibility problems.
> I had one random thought on that -- it seemed like a large concern …
Robert Haas wrote:
> With respect to (b), I think I'd need to see a much more detailed
> design for how you intend to make this work. Off the top of my
> head there seems to be some pretty serious feasibility problems.
I had one random thought on that -- it seemed like a large concern
was that …
2010/12/27 Tomas Vondra :
> But even though these disadvantages, there really is no other
> way to enhance the estimates. I don't think this should be a
> default behavior - just as in case of cross-column stats this should
> be optional when the current estimator does not work well.
This …
57 matches