Re: [HACKERS] potential bug in trigger with boolean params

2011-05-11 Thread tv
Hi, I was trying to create a trigger with parameters. I've found a potential bug when the param is boolean. Here is code replicating the bug: CREATE TABLE x(x TEXT); CREATE OR REPLACE FUNCTION trigger_x() RETURNS TRIGGER AS $$ BEGIN RETURN NEW; END; $$ LANGUAGE PLPGSQL;

Re: [HACKERS] estimating # of distinct values

2011-01-18 Thread tv
On Jan 17, 2011, at 6:36 PM, Tomas Vondra wrote: 1) Forks are 'per relation' but the distinct estimators are 'per column' (or 'per group of columns') so I'm not sure whether the file should contain all the estimators for the table, or if there should be one fork for each estimator. The

Re: [HACKERS] estimating # of distinct values

2011-01-10 Thread tv
On Fri, 2011-01-07 at 12:32 +0100, t...@fuzzy.cz wrote: the problem is you will eventually need to drop the results and rebuild it, as the algorithms do not handle deletes (ok, Florian mentioned an algorithm L_0 described in one of the papers, but I'm not sure we can use it). Yes, but even

Re: [HACKERS] estimating # of distinct values

2011-01-07 Thread tv
On Thu, 2010-12-30 at 21:02 -0500, Tom Lane wrote: How is an incremental ANALYZE going to work at all? How about a kind of continuous analyze ? Instead of analyzing just once and then drop the intermediate results, keep them on disk for all tables and then piggyback the background writer

Re: [HACKERS] estimating # of distinct values

2010-12-28 Thread tv
The simple truth is 1) sampling-based estimators are a dead-end The Charikar and Chaudhuri paper does not, in fact, say that it is impossible to improve sampling-based estimators as you claim it does. In fact, the authors offer several ways to improve sampling-based estimators. Further,

Re: [HACKERS] estimating # of distinct values

2010-12-28 Thread tv
t...@fuzzy.cz wrote: So even with 10% of the table, there's a 10% probability to get an estimate that's 7x overestimated or underestimated. With lower probability the interval is much wider. Hmmm... Currently I generally feel I'm doing OK when the estimated rows for a step are in the

Re: [HACKERS] proposal : cross-column stats

2010-12-24 Thread tv
2010/12/24 Florian Pflug f...@phlo.org: On Dec23, 2010, at 20:39 , Tomas Vondra wrote: I guess we could use the highest possible value (equal to the number of tuples) - according to wiki you need about 10 bits per element with 1% error, i.e. about 10MB of memory for each million of

Re: [HACKERS] proposal : cross-column stats

2010-12-21 Thread tv
On Dec18, 2010, at 17:59 , Tomas Vondra wrote: It seems to me you're missing one very important thing - this was not meant as a new default way to do estimates. It was meant as an option when the user (DBA, developer, ...) realizes the current solution gives really bad estimates (due to

Re: [HACKERS] proposal : cross-column stats

2010-12-21 Thread tv
On Mon, Dec 20, 2010 at 9:29 PM, Florian Pflug f...@phlo.org wrote: You might use that to decide if either A->B or B->A looks function-like enough to use the uniform bayesian approach. Or you might even go further, and decide *which* bayesian formula to use - the paper you cited always averages

Re: [HACKERS] proposal : cross-column stats

2010-12-21 Thread tv
On Dec21, 2010, at 11:37 , t...@fuzzy.cz wrote: I doubt there is a way to make this decision with just dist(A), dist(B) and dist(A,B) values. Well, we could go with a rule if [dist(A) == dist(A,B)] then [A => B] but that's very fragile. Think about estimates (we're not going to work with exact

Re: [HACKERS] proposal : cross-column stats

2010-12-21 Thread tv
On Dec21, 2010, at 15:51 , t...@fuzzy.cz wrote: This is the reason why they choose to always combine the values (with varying weights). There are no varying weights involved there. What they do is to express P(A=x,B=y) once as ... P(A=x,B=y) ~= P(B=y|A=x)*P(A=x)/2 + P(A=x|B=y)*P(B=y)/2

Re: [HACKERS] keeping a timestamp of the last stats reset (for a db, table and function)

2010-12-19 Thread tv
Tomas Vondra t...@fuzzy.cz writes: I've done several small changes to the patch, namely - added docs for the functions (in SGML) - added the same thing for background writer So I think now it's 'complete' and I'll add it to the commit fest in a few minutes. Please split this into

Re: [HACKERS] proposal : cross-column stats

2010-12-17 Thread tv
On Dec17, 2010, at 23:12 , Tomas Vondra wrote: Well, not really - I haven't done any experiments with it. For two columns the selectivity equation is (dist(A) * sel(A) + dist(B) * sel(B)) / (2 * dist(A,B)) where A and B are columns, dist(X) is number of distinct values in column X and
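
The two-column formula quoted in the excerpt above can be tried out with a few lines of Python. This is only a rough sketch, not code from the thread or from PostgreSQL. The excerpt is cut off before it defines sel(X), so the sketch assumes sel(X) is the selectivity of the condition on column X alone. The function name and all the numbers are hypothetical.

```python
def two_column_selectivity(dist_a, dist_b, dist_ab, sel_a, sel_b):
    """Evaluate (dist(A)*sel(A) + dist(B)*sel(B)) / (2*dist(A,B)).

    dist_a, dist_b -- number of distinct values in columns A and B
    dist_ab        -- number of distinct (A, B) combinations
    sel_a, sel_b   -- assumed: selectivity of the condition on each column alone
    """
    if dist_ab <= 0:
        raise ValueError("dist(A,B) must be positive")
    return (dist_a * sel_a + dist_b * sel_b) / (2.0 * dist_ab)


# Hypothetical example: column A has 100 distinct values, column B has 1000,
# and B determines A, so there are only 1000 distinct (A, B) pairs.
estimate = two_column_selectivity(
    dist_a=100, dist_b=1000, dist_ab=1000, sel_a=0.01, sel_b=0.001
)

# For comparison: the estimate obtained by treating the columns as independent.
independent = 0.01 * 0.001

print(estimate, independent)
```

With these made-up inputs the formula gives roughly 0.001, about a hundred times the 0.00001 obtained by multiplying the two selectivities as if the columns were independent.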

Re: [HACKERS] proposal : cross-column stats

2010-12-13 Thread tv
On 2010-12-13 03:28, Robert Haas wrote: Well, I'm not real familiar with contingency tables, but it seems like you could end up needing to store a huge amount of data to get any benefit out of it, in some cases. For example, in the United States, there are over 40,000 postal codes, and some

Re: [HACKERS] proposal : cross-column stats

2010-12-12 Thread tv
On Sun, Dec 12, 2010 at 9:16 PM, Tomas Vondra t...@fuzzy.cz wrote: On 13.12.2010 03:00, Robert Haas wrote: Well, the question is what data you are actually storing. It's appealing to store a measure of the extent to which a constraint on column X constrains column Y, because you'd only

[HACKERS] keeping a timestamp of the last stats reset (for a db, table and function)

2010-12-11 Thread tv
Hi everyone, I just wrote my first patch, and I need to know whether I missed something or not. I haven't used C for a really long time, so sickbags on standby, and if you notice something really stupid don't hesitate to call me an asshole (according to Simon Phipps that proves we are a healthy

Re: [HACKERS] keeping a timestamp of the last stats reset (for a db, table and function)

2010-12-11 Thread tv
Hello you have to respect pg coding style: a) not too long lines b) not C++ line comments OK, thanks for the notice. I've fixed those two problems. regards Tomas diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index 346eaaf..0ee59b1 100644 ---

Re: [HACKERS] [GENERAL] Postgres 9.1 - Release Theme

2010-04-01 Thread tv
Following a great deal of discussion, I'm pleased to announce that the PostgreSQL Core team has decided that the major theme for the 9.1 release, due in 2011, will be 'NoSQL'. Please, provide me your address so I can forward you the health care bills I had to pay due to the heart attack I