Re: [PERFORM] anti-join with small table via text/varchar cannot estimate rows correctly

2017-03-01 Thread Stefan Andreatta
On 02.03.2017 02:06, Tom Lane wrote: Stefan Andreatta writes: The same anti-join using the text fields, however estimates just 1 resulting row, while there are still of course 9,999 of them: =# explain analyze select tmp_san_1.id from tmp_san_1 left join tmp_san_2 on

[PERFORM] anti-join with small table via text/varchar cannot estimate rows correctly

2017-03-01 Thread Stefan Andreatta
Hello, I have encountered a strange problem when doing an anti-join with a very small table via a varchar or text field as opposed to an integer field. Postgres version is 9.5.3 I did some experiments to extract the problem in a simple form. FIrst generate two tables with a series of numbers

Re: [PERFORM] serious under-estimation of n_distinct for clustered distributions

2013-01-13 Thread Stefan Andreatta
n our databases. And considering the commonplace conditions leading to it, I would expect many systems to be affected. But searching the forums and the web I hardly found any references to it - which amazes me to no end. Best Regards, Stefan On 12/30/2012 07:02 PM, Stefan Andreatta wrote: On

Re: [PERFORM] serious under-estimation of n_distinct for clustered distributions

2013-01-03 Thread Stefan Andreatta
On 12/29/2012 10:57 PM, Peter Geoghegan wrote: On 29 December 2012 20:57, Stefan Andreatta wrote: ... The general advice here is: 1) Increase default_statistics_target for the column. I tried that, but to get good estimates under these circumstances, I need to set the statistics_target

Re: [PERFORM] Simple join doesn't use index

2013-01-03 Thread Stefan Andreatta
On 01/03/2013 11:54 PM, Alex Vinnik wrote: Don't understand why PG doesn't use views_visit_id_index in that query but rather scans whole table. One explanation I have found that when resulting dataset constitutes ~15% of total number of rows in the table then seq scan is used. In this case result

Re: [PERFORM] serious under-estimation of n_distinct for clustered distributions

2012-12-30 Thread Stefan Andreatta
On 12/29/2012 10:57 PM, Peter Geoghegan wrote: On 29 December 2012 20:57, Stefan Andreatta wrote: Now, the 2005 discussion goes into great detail on the advantages and disadvantages of this algorithm, particularly when using small sample sizes, and several alternatives are discussed. I do not

[PERFORM] serious under-estimation of n_distinct for clustered distributions

2012-12-29 Thread Stefan Andreatta
I have encountered serious under-estimations of distinct values when values are not evenly distributed but clustered within a column. I think this problem might be relevant to many real-world use cases and I wonder if there is a good workaround or possibly a programmatic solution that could be