Re: [PERFORM] limit clause breaks query planner?

2008-09-04 Thread Gregory Stark
Tom Lane <[EMAIL PROTECTED]> writes: > Guillaume Cottenceau <[EMAIL PROTECTED]> writes: >> It seems to me that if the correlation is 0.99[1], and you're >> looking for less than 1% of rows, the expected rows may be at the >> beginning or at the end of the heap? > > Right, but if you know the value

Re: [PERFORM] limit clause breaks query planner?

2008-09-04 Thread Matt Smiley
"Tom Lane" <[EMAIL PROTECTED]> writes: > I'm not sure offhand whether the existing correlation stats would be of use > for > it, or whether we'd have to get ANALYZE to gather additional data. Please forgive the tangent, but would it be practical to add support for gathering statistics on an arbi

Re: [PERFORM] limit clause breaks query planner?

2008-09-04 Thread Scott Carey
On Thu, Sep 4, 2008 at 10:14 AM, Tom Lane <[EMAIL PROTECTED]> wrote: > > Actually, an even easier hack (which would have the nice property of not > needing to know the exact value being searched for), would simply use > the existing cost estimates if the WHERE variables have low correlation > (mea

Re: [PERFORM] limit clause breaks query planner?

2008-09-04 Thread Tom Lane
Guillaume Cottenceau <[EMAIL PROTECTED]> writes: > It seems to me that if the correlation is 0.99[1], and you're > looking for less than 1% of rows, the expected rows may be at the > beginning or at the end of the heap? Right, but if you know the value being searched for, you could then estimate w

Re: [PERFORM] limit clause breaks query planner?

2008-09-04 Thread Matthew Wakeling
On Thu, 4 Sep 2008, Guillaume Cottenceau wrote: It seems to me that if the correlation is 0.99, and you're looking for less than 1% of rows, the expected rows may be at the beginning or at the end of the heap? Not necessarily. Imagine for example that you have a table with 1M rows, and one of

Re: [PERFORM] limit clause breaks query planner?

2008-09-04 Thread Guillaume Cottenceau
Matthew Wakeling writes: > On Thu, 4 Sep 2008, Tom Lane wrote: >> Ultimately the only way that we could get the right answer would be if >> the planner realized that the required rows are concentrated at the end >> of the table instead of being randomly scattered. This isn't something >> that is

Re: [PERFORM] limit clause breaks query planner?

2008-09-04 Thread Matthew Wakeling
On Thu, 4 Sep 2008, Tom Lane wrote: Ultimately the only way that we could get the right answer would be if the planner realized that the required rows are concentrated at the end of the table instead of being randomly scattered. This isn't something that is considered at all right now in seqscan

Re: [PERFORM] limit clause breaks query planner?

2008-09-04 Thread Tom Lane
"Matt Smiley" <[EMAIL PROTECTED]> writes: > "Tom Lane" <[EMAIL PROTECTED]> writes: >> default cost settings will cause it to prefer bitmap scan for retrieving >> up to about a third of the table, in my experience). I too am confused >> about why it doesn't prefer that choice in the OP's example.

Re: [PERFORM] limit clause breaks query planner?

2008-09-03 Thread Matt Smiley
"Tom Lane" <[EMAIL PROTECTED]> writes: > "Matt Smiley" <[EMAIL PROTECTED]> writes: > > So an Index Scan is always going to have a higher cost estimate than > > an equivalent Seq Scan returning the same result rows (unless > > random_page_cost is < 1). That's why I think the planner is alway

Re: [PERFORM] limit clause breaks query planner?

2008-09-02 Thread Tom Lane
"Matt Smiley" <[EMAIL PROTECTED]> writes: > So an Index Scan is always going to have a higher cost estimate than > an equivalent Seq Scan returning the same result rows (unless > random_page_cost is < 1). That's why I think the planner is always > preferring the plan that uses a Seq Scan. If

Re: [PERFORM] limit clause breaks query planner?

2008-09-02 Thread Matt Smiley
Hi David, Early in this thread, Pavel suggested: > you should partial index > > create index foo(b) on mytable where a is null; Rather, you might try the opposite partial index (where a is NOT null) as a replacement for the original unqualified index on column A. This new index will be ignor

Re: [PERFORM] limit clause breaks query planner?

2008-09-02 Thread David West
eau [mailto:[EMAIL PROTECTED] Sent: 02 September 2008 14:56 To: David West Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] limit clause breaks query planner? "David West" writes: > INFO: "jbpm_taskinstance": moved 1374243 row versions, truncated 166156 to &g

Re: [PERFORM] limit clause breaks query planner?

2008-09-02 Thread Guillaume Cottenceau
"David West" writes: > INFO: "jbpm_taskinstance": moved 1374243 row versions, truncated 166156 to > 140279 pages nothing which would explain so much planning off :/ > Yep, the table is from the jboss jbpm (business process management) schema. I've went to that kind of test then, but it didn't

Re: [PERFORM] limit clause breaks query planner?

2008-09-02 Thread David West
>A single VACUUM may not report how bloated your table is, if it's >been VACUUM'ed some before, but not frequently enough. If you >have time for it, and you can afford a full lock on the table, >only a VACUUM FULL VERBOSE will tell you the previous bloat (the >"table .. truncated to .." line IIRC)

Re: [PERFORM] limit clause breaks query planner?

2008-09-02 Thread David West
Here is the results of 'vacuum analyse verbose' on my table: INFO: vacuuming "public.jbpm_taskinstance" INFO: scanned index "jbpm_taskinstance_pkey" to remove 928153 row versions DETAIL: CPU 0.70s/2.40u sec elapsed 46.49 sec. INFO: scanned index "idx_tskinst_tminst" to remove 928153 row versio

Re: [PERFORM] limit clause breaks query planner?

2008-09-02 Thread Guillaume Cottenceau
"David West" writes: > Yes I inserted values in big batches according to a single value of 'B', so > indeed a sequence scan may have to scan forward many millions of rows before > finding the required value. That may well be why the seqscan is so slow to give your results; that said, it doesn't

Re: [PERFORM] limit clause breaks query planner?

2008-09-02 Thread Gregory Stark
"Limit (cost=0.00..3.68 rows=15 width=128) (actual time=85837.043..85896.140 rows=15 loops=1)" " -> Seq Scan on my_table this_ (cost=0.00..258789.88 rows=1055580 width=128) (actual time=85837.038..85896.091 rows=15 loops=1)" "Filter: ((A IS NULL) AND ((B)::t

Re: [PERFORM] limit clause breaks query planner?

2008-09-02 Thread David West
t: 02 September 2008 10:07 To: David West; pgsql-performance@postgresql.org Subject: Re: [PERFORM] limit clause breaks query planner? Wouldn't this be e.g. normal if the distribution of values would be uneven, e.g. A IS NULL AND B = '21' not near the beginning of the table data? By the wa

Re: [PERFORM] limit clause breaks query planner?

2008-09-02 Thread Guillaume Cottenceau
Russell Smith writes: > Pavel Stehule wrote: >> Hello >> >> 2008/9/1 David West <[EMAIL PROTECTED]>: >> >>> Thanks for your suggestion but the result is the same. >>> >>> Here is the explain analyse output from different queries. >>> Select * from my_table where A is null and B = '21' limit 15

Re: [PERFORM] limit clause breaks query planner?

2008-09-02 Thread Pavel Stehule
2008/9/2 Guillaume Cottenceau <[EMAIL PROTECTED]>: > "Pavel Stehule" writes: > >> Hello >> >> 2008/9/1 David West <[EMAIL PROTECTED]>: >>> Thanks for your suggestion but the result is the same. >>> >>> Here is the explain analyse output from different queries. >>> Select * from my_table where A is

Re: [PERFORM] limit clause breaks query planner?

2008-09-02 Thread Guillaume Cottenceau
"Pavel Stehule" writes: > Hello > > 2008/9/1 David West <[EMAIL PROTECTED]>: >> Thanks for your suggestion but the result is the same. >> >> Here is the explain analyse output from different queries. >> Select * from my_table where A is null and B = '21' limit 15 >> >> "Limit (cost=0.00..3.68 ro

Re: [PERFORM] limit clause breaks query planner?

2008-09-01 Thread Russell Smith
Pavel Stehule wrote: > Hello > > 2008/9/1 David West <[EMAIL PROTECTED]>: > >> Thanks for your suggestion but the result is the same. >> >> Here is the explain analyse output from different queries. >> Select * from my_table where A is null and B = '21' limit 15 >> >> "Limit (cost=0.00..3.68 ro

Re: [PERFORM] limit clause breaks query planner?

2008-09-01 Thread Pavel Stehule
70.772..7432.062 rows=1020062 loops=1)" > " Recheck Cond: ((B)::text = '21'::text)" > " Filter: (A IS NULL)" > " -> Bitmap Index Scan on idx_B (cost=0.00..20147.76 rows=1089958 width=0) > (actual time=466.545..466.545 rows=1020084 loops=1)"

Re: [PERFORM] limit clause breaks query planner?

2008-09-01 Thread David West
(actual time=466.545..466.545 rows=1020084 loops=1)" "Index Cond: ((B)::text = '21'::text)" "Total runtime: 8940.119 ms" In this case it goes for the correct index. It appears that the query planner makes very simplistic assumptions when it comes to LIMIT? T

Re: [PERFORM] limit clause breaks query planner?

2008-09-01 Thread Pavel Stehule
Hello you should partial index create index foo(b) on mytable where a is null; regards Pavel Stehule 2008/9/1 David West <[EMAIL PROTECTED]>: > Hi, > > > > I have a single table with about 10 million rows, and two indexes. Index A > is on a column A with 95% null values. Index B is on a colum

[PERFORM] limit clause breaks query planner?

2008-09-01 Thread David West
Hi, I have a single table with about 10 million rows, and two indexes. Index A is on a column A with 95% null values. Index B is on a column B with about 10 values, ie. About a million rows of each value. When I do a simple query on the table (no joins) with the following condition: A is