Tom Lane <[EMAIL PROTECTED]> writes:
> Guillaume Cottenceau <[EMAIL PROTECTED]> writes:
>> It seems to me that if the correlation is 0.99[1], and you're
>> looking for less than 1% of rows, the expected rows may be at the
>> beginning or at the end of the heap?
>
> Right, but if you know the value
"Tom Lane" <[EMAIL PROTECTED]> writes:
> I'm not sure offhand whether the existing correlation stats would be of use
> for
> it, or whether we'd have to get ANALYZE to gather additional data.
Please forgive the tangent, but would it be practical to add support for
gathering statistics on an arbi
On Thu, Sep 4, 2008 at 10:14 AM, Tom Lane <[EMAIL PROTECTED]> wrote:
>
> Actually, an even easier hack (which would have the nice property of not
> needing to know the exact value being searched for), would simply use
> the existing cost estimates if the WHERE variables have low correlation
> (mea
Guillaume Cottenceau <[EMAIL PROTECTED]> writes:
> It seems to me that if the correlation is 0.99[1], and you're
> looking for less than 1% of rows, the expected rows may be at the
> beginning or at the end of the heap?
Right, but if you know the value being searched for, you could then
estimate w
On Thu, 4 Sep 2008, Guillaume Cottenceau wrote:
It seems to me that if the correlation is 0.99, and you're
looking for less than 1% of rows, the expected rows may be at the
beginning or at the end of the heap?
Not necessarily. Imagine for example that you have a table with 1M rows,
and one of
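Matthew's point can be made concrete with a small sketch (plain Python, not planner code; the row counts are illustrative). Under the planner's uniform-distribution assumption, a 1%-selective seq scan hits its first match almost immediately; if the matching rows are instead clustered at the end of the heap, nearly the whole table is read before the first match:

```python
# Sketch: why seq scan + LIMIT is fast only when matches sit near the
# start of the heap. 1M rows, 1% of them matching.
N = 1_000_000
MATCHES = 10_000  # 1% of rows

# Uniform assumption: first match expected after about N / MATCHES rows.
expected_uniform = N // MATCHES

# Clustered-at-end reality: the scan must read the whole non-matching
# prefix before seeing its first match.
first_match_clustered = N - MATCHES

print(expected_uniform)       # 100
print(first_match_clustered)  # 990000
```

The planner's estimate is off by four orders of magnitude in the clustered case, which is exactly the failure mode discussed in this thread.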
Matthew Wakeling writes:
> On Thu, 4 Sep 2008, Tom Lane wrote:
>> Ultimately the only way that we could get the right answer would be if
>> the planner realized that the required rows are concentrated at the end
>> of the table instead of being randomly scattered. This isn't something
>> that is considered at all right now in seqscan
On Thu, 4 Sep 2008, Tom Lane wrote:
Ultimately the only way that we could get the right answer would be if
the planner realized that the required rows are concentrated at the end
of the table instead of being randomly scattered. This isn't something
that is considered at all right now in seqscan
"Matt Smiley" <[EMAIL PROTECTED]> writes:
> "Tom Lane" <[EMAIL PROTECTED]> writes:
>> default cost settings will cause it to prefer bitmap scan for retrieving
>> up to about a third of the table, in my experience). I too am confused
>> about why it doesn't prefer that choice in the OP's example.
"Tom Lane" <[EMAIL PROTECTED]> writes:
> "Matt Smiley" <[EMAIL PROTECTED]> writes:
> > So an Index Scan is always going to have a higher cost estimate than
> > an equivalent Seq Scan returning the same result rows (unless
> > random_page_cost is < 1). That's why I think the planner is always
> > preferring the plan that uses a Seq Scan.
"Matt Smiley" <[EMAIL PROTECTED]> writes:
> So an Index Scan is always going to have a higher cost estimate than
> an equivalent Seq Scan returning the same result rows (unless
> random_page_cost is < 1). That's why I think the planner is always
> preferring the plan that uses a Seq Scan.
If
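Matt's reasoning can be sketched numerically. This is a deliberately simplified stand-in for the real cost model in costsize.c: the constants mirror the default GUCs seq_page_cost and random_page_cost, and the page counts are made up for illustration:

```python
# Simplified page-cost sketch (not PostgreSQL's actual cost model).
seq_page_cost = 1.0     # default GUC value
random_page_cost = 4.0  # default GUC value

def seq_scan_cost(heap_pages):
    # Every heap page is read sequentially.
    return heap_pages * seq_page_cost

def index_scan_cost(heap_pages_touched, index_pages_touched):
    # Index pages plus the heap pages they point at, all charged
    # at the random-access rate.
    return (heap_pages_touched + index_pages_touched) * random_page_cost

# If the index scan must visit most heap pages anyway, it can only win
# when random_page_cost drops below seq_page_cost.
print(seq_scan_cost(1000))        # 1000.0
print(index_scan_cost(1000, 50))  # 4200.0
```

With default settings the index scan's estimate stays above the seq scan's whenever both touch roughly the same heap pages, which is Matt's point.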
Hi David,
Early in this thread, Pavel suggested:
> you should use a partial index
>
> create index foo on mytable (b) where a is null;
Rather, you might try the opposite partial index (where a is NOT null) as a
replacement for the original unqualified index on column A. This new index
will be ignor
From: Guillaume Cottenceau [mailto:[EMAIL PROTECTED]]
Sent: 02 September 2008 14:56
To: David West
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] limit clause breaks query planner?
"David West" writes:
> INFO: "jbpm_taskinstance": moved 1374243 row versions, truncated 166156 to
> 140279 pages
"David West" writes:
> INFO: "jbpm_taskinstance": moved 1374243 row versions, truncated 166156 to
> 140279 pages
nothing which would explain the planner being so far off :/
> Yep, the table is from the jboss jbpm (business process management) schema.
I then went for that kind of test, but it didn't
>A single VACUUM may not report how bloated your table is, if it's
>been VACUUM'ed some before, but not frequently enough. If you
>have time for it, and you can afford a full lock on the table,
>only a VACUUM FULL VERBOSE will tell you the previous bloat (the
>"table .. truncated to .." line IIRC)
Here are the results of 'vacuum analyse verbose' on my table:
INFO: vacuuming "public.jbpm_taskinstance"
INFO: scanned index "jbpm_taskinstance_pkey" to remove 928153 row versions
DETAIL: CPU 0.70s/2.40u sec elapsed 46.49 sec.
INFO: scanned index "idx_tskinst_tminst" to remove 928153 row versions
"David West" writes:
> Yes I inserted values in big batches according to a single value of 'B', so
> indeed a sequence scan may have to scan forward many millions of rows before
> finding the required value.
That may well be why the seqscan is so slow to return your results;
that said, it doesn't
"Limit (cost=0.00..3.68 rows=15 width=128) (actual
time=85837.043..85896.140 rows=15 loops=1)"
" -> Seq Scan on my_table this_ (cost=0.00..258789.88 rows=1055580
width=128) (actual time=85837.038..85896.091 rows=15 loops=1)"
"Filter: ((A IS NULL) AND ((B)::text = '21'::text))"
Sent: 02 September 2008 10:07
To: David West; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] limit clause breaks query planner?
Wouldn't this be normal if, e.g., the distribution of values were uneven,
with the rows matching A IS NULL AND B = '21' not near the beginning of
the table data?
By the way,
Russell Smith writes:
> Pavel Stehule wrote:
>> Hello
>>
>> 2008/9/1 David West <[EMAIL PROTECTED]>:
>>
>>> Thanks for your suggestion but the result is the same.
>>>
>>> Here is the explain analyse output from different queries.
>>> Select * from my_table where A is null and B = '21' limit 15
2008/9/2 Guillaume Cottenceau <[EMAIL PROTECTED]>:
> "Pavel Stehule" writes:
>
>> Hello
>>
>> 2008/9/1 David West <[EMAIL PROTECTED]>:
>>> Thanks for your suggestion but the result is the same.
>>>
>>> Here is the explain analyse output from different queries.
>>> Select * from my_table where A is null and B = '21' limit 15
"Pavel Stehule" writes:
> Hello
>
> 2008/9/1 David West <[EMAIL PROTECTED]>:
>> Thanks for your suggestion but the result is the same.
>>
>> Here is the explain analyse output from different queries.
>> Select * from my_table where A is null and B = '21' limit 15
>>
>> "Limit (cost=0.00..3.68 rows=15 width=128)"
Pavel Stehule wrote:
> Hello
>
> 2008/9/1 David West <[EMAIL PROTECTED]>:
>
>> Thanks for your suggestion but the result is the same.
>>
>> Here is the explain analyse output from different queries.
>> Select * from my_table where A is null and B = '21' limit 15
>>
>> "Limit (cost=0.00..3.68 rows=15 width=128)"
70.772..7432.062 rows=1020062 loops=1)"
> " Recheck Cond: ((B)::text = '21'::text)"
> " Filter: (A IS NULL)"
> " -> Bitmap Index Scan on idx_B (cost=0.00..20147.76 rows=1089958 width=0)
> (actual time=466.545..466.545 rows=1020084 loops=1)"
"Index Cond: ((B)::text = '21'::text)"
"Total runtime: 8940.119 ms"
In this case it goes for the correct index. It appears that the query planner
makes very simplistic assumptions when it comes to LIMIT?
T
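The "simplistic assumption" is visible in the numbers: the planner scales the scan's total cost linearly by limit/estimated_rows, which is exactly where the 3.68 in the earlier plan comes from. A back-of-the-envelope check (plain arithmetic, not planner code):

```python
# Numbers taken from the EXPLAIN output quoted earlier in the thread.
seq_scan_total_cost = 258789.88
estimated_rows = 1055580
limit = 15

# Uniform-distribution assumption: fetching 15 of ~1M estimated rows is
# costed at the same fraction of the full seq scan.
limit_cost = seq_scan_total_cost * limit / estimated_rows
print(round(limit_cost, 2))  # 3.68
```

That tiny fraction is why the LIMIT makes the seq scan look nearly free, even though the actual runtime was over 85 seconds.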
Hello
you should use a partial index:
create index foo on mytable (b) where a is null;
regards
Pavel Stehule
2008/9/1 David West <[EMAIL PROTECTED]>:
> Hi,
>
>
>
> I have a single table with about 10 million rows, and two indexes. Index A
> is on a column A with 95% null values. Index B is on a column B with about
> 10 values, i.e. about a million rows of each value.
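Pavel's partial-index idea can be tried end-to-end. The sketch below uses sqlite3 only because it is runnable standalone; the CREATE INDEX ... WHERE clause is the same concept as the PostgreSQL suggestion above, and the table shape (mostly non-null a, constant b) is invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (a INTEGER, b TEXT)")
# 1000 rows; every 20th row has NULL in column a, all share b = '21'.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(None if i % 20 == 0 else i, "21") for i in range(1000)],
)

# Partial index: only rows with a IS NULL are indexed, keeping it small.
conn.execute("CREATE INDEX foo ON mytable(b) WHERE a IS NULL")

rows = conn.execute(
    "SELECT * FROM mytable WHERE a IS NULL AND b = '21' LIMIT 15"
).fetchall()
print(len(rows))  # 15
```

Because the query's WHERE clause implies the index's predicate, the partial index is usable for it, and it stays far smaller than an index over all of column b.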
Hi,
I have a single table with about 10 million rows, and two indexes. Index A
is on a column A with 95% null values. Index B is on a column B with about
10 values, i.e. about a million rows of each value.
When I do a simple query on the table (no joins) with the following
condition:
A is null and B = '21'
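Given those statistics, the planner's row estimate in the plans quoted above is easy to reproduce: assuming the two columns are independent, the clause selectivities simply multiply (a sketch of the estimate, not the planner's code), and the result lands close to the 1,055,580 rows the EXPLAIN output showed:

```python
# Selectivity multiplication for ANDed clauses, using the stats from
# the original post.
total_rows = 10_000_000
sel_a_null = 0.95   # 95% of column A is null
sel_b_21 = 1 / 10   # about 10 distinct values in column B

estimated_rows = total_rows * sel_a_null * sel_b_21
print(round(estimated_rows))  # 950000
```

The row estimate is fine; as the rest of the thread shows, the problem is the assumption about *where* those rows live in the heap.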