Re: [GENERAL] select distinct and index usage

2008-04-08 Thread Martijn van Oosterhout
On Sun, Apr 06, 2008 at 07:32:58PM -0400, David Wilson wrote: I have a reasonably large table (~75m rows,~18gb) called vals. It includes an integer datestamp column with approximately 4000 unique entries across the rows; there is a normal btree index on the datestamp column. When I attempt

Re: [GENERAL] select distinct and index usage

2008-04-08 Thread David Wilson
On Mon, Apr 7, 2008 at 9:11 PM, Stephen Denne [EMAIL PROTECTED] wrote: You may be able to make use of an index by rearranging your query to generate a series between your min max values, testing whether each value is in your table. You've got 4252 distinct values, but what is the range

Re: [GENERAL] select distinct and index usage

2008-04-08 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes: Alvaro Herrera [EMAIL PROTECTED] writes: Tom Lane escribió: What I think you'll find, though, is that once you do force an indexscan to be picked it'll be slower. Full-table index scans are typically worse than seqscan+sort, unintuitive though that may

Re: [GENERAL] select distinct and index usage

2008-04-08 Thread Alvaro Herrera
Gregory Stark escribió: I thought our DISTINCT index scan does do that but it still has to read the index leaf pages sequentially. It doesn't back-track up the tree structure and refind the next key. The way to back-track is to start the scan over from the root page down, keeping a stack of

Re: [GENERAL] select distinct and index usage

2008-04-07 Thread Alban Hertroys
On Apr 7, 2008, at 1:32 AM, David Wilson wrote: I have a reasonably large table (~75m rows,~18gb) called vals. It includes an integer datestamp column with approximately 4000 unique entries across the rows; there is a normal btree index on the datestamp column. When I attempt something like

Re: [GENERAL] select distinct and index usage

2008-04-07 Thread David Wilson
On Mon, Apr 7, 2008 at 2:05 AM, Alban Hertroys [EMAIL PROTECTED] wrote: On Apr 7, 2008, at 1:32 AM, David Wilson wrote: The databases estimates seem consistent with yours, so why is it doing this? Could you provide an EXPLAIN ANALYSE? It shows the actual numbers next to the estimates,

Re: [GENERAL] select distinct and index usage

2008-04-07 Thread Alban Hertroys
On Apr 7, 2008, at 9:47 AM, David Wilson wrote: On Mon, Apr 7, 2008 at 2:05 AM, Alban Hertroys [EMAIL PROTECTED] wrote: The databases estimates seem consistent with yours, so why is it doing this? Could you provide an EXPLAIN ANALYSE? It shows the actual numbers next to the estimates,

Re: [GENERAL] select distinct and index usage

2008-04-07 Thread Joshua D. Drake
On Mon, 7 Apr 2008 19:42:02 +0200 Alban Hertroys [EMAIL PROTECTED] wrote: explain analyze select distinct datestamp from vals; QUERY PLAN --

Re: [GENERAL] select distinct and index usage

2008-04-07 Thread David Wilson
On Mon, Apr 7, 2008 at 1:42 PM, Alban Hertroys [EMAIL PROTECTED] wrote: Have you tried this query with enable_seqscan=off? If my guess is right (and the planners, in that case) it'd be even slower. set enable_seqscan=off; explain select distinct datestamp from vals;

Re: [GENERAL] select distinct and index usage

2008-04-07 Thread Gregory Stark
David Wilson [EMAIL PROTECTED] writes: I appreciate the responses so far! I'm used to several minutes for some of the complex queries on this DB, but 12.5 minutes for a select distinct just seems wrong. :) You could try changing it to the equivalent GROUP BY query. The planner, unfortunately,

Re: [GENERAL] select distinct and index usage

2008-04-07 Thread David Wilson
On Mon, Apr 7, 2008 at 7:57 PM, Gregory Stark [EMAIL PROTECTED] wrote: You could try changing it to the equivalent GROUP BY query. The planner, unfortunately, doesn't know they're equivalent and has two separate sets of plans available. In this case where there are only 4,000 distinct

Re: [GENERAL] select distinct and index usage

2008-04-07 Thread Alvaro Herrera
David Wilson escribió: explain analyze select datestamp from vals group by datestamp; QUERY PLAN

Re: [GENERAL] select distinct and index usage

2008-04-07 Thread Stephen Denne
Still doing the sequential scan on the table, but at least it's avoiding the expensive disk merge sort. It still seems as if I ought to be able to coax it into using an index for this type of query, though- especially since it's using one on the other table. Is there perhaps some way to

Re: [GENERAL] select distinct and index usage

2008-04-07 Thread Tom Lane
David Wilson [EMAIL PROTECTED] writes: It appears to be doing a sequential scan regardless of the set, as if it doesn't believe it can use the index for some reason More likely, it's getting a cost estimate for the indexscan that's so bad that it even exceeds the 1-unit thumb on the

Re: [GENERAL] select distinct and index usage

2008-04-07 Thread Alvaro Herrera
Tom Lane escribió: What I think you'll find, though, is that once you do force an indexscan to be picked it'll be slower. Full-table index scans are typically worse than seqscan+sort, unintuitive though that may sound. Hmm, should we switch the CLUSTER code to do that? -- Alvaro Herrera

Re: [GENERAL] select distinct and index usage

2008-04-07 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes: Tom Lane escribió: What I think you'll find, though, is that once you do force an indexscan to be picked it'll be slower. Full-table index scans are typically worse than seqscan+sort, unintuitive though that may sound. Hmm, should we switch the

Re: [GENERAL] select distinct and index usage

2008-04-07 Thread Stephen Denne
Alban Hertroys wrote Something that might help you, but I'm not sure whether it might hurt the performance of other queries, is to cluster that table on val_datestamp_idx. That way the records are already (mostly) sorted on disk in the order of the datestamps, which seems to be the

[GENERAL] select distinct and index usage

2008-04-06 Thread David Wilson
I have a reasonably large table (~75m rows,~18gb) called vals. It includes an integer datestamp column with approximately 4000 unique entries across the rows; there is a normal btree index on the datestamp column. When I attempt something like select distinct datestamp from vals, however, explain

Re: [GENERAL] select distinct and index usage

2008-04-06 Thread Craig Ringer
David Wilson wrote: On another table in the same database with a much smaller number of total rows (~15m rows), I have the exact same situation- but in this case the index on the datestamp column *is* used: Have you run ANALYZE on both tables? It might be worth increasing the stats

Re: [GENERAL] select distinct and index usage

2008-04-06 Thread David Wilson
Both tables are vacuumed and analyzed. I have other queries that are using various indices on the vals table in an intelligent fashion. I can try increasing the stats, certainly, although they're at the defaults for both tables. The variation is definitely identical- the set of datestamps in the