[PERFORM] LIMIT confuses the planner

2009-02-23 Thread Kouber Saparev
Hello, I'm experiencing a strange issue. I have a table with around 11 million records (11471762 to be exact), storing login attempts to a web site. Thanks to the index I have created on username, looking into that table by username is very fast: db=# EXPLAIN ANALYZE SELECT * FROM logi

Re: [PERFORM] LIMIT confuses the planner

2009-02-23 Thread Richard Huxton
Kouber Saparev wrote: > db=# EXPLAIN ANALYZE > SELECT > * > FROM > login_attempt > WHERE > username='kouber' > ORDER BY > login_attempt_sid DESC; > > QUERY PLAN > --

Re: [PERFORM] LIMIT confuses the planner

2009-02-23 Thread Robert Haas
On Mon, Feb 23, 2009 at 7:26 AM, Kouber Saparev wrote: > Now, recently I have altered some of the default parameters in order to get > as much as possible out of the hardware - 12 GB of RAM, 8 processors. So, I > guess I have done something wrong, thus the planner is taking that wrong > decision.

Re: [PERFORM] LIMIT confuses the planner

2009-02-23 Thread Tom Lane
Robert Haas writes: > If you left seq_page_cost (which isn't mentioned here) at the default > value but reduced random_page_cost to 0.1, then you have > random_page_cost < seq_page_cost. That's probably Bad. ... well, it's certainly going to push the planner to believe indexscans are cheaper tha

Re: [PERFORM] LIMIT confuses the planner

2009-02-23 Thread Kouber Saparev
Richard Huxton wrote: Since it's expecting 7914 rows for "kouber" it thinks it will find the 20 rows you want fairly quickly by just looking backward through the login_attempt_pkey index. Try increasing the stats on the username column. ALTER TABLE login_attempt ALTER COLUMN username SET STATIS

Re: [PERFORM] LIMIT confuses the planner

2009-02-23 Thread Tom Lane
Kouber Saparev writes: > Now the planner believes there're 910 rows, which is a bit closer to the > real data: > swing=# select avg(length) from (select username, count(*) as length > from login_attempt group by username) as freq; > avg > -- > 491.608731042755547

Re: [PERFORM] LIMIT confuses the planner

2009-02-23 Thread Kouber Saparev
Tom Lane wrote: Robert Haas writes: If you left seq_page_cost (which isn't mentioned here) at the default value but reduced random_page_cost to 0.1, then you have random_page_cost < seq_page_cost. That's probably Bad. ... well, it's certainly going to push the planner to believe indexscans a

Re: [PERFORM] LIMIT confuses the planner

2009-02-24 Thread Kouber Saparev
Tom Lane wrote: Kouber Saparev writes: Now the planner believes there're 910 rows, which is a bit closer to the real data: swing=# select avg(length) from (select username, count(*) as length from login_attempt group by username) as freq; avg -- 491.608731042

Re: [PERFORM] LIMIT confuses the planner

2009-03-22 Thread Tom Lane
Kouber Saparev writes: > Tom Lane wrote: >> Hmph, that's still not real good. Ideally it should be estimating >> *less* than the average frequency, because the estimate is made after >> excluding all the most-common-values, which evidently 'kouber' is not >> one of. > I altered the statistics fo

Re: [PERFORM] LIMIT confuses the planner

2009-03-22 Thread marcin mank
> So the bottom line here is just that the estimated n_distinct is too > low.  We've seen before that the equation we use tends to do that more > often than not.  I doubt that consistently erring on the high side would > be better though :-(.  Estimating n_distinct from a limited sample of > the po

Re: [PERFORM] LIMIT confuses the planner

2009-03-22 Thread marcin mank
> I hit an interestinhg paper on n_distinct calculation: > > http://www.pittsburgh.intel-research.net/people/gibbons/papers/distinct-values-chapter.pdf > > the PCSA algorithm described there requires O(1) calculation per > value. Page 22 describes what to do with updates streams. > > This I think (

Re: [PERFORM] LIMIT confuses the planner

2009-03-22 Thread Tom Lane
marcin mank writes: > I hit an interestinhg paper on n_distinct calculation: > http://www.pittsburgh.intel-research.net/people/gibbons/papers/distinct-values-chapter.pdf I don't think we're quite ready to make ANALYZE read every row of a table in order to estimate n_distinct. It is an interestin

Re: [PERFORM] LIMIT confuses the planner

2009-03-24 Thread Kouber Saparev
Now I am experiencing similar issue with another table, called "message", for which there's a conditional index: CREATE TABLE message ( message_sid SERIAL PRIMARY KEY, from_profile_sid INT NOT NULL REFERENCES profile, to_profile_sid INT NOT NULL REFERENCES profile, sender_has_deleted BOO

[PERFORM] LIMIT confuses the planner (again)

2009-09-28 Thread Kouber Saparev
Hello, I am using PostgreSQL 8.3.7 and I am experiencing an issue similar to the one I've already described some time ago: http://archives.postgresql.org/pgsql-performance/2009-02/msg00261.php Again, adding a LIMIT clause to a query, which is normally executing very fast thanks to an index, m

Re: [PERFORM] LIMIT confuses the planner (again)

2009-09-28 Thread Robert Haas
On Mon, Sep 28, 2009 at 4:43 AM, Kouber Saparev wrote: > Hello, > > I am using PostgreSQL 8.3.7 and I am experiencing an issue similar to the > one I've already described some time ago: > http://archives.postgresql.org/pgsql-performance/2009-02/msg00261.php > > Again, adding a LIMIT clause to a qu