Re: [PERFORM] Performance of count(*)

2007-03-23 Thread Tino Wildenhain
Michael Stone schrieb: On Fri, Mar 23, 2007 at 01:01:02PM +0100, Tino Wildenhain wrote: This discussion is a bit theoretical until we see the actual problem and the proposed solution here. It's good to see you back off a bit from your previous stance of assuming that someone doesn't know what

Re: [PERFORM] Performance of count(*)

2007-03-23 Thread Michael Stone
On Fri, Mar 23, 2007 at 01:01:02PM +0100, Tino Wildenhain wrote: This discussion is a bit theoretical until we see the actual problem and the proposed solution here. It's good to see you back off a bit from your previous stance of assuming that someone doesn't know what they're doing and that

Re: [PERFORM] Performance of count(*)

2007-03-23 Thread Tino Wildenhain
Michael Stone schrieb: On Thu, Mar 22, 2007 at 06:27:32PM +0100, Tino Wildenhain wrote: Craig A. James schrieb: You guys can correct me if I'm wrong, but the key feature that's missing from Postgres's flexible indexing is the ability to maintain state across queries. Something like this: s

Re: [PERFORM] Performance of count(*)

2007-03-22 Thread Tino Wildenhain
Craig A. James schrieb: Tino Wildenhain wrote: You guys can correct me if I'm wrong, but the key feature that's missing from Postgres's flexible indexing is the ability to maintain state across queries. Something like this: select a, b, my_index_state() from foo where ... offset 100 limi

Re: [PERFORM] Performance of count(*)

2007-03-22 Thread Tino Wildenhain
Michael Stone schrieb: On Thu, Mar 22, 2007 at 02:24:39PM -0400, Merlin Moncure wrote: Tino was saying that rather that build a complete indexing storage management solution that lives outside the database, it is better to do intelligent session management so that you get the simplicity if a two

Re: [PERFORM] Performance of count(*)

2007-03-22 Thread Tom Lane
"Craig A. James" <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> You mean >> http://archives.postgresql.org/pgsql-performance/2006-10/msg00283.php >> ? I don't see anything there that bears on Steve's suggestion. > Mea culpa, it's October 8, not October 18: >http://archives.postgresql.org/pg

Re: [PERFORM] Performance of count(*)

2007-03-22 Thread Craig A. James
Tom Lane wrote: "Craig A. James" <[EMAIL PROTECTED]> writes: Steve Atkins wrote: As long as you're ordering by some row in the table then you can do that in straight SQL. select a, b, ts from foo where (stuff) and foo > X order by foo limit 10 Then, record the last value of foo you read, and

Re: [PERFORM] Performance of count(*)

2007-03-22 Thread Michael Stone
On Thu, Mar 22, 2007 at 02:24:39PM -0400, Merlin Moncure wrote: Tino was saying that rather that build a complete indexing storage management solution that lives outside the database, it is better to do intelligent session management so that you get the simplicity if a two tier client server syst

Re: [PERFORM] Performance of count(*)

2007-03-22 Thread Tom Lane
"Craig A. James" <[EMAIL PROTECTED]> writes: > Steve Atkins wrote: >> As long as you're ordering by some row in the table then you can do that in >> straight SQL. >> >> select a, b, ts from foo where (stuff) and foo > X order by foo limit 10 >> >> Then, record the last value of foo you read, and

Re: [PERFORM] Performance of count(*)

2007-03-22 Thread Steve Atkins
On Mar 22, 2007, at 11:26 AM, Guido Neitzer wrote: On 22.03.2007, at 11:53, Steve Atkins wrote: As long as you're ordering by some row in the table then you can do that in straight SQL. select a, b, ts from foo where (stuff) and foo > X order by foo limit 10 Then, record the last value

Re: [PERFORM] Performance of count(*)

2007-03-22 Thread Guido Neitzer
On 22.03.2007, at 11:53, Steve Atkins wrote: As long as you're ordering by some row in the table then you can do that in straight SQL. select a, b, ts from foo where (stuff) and foo > X order by foo limit 10 Then, record the last value of foo you read, and plug it in as X the next time

Re: [PERFORM] Performance of count(*)

2007-03-22 Thread Merlin Moncure
On 3/22/07, Michael Stone <[EMAIL PROTECTED]> wrote: On Thu, Mar 22, 2007 at 06:27:32PM +0100, Tino Wildenhain wrote: >Craig A. James schrieb: >>You guys can correct me if I'm wrong, but the key feature that's missing >>from Postgres's flexible indexing is the ability to maintain state >>across q

Re: [PERFORM] Performance of count(*)

2007-03-22 Thread Craig A. James
Steve Atkins wrote: As long as you're ordering by some row in the table then you can do that in straight SQL. select a, b, ts from foo where (stuff) and foo > X order by foo limit 10 Then, record the last value of foo you read, and plug it in as X the next time around. We've been over this be

Re: [PERFORM] Performance of count(*)

2007-03-22 Thread Steve Atkins
On Mar 22, 2007, at 10:21 AM, Craig A. James wrote: Tino Wildenhain wrote: Craig A. James schrieb: ... In our case (for a variety of reasons, but this one is critical), we actually can't use Postgres indexing at all -- we wrote an entirely separate indexing system for our data... ...Ther

Re: [PERFORM] Performance of count(*)

2007-03-22 Thread Craig A. James
Tino Wildenhain wrote: You guys can correct me if I'm wrong, but the key feature that's missing from Postgres's flexible indexing is the ability to maintain state across queries. Something like this: select a, b, my_index_state() from foo where ... offset 100 limit 10 using my_index(prev_

Re: [PERFORM] Performance of count(*)

2007-03-22 Thread Michael Stone
On Thu, Mar 22, 2007 at 06:27:32PM +0100, Tino Wildenhain wrote: Craig A. James schrieb: You guys can correct me if I'm wrong, but the key feature that's missing from Postgres's flexible indexing is the ability to maintain state across queries. Something like this: select a, b, my_index_sta

Re: [PERFORM] Performance of count(*)

2007-03-22 Thread Tino Wildenhain
Craig A. James schrieb: Tino Wildenhain wrote: Craig A. James schrieb: ... In our case (for a variety of reasons, but this one is critical), we actually can't use Postgres indexing at all -- we wrote an entirely separate indexing system for our data... ...There is no need to store or maintai

Re: [PERFORM] Performance of count(*)

2007-03-22 Thread Craig A. James
Tino Wildenhain wrote: Craig A. James schrieb: ... In our case (for a variety of reasons, but this one is critical), we actually can't use Postgres indexing at all -- we wrote an entirely separate indexing system for our data... ...There is no need to store or maintain this information along

Re: [PERFORM] Performance of count(*)

2007-03-22 Thread Craig A. James
Brian Hurt wrote: One of our biggest single problems is this very thing. It's not a Postgres problem specifically, but more embedded in the idea of a relational database: There are no "job status" or "rough estimate of results" or "give me part of the answer" features that are critical to man

Re: [PERFORM] Performance of count(*)

2007-03-22 Thread Mario Weilguni
Am Donnerstag, 22. März 2007 16:17 schrieb Andreas Kostyrka: > * Mario Weilguni <[EMAIL PROTECTED]> [070322 15:59]: > > Am Donnerstag, 22. März 2007 15:33 schrieb Jonah H. Harris: > > > On 3/22/07, Merlin Moncure <[EMAIL PROTECTED]> wrote: > > > > As others suggest select count(*) from table is ver

Re: [PERFORM] Performance of count(*)

2007-03-22 Thread Brian Hurt
Craig A. James wrote: One of our biggest single problems is this very thing. It's not a Postgres problem specifically, but more embedded in the idea of a relational database: There are no "job status" or "rough estimate of results" or "give me part of the answer" features that are critical

Re: [PERFORM] Performance of count(*)

2007-03-22 Thread Carlos Moreno
count(*). I was just discussing general performance issues on the phone line and when my colleague asked me about the size of the database he just wonderd why this takes so long for a job his MS-SQL server is much faster. [...]. Simple. MSSQL is optimized for this case, and uses "older"

Re: [PERFORM] Performance of count(*)

2007-03-22 Thread Tino Wildenhain
Craig A. James schrieb: ... In our case (for a variety of reasons, but this one is critical), we actually can't use Postgres indexing at all -- we wrote an entirely separate indexing system for our data, one that has the following properties: 1. It can give out "pages" of information (i.e. "

Re: [PERFORM] Performance of count(*)

2007-03-22 Thread Craig A. James
Michael Stone wrote: On Thu, Mar 22, 2007 at 01:30:35PM +0200, [EMAIL PROTECTED] wrote: approximated count? why? who would need it? where you can use it? Do a google query. Look at the top of the page, where it says "results N to M of about O". For user interfaces (which is where a lot o

Re: [PERFORM] Performance of count(*)

2007-03-22 Thread Andreas Kostyrka
* Mario Weilguni <[EMAIL PROTECTED]> [070322 15:59]: > Am Donnerstag, 22. März 2007 15:33 schrieb Jonah H. Harris: > > On 3/22/07, Merlin Moncure <[EMAIL PROTECTED]> wrote: > > > As others suggest select count(*) from table is very special case > > > which non-mvcc databases can optimize for. > > >

Re: [PERFORM] Performance of count(*)

2007-03-22 Thread Luke Lonergan
Andreas, On 3/22/07 4:48 AM, "Andreas Tille" <[EMAIL PROTECTED]> wrote: > Well, to be honest I'm not really interested in the performance of > count(*). I was just discussing general performance issues on the > phone line and when my colleague asked me about the size of the > database he just wo

Re: [PERFORM] Performance of count(*)

2007-03-22 Thread mark
On Thu, Mar 22, 2007 at 10:18:10AM -0400, Michael Stone wrote: > IIRC, that's basically what you get with the mysql count anyway, since > there are corner cases for results in a transaction. Avoiding those > cases is why the postgres count takes so long; sometimes that's what's > desired and som

Re: [PERFORM] Performance of count(*)

2007-03-22 Thread Michael Stone
On Thu, Mar 22, 2007 at 09:39:18AM -0400, Merlin Moncure wrote: You can get the approximate count by selecting reltuples from pg_class. It is valid as of last analyze. Of course, that only works if you're not using any WHERE clause. Here's a (somewhat ugly) example of getting an approximate c

Re: [PERFORM] Performance of count(*)

2007-03-22 Thread Michael Fuhr
On Thu, Mar 22, 2007 at 01:29:46PM +0100, Andreas Kostyrka wrote: > * Andreas Tille <[EMAIL PROTECTED]> [070322 13:24]: > > Well, to be honest I'm not really interested in the performance of > > count(*). I was just discussing general performance issues on the > > phone line and when my colleague

Re: [PERFORM] Performance of count(*)

2007-03-22 Thread Mario Weilguni
Am Donnerstag, 22. März 2007 15:33 schrieb Jonah H. Harris: > On 3/22/07, Merlin Moncure <[EMAIL PROTECTED]> wrote: > > As others suggest select count(*) from table is very special case > > which non-mvcc databases can optimize for. > > Well, other MVCC database still do it faster than we do. Howe

Re: [PERFORM] Performance of count(*)

2007-03-22 Thread Jonah H. Harris
On 3/22/07, Merlin Moncure <[EMAIL PROTECTED]> wrote: As others suggest select count(*) from table is very special case which non-mvcc databases can optimize for. Well, other MVCC database still do it faster than we do. However, I think we'll be able to use the dead space map for speeding this

Re: [PERFORM] Performance of count(*)

2007-03-22 Thread Michael Stone
On Thu, Mar 22, 2007 at 01:30:35PM +0200, [EMAIL PROTECTED] wrote: approximated count? why? who would need it? where you can use it? Do a google query. Look at the top of the page, where it says "results N to M of about O". For user interfaces (which is where a lot of this count(*) stuff

Re: [PERFORM] Performance of count(*)

2007-03-22 Thread Merlin Moncure
On 3/22/07, Andreas Tille <[EMAIL PROTECTED]> wrote: I just try to find out why a simple count(*) might last that long. At first I tried explain, which rather quickly knows how many rows to check, but the final count is two orders of magnitude slower. You can get the approximate count by select

Re: [PERFORM] Performance of count(*)

2007-03-22 Thread Mario Weilguni
Am Donnerstag, 22. März 2007 12:30 schrieb [EMAIL PROTECTED]: > approximated count? > > why? who would need it? where you can use it? > > calculating costs and desiding how to execute query needs > approximated count, but it's totally worthless information for any user > IMO. No, it is not use

Re: [PERFORM] Performance of count(*)

2007-03-22 Thread Bill Moran
In response to [EMAIL PROTECTED]: > > approximated count? > > why? who would need it? where you can use it? > > calculating costs and desiding how to execute query needs > approximated count, but it's totally worthless information for any user > IMO. I don't think so. We have some AJAX s

Re: [PERFORM] Performance of count(*)

2007-03-22 Thread Andreas Kostyrka
* Andreas Tille <[EMAIL PROTECTED]> [070322 13:24]: > On Thu, 22 Mar 2007, Andreas Kostyrka wrote: > > >Which version of PG? > > Ahh, sorry, forgot that. The issue occurs in Debian (Etch) packaged > version 7.4.16. I plan to switch soon to 8.1.8. I'd recommend 8.2 if at all possible :) > > >Th

Re: [PERFORM] Performance of count(*)

2007-03-22 Thread Andreas Tille
On Thu, 22 Mar 2007, Andreas Kostyrka wrote: Which version of PG? Ahh, sorry, forgot that. The issue occurs in Debian (Etch) packaged version 7.4.16. I plan to switch soon to 8.1.8. That's the reason why PG (check the newest releases, I seem to remember that there has been some aggregate o

Re: [PERFORM] Performance of count(*)

2007-03-22 Thread ismo . tuononen
approximated count? why? who would need it? where you can use it? calculating costs and desiding how to execute query needs approximated count, but it's totally worthless information for any user IMO. Ismo On Thu, 22 Mar 2007, Albert Cervera Areny wrote: > As you can see, PostgreSQL nee

Re: [PERFORM] Performance of count(*)

2007-03-22 Thread ismo . tuononen
explain is just "quessing" how many rows are in table. sometimes quess is right, sometimes just an estimate. sailabdb=# explain SELECT count(*) from sl_tuote; QUERY PLAN --

Re: [PERFORM] Performance of count(*)

2007-03-22 Thread Albert Cervera Areny
As you can see, PostgreSQL needs to do a sequencial scan to count because its MVCC nature and indices don't have transaction information. It's a known drawback inherent to the way PostgreSQL works and which gives very good results in other areas. It's been talked about adding some kind of appro

Re: [PERFORM] Performance of count(*)

2007-03-22 Thread Andreas Kostyrka
* Andreas Tille <[EMAIL PROTECTED]> [070322 12:07]: > Hi, > > I just try to find out why a simple count(*) might last that long. > At first I tried explain, which rather quickly knows how many rows > to check, but the final count is two orders of magnitude slower. Which version of PG? The basic

[PERFORM] Performance of count(*)

2007-03-22 Thread Andreas Tille
Hi, I just try to find out why a simple count(*) might last that long. At first I tried explain, which rather quickly knows how many rows to check, but the final count is two orders of magnitude slower. My MS_SQL server using colleague can't believe that. $ psql InfluenzaWeb -c 'explain SELECT

Re: [PERFORM] Performance of count(*) on large tables vs SQL Server

2005-02-01 Thread PFC
clause will be a cheap query - and use it to test if a table is empty, for instance. (because for Oracle/Sybase/SQL Server, count(*) is cheap). To test if a table is empty, use a SELECT EXISTS or whatever SELECT with a LIMIT 1... ---(end of broadcast)--

Re: [PERFORM] Performance of count(*) on large tables vs SQL Server

2005-02-01 Thread Stef
Hello Andrew, Everything that Shridhar says makes perfect sense, and, speaking from experience in dealing with this type of 'problem', everything you say does as well. Such is life really :) I would not be at -all- surprised if Sybase and Oracle did query re-writing behind the sc

Re: [PERFORM] Performance of count(*) on large tables vs SQL Server

2005-02-01 Thread Shridhar Daithankar
On Tuesday 01 Feb 2005 6:11 pm, Andrew Mayo wrote: > PG, on the other hand, appears to do a full table scan > to answer this question, taking nearly 4 seconds to > process the query. > > Doing an ANALYZE on the table and also VACUUM did not > seem to affect this. > > Can PG find a table's row count

[PERFORM] Performance of count(*) on large tables vs SQL Server

2005-02-01 Thread Andrew Mayo
Doing some rather crude comparative performance tests between PG 8.0.1 on Windows XP and SQL Server 2000, PG whips SQL Server's ass on insert into junk (select * from junk) on a one column table defined as int. If we start with a 1 row table and repeatedly execute this command, PG can take the ta