Re: [PERFORM] splitting data into multiple tables

2010-01-26 Thread Matthew Wakeling
On Mon, 25 Jan 2010, Viji V Nair wrote: I think this wont help that much if you have a single machine. Partition the table and keep the data in different nodes. Have a look at the tools like pgpool.II So partitioning. You have three choices: 1. Use a single table 2. Partition the table on the

Re: [PERFORM] splitting data into multiple tables

2010-01-26 Thread Matthew Wakeling
On Mon, 25 Jan 2010, nair rajiv wrote: I am working on a project that will take out structured content from wikipedia and put it in our database... there is a table which will approximately have 5 crore entries after data harvesting. Have you asked the Wikimedia Foundation if they mind you

Re: [PERFORM] splitting data into multiple tables

2010-01-26 Thread nair rajiv
On Tue, Jan 26, 2010 at 5:15 PM, Matthew Wakeling matt...@flymine.orgwrote: On Mon, 25 Jan 2010, nair rajiv wrote: I am working on a project that will take out structured content from wikipedia and put it in our database... there is a table which will approximately have 5 crore entries

[PERFORM] Poor query plan across OR operator

2010-01-26 Thread Mark Hills
One of our most-used queries performs poorly (taking over 2 seconds) and a tiny amount of refactoring shows it can be fast (less than 1ms) by transforming the OR case (which spans two tables) into a UNION. I have created a simple test case (below) which shows the difference we are seeing in

Re: [PERFORM] Poor query plan across OR operator

2010-01-26 Thread Grzegorz Jaśkiewicz
just create index on both columns: CREATE INDEX foo_i ON foo(bar1, bar2); HTH -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Poor query plan across OR operator

2010-01-26 Thread Tom Lane
Mark Hills mark.hi...@framestore.com writes: One of our most-used queries performs poorly (taking over 2 seconds) and a tiny amount of refactoring shows it can be fast (less than 1ms) by transforming the OR case (which spans two tables) into a UNION. I'd suggest going with the UNION. We are

[PERFORM] Should the optimiser convert a CASE into a WHERE if it can?

2010-01-26 Thread Richard Neill
Dear All, Just wondering whether there is a missing scope for the query planner (on 8.4.2) to be cleverer than it currently is. Specifically, I wonder whether the optimiser should know that by converting a CASE condition into a WHERE condition, it can use an index. Have I found a possible

Re: [PERFORM] Should the optimiser convert a CASE into a WHERE if it can?

2010-01-26 Thread Tom Lane
Richard Neill rn...@cam.ac.uk writes: SELECT SUM (case when id 120 and id 121 then 1 else 0 end) AS c1, SUM (case when id 121 and id 122 then 1 else 0 end) AS c2, ... FROM tbl_tracker; This can be manually optimised into a far uglier (but much much faster)

Re: [PERFORM] Should the optimiser convert a CASE into a WHERE if it can?

2010-01-26 Thread Matthew Wakeling
On Tue, 26 Jan 2010, Richard Neill wrote: SELECT SUM (case when id 120 and id 121 then 1 else 0 end) from tbl_tracker; Explain shows that this does a sequential scan. I'd defer to Tom on this one, but really, for Postgres to work this out, it would have to peer deep into the

Re: [PERFORM] splitting data into multiple tables

2010-01-26 Thread Greg Smith
Viji V Nair wrote: A 15k rpm SAS drive will give you a throughput of 12MB and 120 IOPS. Now you can calculate the number of disks, specifically spindles, for getting your desired throughput and IOPs I think you mean 120MB/s for that first part. Regardless, presuming you can provision a

Re: [PERFORM] Should the optimiser convert a CASE into a WHERE if it can?

2010-01-26 Thread Richard Neill
Thanks for your answers. David Wilson wrote: Why not simply add the where clause to the original query? SELECT SUM (case when id 120 and id 121 then 1 else 0 end) AS c1, SUM (case when id 121 and id 122 then 1 else 0 end) AS c2, SUM (case when id 122 and id

Re: [PERFORM] splitting data into multiple tables

2010-01-26 Thread Viji V Nair
On Tue, Jan 26, 2010 at 11:11 PM, Greg Smith g...@2ndquadrant.com wrote: Viji V Nair wrote: A 15k rpm SAS drive will give you a throughput of 12MB and 120 IOPS. Now you can calculate the number of disks, specifically spindles, for getting your desired throughput and IOPs I think you mean

Re: [PERFORM] Poor query plan across OR operator

2010-01-26 Thread Robert Haas
On Tue, Jan 26, 2010 at 11:41 AM, Tom Lane t...@sss.pgh.pa.us wrote: Mark Hills mark.hi...@framestore.com writes: One of our most-used queries performs poorly (taking over 2 seconds) and a tiny amount of refactoring shows it can be fast (less than 1ms) by transforming the OR case (which spans

Re: [PERFORM] splitting data into multiple tables

2010-01-26 Thread Greg Smith
Viji V Nair wrote: There are catches in the SAN controllers also. SAN vendors wont give that much information regarding their internal controller design. They will say they have 4 external 4G ports, you should also check how many internal ports they have and the how the controllers are

Re: [PERFORM] Poor query plan across OR operator

2010-01-26 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: On Tue, Jan 26, 2010 at 11:41 AM, Tom Lane t...@sss.pgh.pa.us wrote: I'd suggest going with the UNION.  We are unlikely to make the planner look for such cases, because usually such a transformation would be a net loss.  It seems like rather a corner

Re: [PERFORM] Poor query plan across OR operator

2010-01-26 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote: Actually, in the type of case Mark is showing, the estimates might be *more* accurate since the condition gets decomposed into separate per-table conditions. I'm still dubious about how often it's a win though. There's another problem, which is that

Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-26 Thread Scott Carey
On Jan 25, 2010, at 6:55 AM, fka...@googlemail.com wrote: Scott Carey: (2) The tests: Note: The standard speed was about 800MB/40s, so 20MB/s. a) What I changed: fsync=off Result: 35s, so 5s faster. b) like a) but: checkpoint_segments=128 (was 3) autovacuum=off Result:

Re: [PERFORM] Should the optimiser convert a CASE into a WHERE if it can?

2010-01-26 Thread Scott Carey
On Jan 26, 2010, at 9:41 AM, Richard Neill wrote: Thanks for your answers. David Wilson wrote: Why not simply add the where clause to the original query? SELECT SUM (case when id 120 and id 121 then 1 else 0 end) AS c1, SUM (case when id 121 and id 122 then 1

Re: [PERFORM] New server to improve performance on our large and busy DB - advice?

2010-01-26 Thread Greg Smith
Carlo Stonebanks wrote: effective_cache_size should be set to how much memory is leftover for disk caching after taking into account what's used by the operating system, dedicated PostgreSQL memory, and other applications. If it's set too low, indexes may not be used for executing queries

Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-26 Thread Greg Smith
Scott Carey wrote: #bgwriter_delay=200ms #bgwriter_lru_percent=1.0 #bgwriter_lru_maxpages=5 #bgwriter_all_percent=0.333 #bgwriter_all_maxpages=5 #checkpoint_segments=3 #checkpoint_timeout=5min #checkpoint_warning=30s Check out this for info on these parameters

[PERFORM] Re: [HACKERS] Re: Faster CREATE DATABASE by delaying fsync (was 8.4.1 ubuntu karmic slow createdb)

2010-01-26 Thread Greg Smith
Greg Stark wrote: Actually before we get there could someone who demonstrated the speedup verify that this patch still gets that same speedup? Let's step back a second and get to the bottom of why some people are seeing this and others aren't. The original report here suggested this was