Re: [PERFORM] TB-sized databases

2007-11-29 Thread Russell Smith
Simon Riggs wrote: On Tue, 2007-11-27 at 18:06 -0500, Pablo Alcaraz wrote: Simon Riggs wrote: All of those responses have cooked up quite a few topics into one. Large databases might mean text warehouses, XML message stores, relational archives and fact-based business data warehouses.

Re: [PERFORM] Configuring a Large RAM PostgreSQL Server

2007-11-29 Thread Josh Berkus
Alex, > The new machine will have 48 GB of RAM, so figuring out starting   > points for the Shared Buffers and Work_mem/Maintenance_work_mem is   > going to be a crap shoot, since the defaults still seem to be based   > upon 256MB of RAM or less. Why a crap shoot? Set shared_buffers to 12GB. Se
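As a concrete sketch of the starting point Josh suggests for a 48 GB machine: the 12 GB shared_buffers figure comes from his message; the remaining values are illustrative assumptions of mine, not tuned recommendations.

```ini
# postgresql.conf sketch for a 48 GB RAM server (illustrative starting points)
shared_buffers = 12GB            # ~25% of RAM, per the suggestion above
work_mem = 64MB                  # per-sort/hash allocation; keep modest (assumption)
maintenance_work_mem = 1GB       # speeds CREATE INDEX and VACUUM (assumption)
effective_cache_size = 36GB      # rough estimate of OS cache available (assumption)
```

Note that work_mem is allocated per sort or hash node per backend, so a large value multiplied by many concurrent queries can exhaust memory.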

Re: [PERFORM] TB-sized databases

2007-11-29 Thread Mark Kirkwood
Simon Riggs wrote: On Wed, 2007-11-28 at 14:48 +0100, Csaba Nagy wrote: In fact an even more useful option would be to ask the planner to throw error if the expected cost exceeds a certain threshold... Well, I've suggested it before: statement_cost_limit on pgsql-hackers, 1 March 2

Re: [PERFORM] clear pg_stats

2007-11-29 Thread Magnus Hagander
Campbell, Lance wrote: > How can I clear the pg_stats views without restarting PostgreSQL? I > thought there was a function. pg_stat_reset() //Magnus


Re: [PERFORM] clear pg_stats

2007-11-29 Thread Heikki Linnakangas
Campbell, Lance wrote: How can I clear the pg_stats views without restarting PostgreSQL? I thought there was a function. SELECT pg_stat_reset(); -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com

[PERFORM] clear pg_stats

2007-11-29 Thread Campbell, Lance
How can I clear the pg_stats views without restarting PostgreSQL? I thought there was a function. Thanks, Lance Campbell Project Manager/Software Architect Web Services at Public Affairs University of Illinois 217.333.0382 http://webservices.uiuc.edu

Re: [PERFORM] GiST indexing tuples

2007-11-29 Thread Steinar H. Gunderson
On Thu, Nov 29, 2007 at 03:23:10PM -0500, Matthew T. O'Connor wrote: > Sorry in advance if this is a stupid question, but how is this better than > two index, one on "a" and one on "b"? I supposed there could be a space > savings but beyond that? You could index on both columns simultaneously w

Re: [PERFORM] GiST indexing tuples

2007-11-29 Thread Matthew T. O'Connor
Matthew wrote: For instance, the normal B-tree index on (a, b) is able to answer queries like "a = 5 AND b > 1" or "a > 5". An R-tree would be able to index these, plus queries like "a > 5 AND b < 1". Sorry in advance if this is a stupid question, but how is this better than two index, one on
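The distinction Matthew draws can be sketched in SQL. This is a sketch only, assuming the contrib/btree_gist module is installed (it supplies GiST operator classes for plain scalar types); the table and index names are mine.

```sql
-- Hypothetical table for illustration
CREATE TABLE t (a integer, b integer);

-- B-tree on (a, b): serves "a = 5 AND b > 1" or "a > 5",
-- but only when the leading column is constrained.
CREATE INDEX t_ab_btree ON t (a, b);

-- Multicolumn GiST: can use quals on any column, so it can also
-- serve "a > 5 AND b < 1" in a single index scan.
CREATE INDEX t_ab_gist ON t USING gist (a, b);
```

This is the space-vs-flexibility trade-off raised in the thread: two single-column B-trees can be combined with a bitmap AND, while one multicolumn GiST index answers the range-on-both-columns case directly.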

Re: [PERFORM] Configuring a Large RAM PostgreSQL Server

2007-11-29 Thread Alvaro Herrera
> On Nov 29, 2007, at 2:15 PM, Richard Huxton wrote: >> Alex Hochberger wrote: >>>Problem Usage: we have a 20GB table with 120m rows that we are >>> splitting into some sub-tables. Generally, we do large data pulls from >>> here, 1 million - 4 million records at a time, stored in a new tabl

Re: [PERFORM] Configuring a Large RAM PostgreSQL Server

2007-11-29 Thread Alex Hochberger
It's not on rebuilding the index, it's on CREATE INDEX. I attribute it to wrong setting, Ubuntu bizarre-ness, and general problems. We need new hardware, the servers are running on aging infrastructure, and we decided to get a new system that will last us the next 3-4 years all at once.

Re: [PERFORM] Configuring a Large RAM PostgreSQL Server

2007-11-29 Thread Richard Huxton
Alex Hochberger wrote: Does anyone have any white papers or basic guides for a large RAM server? We are consolidating two databases to enable better data-mining that currently run on a 4 GB and 2 GB machine. The data issues on the 4 GB machine are numerous, things like "create index" fail and

[PERFORM] Configuring a Large RAM PostgreSQL Server

2007-11-29 Thread Alex Hochberger
Does anyone have any white papers or basic guides for a large RAM server? We are consolidating two databases to enable better data-mining that currently run on a 4 GB and 2 GB machine. The data issues on the 4 GB machine are numerous, things like "create index" fail and update queries fa

Re: [PERFORM] TB-sized databases

2007-11-29 Thread Csaba Nagy
On Thu, 2007-11-29 at 10:45 -0500, Tom Lane wrote: > Given that this list spends all day every day discussing cases where the > planner is wrong, I'd have to think that that's a bet I wouldn't take. > > You could probably avoid this risk by setting the cutoff at something > like 100 or 1000 times

Re: [PERFORM] TB-sized databases

2007-11-29 Thread Stephen Frost
* Simon Riggs ([EMAIL PROTECTED]) wrote: > On Thu, 2007-11-29 at 10:45 -0500, Tom Lane wrote: > > Given that this list spends all day every day discussing cases where the > > planner is wrong, I'd have to think that that's a bet I wouldn't take. > > I think you have a point, but the alternative is

Re: [PERFORM] 7.4 Checkpoint Question

2007-11-29 Thread Brad Nicholson
On Thu, 2007-11-29 at 16:14 +, Simon Riggs wrote: > On Thu, 2007-11-29 at 10:10 -0500, Brad Nicholson wrote: > > I have a legacy system still on 7.4 (I know, I know...the upgrade is > > coming soon). > > > > I have a fairly big spike happening once a day, every day, at the same > > time. It

Re: [PERFORM] TB-sized databases

2007-11-29 Thread Simon Riggs
On Thu, 2007-11-29 at 10:45 -0500, Tom Lane wrote: > Gregory Stark <[EMAIL PROTECTED]> writes: > > "Simon Riggs" <[EMAIL PROTECTED]> writes: > >> Tom's previous concerns were along the lines of "How would know what to > >> set it to?", given that the planner costs are mostly arbitrary numbers. > >

Re: [PERFORM] 7.4 Checkpoint Question

2007-11-29 Thread Simon Riggs
On Thu, 2007-11-29 at 10:10 -0500, Brad Nicholson wrote: > I have a legacy system still on 7.4 (I know, I know...the upgrade is > coming soon). > > I have a fairly big spike happening once a day, every day, at the same > time. It happens during a checkpoint, no surprise there. I know the > solut

Re: [PERFORM] 7.4 Checkpoint Question

2007-11-29 Thread Andrew Sullivan
On Thu, Nov 29, 2007 at 10:10:54AM -0500, Brad Nicholson wrote: > This is a _really _low volume system, less than 500 writes/hour. Normal > operation sees checkpoint related spikes of around 200-300 milliseconds. > We always checkpoint at the checkpoint timeout (every 5 minutes). > During this one

Re: [PERFORM] TB-sized databases

2007-11-29 Thread Tom Lane
Gregory Stark <[EMAIL PROTECTED]> writes: > "Simon Riggs" <[EMAIL PROTECTED]> writes: >> Tom's previous concerns were along the lines of "How would know what to >> set it to?", given that the planner costs are mostly arbitrary numbers. > Hm, that's only kind of true. The units are not the problem

Re: [PERFORM] Query only slow on first run

2007-11-29 Thread Tom Lane
cluster <[EMAIL PROTECTED]> writes: >> You're essentially asking for a random sample of data that is not >> currently in memory. You're not going to get that without some I/O. > No, that sounds reasonable enough. But do you agree with the statement > that my query will just get slower and slower

[PERFORM] 7.4 Checkpoint Question

2007-11-29 Thread Brad Nicholson
I have a legacy system still on 7.4 (I know, I know...the upgrade is coming soon). I have a fairly big spike happening once a day, every day, at the same time. It happens during a checkpoint, no surprise there. I know the solution to the problem (upgrade to a modern version), but what I'm lookin
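For context on what is tunable here: 7.4 predates the background writer, so spreading checkpoint I/O out is mostly a matter of the segment count and timeout. A sketch of the relevant 7.4-era settings, with illustrative values rather than recommendations:

```ini
# postgresql.conf (7.4) -- illustrative values only
checkpoint_timeout = 300     # seconds between forced checkpoints (7.4 default)
checkpoint_segments = 3      # WAL segments between checkpoints (7.4 default)
# On a low-write system, raising checkpoint_timeout trades more frequent
# small checkpoints for a rarer but larger one.
```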

Re: [PERFORM] TB-sized databases

2007-11-29 Thread Gregory Stark
"Simon Riggs" <[EMAIL PROTECTED]> writes: > On Wed, 2007-11-28 at 14:48 +0100, Csaba Nagy wrote: > >> In fact an even more useful option would be to ask the planner to throw >> error if the expected cost exceeds a certain threshold... > > Well, I've suggested it before: > > statement_cost_limit o
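The statement_cost_limit proposal discussed in this thread was never adopted in core PostgreSQL. A client-side approximation is to run EXPLAIN first and refuse to execute a query whose estimated total cost exceeds a threshold. A minimal sketch of parsing the cost out of EXPLAIN's first output line (function names are my own, not a PostgreSQL API):

```python
import re

def top_level_cost(explain_first_line: str) -> float:
    """Extract the upper cost bound from the first line of EXPLAIN output,
    e.g. 'Seq Scan on t  (cost=0.00..155.00 rows=10000 width=4)'."""
    m = re.search(r"cost=([\d.]+)\.\.([\d.]+)", explain_first_line)
    if m is None:
        raise ValueError("no cost estimate found in EXPLAIN output")
    return float(m.group(2))

def allow_query(explain_first_line: str, cost_limit: float) -> bool:
    # Mimics the proposed statement_cost_limit: reject plans whose
    # estimated total cost exceeds the configured threshold.
    return top_level_cost(explain_first_line) <= cost_limit

line = "Seq Scan on t  (cost=0.00..155.00 rows=10000 width=4)"
print(allow_query(line, 1000.0))  # within budget
print(allow_query(line, 100.0))   # over budget
```

As Tom points out downthread, this inherits all of the planner's estimation errors, so any such cutoff should be set orders of magnitude above the expected cost of legitimate queries.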

Re: [PERFORM] Query only slow on first run

2007-11-29 Thread cluster
You're essentially asking for a random sample of data that is not currently in memory. You're not going to get that without some I/O. No, that sounds reasonable enough. But do you agree with the statement that my query will just get slower and slower over time as the number of posts increases