Re: [GENERAL] Checkpoint Tuning Question

2009-07-22 Thread tomrevam
Dan Armbrust wrote: > > All of my testing to date has been done with synchronous_commit=off > > I just tried setting full_page_writes=off - and like magic, the entire > hiccup went away. > Why is the full_page_write happening before the commit returns when synchronous_commit is set to off? I

Re: [GENERAL] Checkpoint Tuning Question

2009-07-20 Thread Dan Armbrust
On Mon, Jul 13, 2009 at 3:53 PM, Dan Armbrust wrote: >> So this thought leads to a couple of other things Dan could test. >> First, see if turning off full_page_writes makes the hiccup go away. >> If so, we know the problem is in this area (though still not exactly >> which reason); if not we need

Re: [GENERAL] Checkpoint Tuning Question

2009-07-14 Thread Dan Armbrust
> > Propose a DTrace probe immediately after the "goto begin" at line 740 of > xlog.c, so we can start tracing from the first backend following > checkpoint, and turn off tracing when all backends have completed a > transaction. > That's greek to me. But I'm happy to test things if you send me pa

Re: [GENERAL] Checkpoint Tuning Question

2009-07-14 Thread Simon Riggs
On Mon, 2009-07-13 at 15:53 -0500, Dan Armbrust wrote: > > So this thought leads to a couple of other things Dan could test. > > First, see if turning off full_page_writes makes the hiccup go away. > > If so, we know the problem is in this area (though still not exactly > > which reason); if not w

Re: [GENERAL] Checkpoint Tuning Question

2009-07-13 Thread Dan Armbrust
> So this thought leads to a couple of other things Dan could test. > First, see if turning off full_page_writes makes the hiccup go away. > If so, we know the problem is in this area (though still not exactly > which reason); if not we need another idea.  That's not a good permanent > fix though,

Re: [GENERAL] Checkpoint Tuning Question

2009-07-12 Thread Simon Riggs
On Sun, 2009-07-12 at 13:10 -0400, Tom Lane wrote: > It's hard to see how it could have continuing effects over several > seconds, especially in a system that has CPU to spare. Any queueing situation takes a while to resolve and over-damped systems can take a long time to resolve themselves. We

Re: [GENERAL] Checkpoint Tuning Question

2009-07-12 Thread Tom Lane
Simon Riggs writes: > This causes us to queue for the WALInsertLock twice at exactly the time > when every caller needs to calculate the CRC for complete blocks. So we > queue twice when the lock-hold-time is consistently high, causing queue > lengths to go ballistic. You keep saying that, and it

Re: [GENERAL] Checkpoint Tuning Question

2009-07-12 Thread Simon Riggs
On Fri, 2009-07-10 at 14:25 -0500, Dan Armbrust wrote: > > Hm, I'm not sure I believe any of that except the last bit, seeing that > > he's got plenty of excess CPU capability. But the last bit fits with > > the wimpy-I/O problem, and it also offers something we could test. > > Dan, please see wh

Re: [GENERAL] Checkpoint Tuning Question

2009-07-10 Thread Dan Armbrust
> Hm, I'm not sure I believe any of that except the last bit, seeing that > he's got plenty of excess CPU capability.  But the last bit fits with > the wimpy-I/O problem, and it also offers something we could test. > Dan, please see what happens when you vary the wal_buffers setting. > (Note you ne

Re: [GENERAL] Checkpoint Tuning Question

2009-07-10 Thread Tom Lane
Simon Riggs writes: > I think it's a traffic jam. > After checkpoint in XLogInsert(), we discover that we now have to back up > a block that we didn't think so previously. So we have to drop the lock > and then re-access WALInsertLock. So every backend has to go through the > queue twice the first

Re: [GENERAL] Checkpoint Tuning Question

2009-07-10 Thread Simon Riggs
On Fri, 2009-07-10 at 10:27 -0400, Tom Lane wrote: > Simon Riggs writes: > > ISTM more likely to be a problem with checkpointing clog or subtrans. > > That would block everybody and the scale of the problem is about right. > > That's what I had been thinking too, but the log_checkpoint output >

Re: [GENERAL] Checkpoint Tuning Question

2009-07-10 Thread Tom Lane
Simon Riggs writes: > ISTM more likely to be a problem with checkpointing clog or subtrans. > That would block everybody and the scale of the problem is about right. That's what I had been thinking too, but the log_checkpoint output conclusively disproves it: those steps are taking less than 20ms

Re: [GENERAL] Checkpoint Tuning Question

2009-07-10 Thread Simon Riggs
On Wed, 2009-07-08 at 18:22 -0400, Tom Lane wrote: > As Greg commented upthread, we seem to be getting forced to the > conclusion that the initial buffer scan in BufferSync() is somehow > causing this. There are a couple of things it'd be useful to try > here: Not sure why you're forced to that

Re: [GENERAL] Checkpoint Tuning Question

2009-07-09 Thread Dan Armbrust
> As Greg commented upthread, we seem to be getting forced to the > conclusion that the initial buffer scan in BufferSync() is somehow > causing this.  There are a couple of things it'd be useful to try > here: > > * see how the size of the hiccup varies with shared_buffers; I tried decreasing sha

Re: [GENERAL] Checkpoint Tuning Question

2009-07-08 Thread Tom Lane
Dan Armbrust writes: > Almost all of the slow query log messages are logged within about 3 > seconds of the checkpoint starting message. > LOG: checkpoint complete: wrote 9975 buffers (77.9%); 0 transaction > log file(s) added, 0 removed, 15 recycled; write=156.576 s, sync=0.065 > s, total=156.6
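The log_checkpoints line quoted above can be read numerically. The following is a minimal sketch, not something posted in the thread: it parses the fields that survive in the archive (the total= value is cut off, so it is left out) and derives a few figures from them. The function names and the 8 kB block size are assumptions made for illustration; 8 kB is the PostgreSQL default, but the thread does not state the build's block size.

```python
import re

# Log line as quoted in the thread. The total= value is truncated in the
# archive, so only the fields that survive are included here.
LOG_LINE = (
    "LOG: checkpoint complete: wrote 9975 buffers (77.9%); 0 transaction "
    "log file(s) added, 0 removed, 15 recycled; write=156.576 s, sync=0.065 s"
)

PATTERN = re.compile(
    r"wrote (?P<buffers>\d+) buffers \((?P<pct>[\d.]+)%\);"
    r".*?(?P<recycled>\d+) recycled;"
    r" write=(?P<write>[\d.]+) s, sync=(?P<sync>[\d.]+) s"
)

BLOCK_SIZE_KB = 8  # assumption: default PostgreSQL block size


def parse_checkpoint(line):
    """Return the numeric fields of a 'checkpoint complete' line, or None."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return {
        "buffers": int(m.group("buffers")),
        "pct": float(m.group("pct")),
        "recycled": int(m.group("recycled")),
        "write_s": float(m.group("write")),
        "sync_s": float(m.group("sync")),
    }


def summarize(stats):
    """Derive volume, average write rate, and implied shared_buffers size."""
    written_kb = stats["buffers"] * BLOCK_SIZE_KB
    total_buffers = stats["buffers"] / (stats["pct"] / 100.0)
    return {
        "written_mb": written_kb / 1024,
        "write_rate_kb_s": written_kb / stats["write_s"],
        "shared_buffers_mb": total_buffers * BLOCK_SIZE_KB / 1024,
    }


if __name__ == "__main__":
    stats = parse_checkpoint(LOG_LINE)
    for key, value in summarize(stats).items():
        print(f"{key}: {value:.1f}")
```

Under those assumptions the implied shared_buffers size comes out at roughly 100 MB, which agrees with the figure mentioned elsewhere in the thread, and the average write rate is on the order of 0.5 MB/s, so the write phase itself is spread thinly rather than arriving as a burst.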

Re: [GENERAL] Checkpoint Tuning Question

2009-07-08 Thread Dan Armbrust
> However, the latest report says that he > managed that, and yet there's still a one-or-two-second transient of > some sort.  I'm wondering what's causing that.  If it were at the *end* > of the checkpoint, it might be the disk again (failing to handle a bunch > of fsyncs, perhaps).  But if it rea

Re: [GENERAL] Checkpoint Tuning Question

2009-07-08 Thread Greg Smith
On Wed, 8 Jul 2009, Tom Lane wrote: > He's only got 100MB of shared buffers, which doesn't seem like much > considering it's apparently a fairly beefy system. I definitely don't see how one CPU spinning over the buffer headers in BufferSync is going to create the sort of hiccup he's describing. A

Re: [GENERAL] Checkpoint Tuning Question

2009-07-08 Thread Tom Lane
John R Pierce writes: > a beefy system with... >> Harddrive is just a simple, run-of-the-mill desktop drive. > which is going to severely limit random write throughput True, which is why he's having to flail so hard to keep the checkpoint from saturating his I/O. However, the latest report s

Re: [GENERAL] Checkpoint Tuning Question

2009-07-08 Thread John R Pierce
Tom Lane wrote: > He's only got 100MB of shared buffers, which doesn't seem like much > considering it's apparently a fairly beefy system. a beefy system with... > Harddrive is just a simple, run-of-the-mill desktop drive. which is going to severely limit random write throughput -- Se

Re: [GENERAL] Checkpoint Tuning Question

2009-07-08 Thread Tom Lane
Greg Smith writes: On Wed, 8 Jul 2009, Dan Armbrust wrote: >> What I observe now is that I get a short (1-2 second) period where I >> get slow queries - I'm running about 30 queries in parallel at any >> given time - it appears that all 30 queries get paused for a couple of >> seconds at the momen

Re: [GENERAL] Checkpoint Tuning Question

2009-07-08 Thread Greg Smith
On Wed, 8 Jul 2009, Dan Armbrust wrote: My takeaway is that starting the checkpoint process is really expensive - so I don't want to start it very frequently. And the only downside to longer intervals between checkpoints is a longer recovery time if the system crashes? And additional disk spa

Re: [GENERAL] Checkpoint Tuning Question

2009-07-08 Thread Greg Smith
On Wed, 8 Jul 2009, Dan Armbrust wrote: > With checkpoint_segments set to 10, the checkpoints appear to be > happening due to checkpoint_timeout - which I've left at the default > of 5 minutes. OK, then that's as far upwards as you probably need to tweak that for your workload, even though most sys

Re: [GENERAL] Checkpoint Tuning Question

2009-07-08 Thread Dan Armbrust
>> Wouldn't increasing the length between checkpoints result in the >> checkpoint process taking even longer to complete? > > You don't really care how long it takes.  What you want is for it not to > be chewing a bigger fraction of your I/O bandwidth than you can spare. > Hence, you want it to tak

Re: [GENERAL] Checkpoint Tuning Question

2009-07-08 Thread Tom Lane
Dan Armbrust writes: > On Wed, Jul 8, 2009 at 1:23 PM, Tom Lane wrote: >> Well, you could increase both those settings so as to put the >> checkpoints further apart, and/or increase checkpoint_completion_target >> to spread the checkpoint I/O over a larger fraction of the cycle. > Wouldn't increa
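The effect of checkpoint_completion_target on pacing can be illustrated with simple arithmetic. This is a rough sketch, assuming a checkpoint triggered by checkpoint_timeout and a write phase paced to take about completion_target times the interval; the function names are hypothetical, and 0.5 is the 8.3 default for checkpoint_completion_target, which the thread does not confirm was in effect. The 156.576 s write time and 9975 buffers come from the log line quoted elsewhere in the thread; the 900 s / 0.9 case is purely a hypothetical setting for comparison.

```python
BLOCK_SIZE_KB = 8  # assumption: default PostgreSQL block size


def target_write_seconds(checkpoint_timeout_s, completion_target):
    """Approximate time the write phase of a timed checkpoint aims to take."""
    return checkpoint_timeout_s * completion_target


def observed_fraction(write_s, checkpoint_timeout_s):
    """Fraction of the checkpoint interval actually spent writing."""
    return write_s / checkpoint_timeout_s


def average_rate_kb_s(buffers, write_s):
    """Average checkpoint write rate for a given number of dirty buffers."""
    return buffers * BLOCK_SIZE_KB / write_s


if __name__ == "__main__":
    # 5-minute timeout (stated in the thread) with the assumed 0.5 target.
    print(target_write_seconds(300, 0.5))
    # Write time reported in the thread's log line, as a share of 300 s.
    print(round(observed_fraction(156.576, 300), 3))
    # Hypothetical: same 9975 buffers spread over a 900 s interval at 0.9.
    stretched = target_write_seconds(900, 0.9)
    print(round(average_rate_kb_s(9975, stretched), 1))
```

The reported write time is close to half of the 5-minute interval, which is what the assumed default target would produce. Stretching the interval and raising the target lowers the average write rate for the same amount of dirty data, which is the point made in the reply: the checkpoint taking longer is the goal, not a problem.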

Re: [GENERAL] Checkpoint Tuning Question

2009-07-08 Thread Dan Armbrust
On Wed, Jul 8, 2009 at 1:23 PM, Tom Lane wrote: > Dan Armbrust writes: >> With checkpoint_segments set to 10, the checkpoints appear to be >> happening due to checkpoint_timeout - which I've left at the default >> of 5 minutes. > > Well, you could increase both those settings so as to put the > ch

Re: [GENERAL] Checkpoint Tuning Question

2009-07-08 Thread Dan Armbrust
On Wed, Jul 8, 2009 at 12:50 PM, Tom Lane wrote: > Dan Armbrust writes: >> However, once the checkpoint process begins, I get a whole flood of >> queries that take between 1 and 10 seconds to complete.  My throughput >> crashes to near nothing.  The checkpoint takes between 45 seconds and >> a min

Re: [GENERAL] Checkpoint Tuning Question

2009-07-08 Thread Tom Lane
Dan Armbrust writes: > However, once the checkpoint process begins, I get a whole flood of > queries that take between 1 and 10 seconds to complete. My throughput > crashes to near nothing. The checkpoint takes between 45 seconds and > a minute to complete. You sure this is 8.3? It should spre