Dan Armbrust wrote:
>
> All of my testing to date has been done with synchronous_commit=off
>
> I just tried setting full_page_writes=off - and like magic, the entire
> hiccup went away.
>
Why is the full-page write happening before the commit returns when
synchronous_commit is set to off? I [...]
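For anyone reproducing Dan's test, the two settings in play look like this in postgresql.conf. Values are as described in the thread; note that full_page_writes=off is only a diagnostic here, since it risks torn-page corruption after a crash unless the storage guarantees atomic 8 kB writes:

```ini
# postgresql.conf -- diagnostic configuration from the thread
synchronous_commit = off   # commit returns before the WAL flush
full_page_writes = off     # suppresses post-checkpoint full-page images;
                           # NOT safe as a permanent setting on ordinary disks
```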
On Mon, Jul 13, 2009 at 3:53 PM, Dan Armbrust wrote:
>> So this thought leads to a couple of other things Dan could test.
>> First, see if turning off full_page_writes makes the hiccup go away.
>> If so, we know the problem is in this area (though still not exactly
>> which reason); if not we need another idea. [...]
>
> Propose a DTrace probe immediately after the "goto begin" at line 740 of
> xlog.c, so we can start tracing from the first backend following
> checkpoint, and turn off tracing when all backends have completed a
> transaction.
>
That's Greek to me. But I'm happy to test things if you send me
pa [...]
On Mon, 2009-07-13 at 15:53 -0500, Dan Armbrust wrote:
> > So this thought leads to a couple of other things Dan could test.
> > First, see if turning off full_page_writes makes the hiccup go away.
> > If so, we know the problem is in this area (though still not exactly
> > which reason); if not we need another idea. [...]
> So this thought leads to a couple of other things Dan could test.
> First, see if turning off full_page_writes makes the hiccup go away.
> If so, we know the problem is in this area (though still not exactly
> which reason); if not we need another idea. That's not a good permanent
> fix though, [...]
On Sun, 2009-07-12 at 13:10 -0400, Tom Lane wrote:
> It's hard to see how it could have continuing effects over several
> seconds, especially in a system that has CPU to spare.
Any queueing situation takes a while to resolve, and over-damped systems
can take a long time to settle. We [...]
Simon Riggs writes:
> This causes us to queue for the WALInsertLock twice at exactly the time
> when every caller needs to calculate the CRC for complete blocks. So we
> queue twice when the lock-hold-time is consistently high, causing queue
> lengths to go ballistic.
You keep saying that, and it [...]
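Simon's "queue twice" claim can be illustrated with a toy model. This is a hypothetical back-of-envelope sketch, not PostgreSQL code: the 30-backend count comes from Dan's workload, while the hold times are invented for illustration.

```python
# Toy serial-lock model: each of N backends must hold the lock once per
# pass, so the k-th arrival waits behind the k backends ahead of it.
def total_wait(backends, hold_time_s, passes):
    per_pass = sum(k * hold_time_s for k in range(backends))
    return passes * per_pass

# Invented hold times: ~0.1 ms for an ordinary WAL insert, ~2 ms when a
# full-page image must be CRC'd and copied under the lock.
normal = total_wait(30, 0.0001, passes=1)
post_checkpoint = total_wait(30, 0.002, passes=2)

print(f"normal: {normal * 1000:.1f} ms")            # ~43.5 ms
print(f"post-checkpoint: {post_checkpoint:.2f} s")  # ~1.74 s
```

With one pass and short holds the queue drains in tens of milliseconds; doubling both the hold time and the number of queue passes pushes the drain time into the 1-2 second range, which is the size of hiccup Dan reports.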
On Fri, 2009-07-10 at 14:25 -0500, Dan Armbrust wrote:
> > Hm, I'm not sure I believe any of that except the last bit, seeing that
> > he's got plenty of excess CPU capability. But the last bit fits with
> > the wimpy-I/O problem, and it also offers something we could test.
> > Dan, please see what happens when you vary the wal_buffers setting. [...]
> Hm, I'm not sure I believe any of that except the last bit, seeing that
> he's got plenty of excess CPU capability. But the last bit fits with
> the wimpy-I/O problem, and it also offers something we could test.
> Dan, please see what happens when you vary the wal_buffers setting.
> (Note you ne [...]
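Tom's suggestion amounts to sweeping wal_buffers in postgresql.conf. A sketch with illustrative values, not recommendations from the thread; the 8.3 default is 64kB, and changing wal_buffers requires a server restart:

```ini
# postgresql.conf -- illustrative wal_buffers sweep (8.3 default is 64kB);
# each change needs a server restart to take effect
wal_buffers = 1MB          # also try e.g. 256kB and 16MB
```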
Simon Riggs writes:
> I think its a traffic jam.
> After checkpoint in XLogInsert(), we discover that we now have to backup
> a block that we didn't think so previously. So we have to drop the lock
> and then re-access WALInsertLock. So every backend has to go through the
> queue twice the first [...]
On Fri, 2009-07-10 at 10:27 -0400, Tom Lane wrote:
> Simon Riggs writes:
> > ISTM more likely to be a problem with checkpointing clog or subtrans.
> > That would block everybody and the scale of the problem is about right.
>
> That's what I had been thinking too, but the log_checkpoint output
> conclusively disproves it: those steps are taking less than 20ms [...]
Simon Riggs writes:
> ISTM more likely to be a problem with checkpointing clog or subtrans.
> That would block everybody and the scale of the problem is about right.
That's what I had been thinking too, but the log_checkpoint output
conclusively disproves it: those steps are taking less than 20ms [...]
On Wed, 2009-07-08 at 18:22 -0400, Tom Lane wrote:
> As Greg commented upthread, we seem to be getting forced to the
> conclusion that the initial buffer scan in BufferSync() is somehow
> causing this. There are a couple of things it'd be useful to try
> here:
Not sure why you're forced to that [...]
> As Greg commented upthread, we seem to be getting forced to the
> conclusion that the initial buffer scan in BufferSync() is somehow
> causing this. There are a couple of things it'd be useful to try
> here:
>
> * see how the size of the hiccup varies with shared_buffers;
I tried decreasing shared_buffers [...]
Dan Armbrust writes:
> Almost all of the slow query log messages are logged within about 3
> seconds of the checkpoint starting message.
> LOG: checkpoint complete: wrote 9975 buffers (77.9%); 0 transaction
> log file(s) added, 0 removed, 15 recycled; write=156.576 s, sync=0.065
> s, total=156.6 [...]
> However, the latest report says that he
> managed that, and yet there's still a one-or-two-second transient of
> some sort. I'm wondering what's causing that. If it were at the *end*
> of the checkpoint, it might be the disk again (failing to handle a bunch
> of fsyncs, perhaps). But if it rea [...]
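For reference, the checkpoint log line quoted above can be turned into a write rate. This assumes the default 8 kB PostgreSQL block size:

```python
# Numbers from the log line: 9975 dirty buffers over write=156.576 s.
buffers = 9975
block_bytes = 8192            # default PostgreSQL block size
write_seconds = 156.576

mb_written = buffers * block_bytes / (1024 * 1024)
rate_mb_s = mb_written / write_seconds
print(f"{mb_written:.1f} MB written at {rate_mb_s:.2f} MB/s")
# -> 77.9 MB written at 0.50 MB/s
```

Half a megabyte per second of spread-out writes should be trivial even for a run-of-the-mill desktop drive, which supports the point that the transient is not raw checkpoint write volume.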
On Wed, 8 Jul 2009, Tom Lane wrote:
He's only got 100MB of shared buffers, which doesn't seem like much
considering it's apparently a fairly beefy system. I definitely
don't see how one CPU spinning over the buffer headers in BufferSync
is going to create the sort of hiccup he's describing.
A [...]
John R Pierce writes:
> a beefy system with...
>> Harddrive is just a simple, run-of-the-mill desktop drive.
> which is going to severely limit random write throughput
True, which is why he's having to flail so hard to keep the checkpoint
from saturating his I/O. However, the latest report says that he [...]
Tom Lane wrote:
He's only got 100MB of shared buffers, which doesn't seem like much
considering it's apparently a fairly beefy system.
a beefy system with...
Harddrive is just a simple, run-of-the-mill desktop drive.
which is going to severely limit random write throughput
Greg Smith writes:
On Wed, 8 Jul 2009, Dan Armbrust wrote:
>> What I observe now is that I get a short (1-2 second) period where I
>> get slow queries - I'm running about 30 queries in parallel at any
>> given time - it appears that all 30 queries get paused for a couple of
>> seconds at the moment [...]
On Wed, 8 Jul 2009, Dan Armbrust wrote:
My takeaway is that starting the checkpoint process is really
expensive - so I don't want to start it very frequently. And the only
downside to longer intervals between checkpoints is a longer recovery
time if the system crashes?
And additional disk space [...]
On Wed, 8 Jul 2009, Dan Armbrust wrote:
With checkpoint_segments set to 10, the checkpoints appear to be
happening due to checkpoint_timeout - which I've left at the default
of 5 minutes.
OK, then that's as far upwards as you probably need to tweak that for your
workload, even though most systems [...]
>> Wouldn't increasing the length between checkpoints result in the
>> checkpoint process taking even longer to complete?
>
> You don't really care how long it takes. What you want is for it not to
> be chewing a bigger fraction of your I/O bandwidth than you can spare.
> Hence, you want it to take [...]
Dan Armbrust writes:
> On Wed, Jul 8, 2009 at 1:23 PM, Tom Lane wrote:
>> Well, you could increase both those settings so as to put the
>> checkpoints further apart, and/or increase checkpoint_completion_target
>> to spread the checkpoint I/O over a larger fraction of the cycle.
> Wouldn't increasing the length between checkpoints result in the
> checkpoint process taking even longer to complete?
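Tom's suggestion corresponds to a postgresql.conf change along these lines. The numbers are illustrative, not values recommended in the thread:

```ini
# postgresql.conf -- put checkpoints further apart and spread their I/O
checkpoint_segments = 30              # thread started at 10
checkpoint_timeout = 15min            # default is 5min
checkpoint_completion_target = 0.9    # spread writes over 90% of the interval
```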
On Wed, Jul 8, 2009 at 1:23 PM, Tom Lane wrote:
> Dan Armbrust writes:
>> With checkpoint_segments set to 10, the checkpoints appear to be
>> happening due to checkpoint_timeout - which I've left at the default
>> of 5 minutes.
>
> Well, you could increase both those settings so as to put the
> checkpoints further apart, and/or increase checkpoint_completion_target [...]
On Wed, Jul 8, 2009 at 12:50 PM, Tom Lane wrote:
> Dan Armbrust writes:
>> However, once the checkpoint process begins, I get a whole flood of
>> queries that take between 1 and 10 seconds to complete. My throughput
>> crashes to near nothing. The checkpoint takes between 45 seconds and
>> a minute to complete.
Dan Armbrust writes:
> However, once the checkpoint process begins, I get a whole flood of
> queries that take between 1 and 10 seconds to complete. My throughput
> crashes to near nothing. The checkpoint takes between 45 seconds and
> a minute to complete.
You sure this is 8.3? It should spread [...]