Gregory Stark wrote:
> I can imagine a scenario where you have a system that's very busy for 60s and
> then idle for 60s repeatedly. And for some reason you configure a
> checkpoint_timeout on the order of 20m or so (assuming you're travelling
> precisely 60mph).
Is that Scottish m?
--
Alvaro
"Greg Smith" <[EMAIL PROTECTED]> writes:
> If you write them twice, so what? You didn't even get to that point as an
> option until all the important stuff was taken care of and the system was
> near idle.
Well, even if it's near idle you were still occupying the I/O system for a few
milliseconds.
On Tue, 26 Jun 2007, Tom Lane wrote:
I'm not impressed with the idea of writing buffers because we might need
them someday; that just costs extra I/O due to re-dirtying in too many
scenarios.
This is kind of an interesting statement to me because it really
highlights the difference in how I
On Tue, 26 Jun 2007, Tom Lane wrote:
I have no doubt that there are scenarios such as you are thinking about,
but it definitely seems like a corner case that doesn't justify keeping
the all-buffers scan. That scan is costing us extra I/O in ordinary
non-corner cases, so it's not free to keep it
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> To recap, the sequence is:
> 1. COPY FROM
> 2. checkpoint
> 3. VACUUM
> Now you have buffer cache full of dirty buffers with usage_count=1,
Well, it won't be very full, because VACUUM works in a limited number of
buffers (and did even before the B
On Mon, 25 Jun 2007, Tom Lane wrote:
right now, BgBufferSync starts over from the current clock-sweep point
on each call --- that is, each bgwriter cycle. So it can't really be
made to write very many buffers without excessive CPU work. Maybe we
should redefine it to have some static state carried across calls.
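The "static state" idea Tom floats here can be sketched as follows: a writer that resumes from its own remembered sweep position on each call, instead of restarting from the shared clock hand every bgwriter cycle. This is an illustrative C model; the names, pool layout, and structure are hypothetical, not PostgreSQL's actual BgBufferSync code.

```c
#include <assert.h>

#define NBUFFERS 8

/* hypothetical buffer pool: nonzero = dirty (toy model) */
static int dirty[NBUFFERS] = {1, 0, 1, 1, 0, 0, 1, 0};

/* static sweep position carried across calls, instead of
 * restarting from the shared clock hand on each cycle */
static int sweep_pos = 0;

/* Write up to max_writes dirty buffers, resuming where the
 * previous call left off; returns the number written.  Because
 * sweep_pos persists, successive cycles make forward progress
 * without rescanning the same buffers from scratch. */
int bg_buffer_sync(int max_writes)
{
    int written = 0, scanned = 0;

    while (written < max_writes && scanned < NBUFFERS)
    {
        if (dirty[sweep_pos])
        {
            dirty[sweep_pos] = 0;   /* "flush" the buffer */
            written++;
        }
        sweep_pos = (sweep_pos + 1) % NBUFFERS;
        scanned++;
    }
    return written;
}
```

With this shape, limiting CPU work per cycle is just a matter of capping `max_writes` (or a scanned-buffers budget), since no work is repeated across cycles.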
Tom Lane wrote:
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
Tom Lane wrote:
Who's "we"? AFAICS, CVS HEAD will treat a large copy the same as any
other large heapscan.
Umm, I'm talking about populating a table with COPY *FROM*. That's not a
heap scan at all.
No wonder we're failing to communicate.
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> Who's "we"? AFAICS, CVS HEAD will treat a large copy the same as any
>> other large heapscan.
> Umm, I'm talking about populating a table with COPY *FROM*. That's not a
> heap scan at all.
No wonder we're failing to communicate
Tom Lane wrote:
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
Tom Lane wrote:
(Note that COPY per se will not trigger this behavior anyway, since it
will act in a limited number of buffers because of the recent buffer
access strategy patch.)
Actually we dropped it from COPY, because it didn't seem to improve performance in the tests we ran.
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> (Note that COPY per se will not trigger this behavior anyway, since it
>> will act in a limited number of buffers because of the recent buffer
>> access strategy patch.)
> Actually we dropped it from COPY, because it didn't seem to improve performance in the tests we ran.
Tom Lane wrote:
(Note that COPY per se will not trigger this behavior anyway, since it
will act in a limited number of buffers because of the recent buffer
access strategy patch.)
Actually we dropped it from COPY, because it didn't seem to improve
performance in the tests we ran.
--
Heikki
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> This argument supposes that the bgwriter will do nothing while the COPY
>> is proceeding.
> It will clean buffers ahead of the COPY, but it won't write the buffers
> COPY leaves behind since they have usage_count=1.
Yeah, and th
Tom Lane wrote:
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
One pathological case is a COPY of a table slightly smaller than
shared_buffers. That will fill the buffer cache. If you then have a
checkpoint, and after that a SELECT COUNT(*), or a VACUUM, the buffer
cache will be full of pages with usage_count=1.
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> ... that's what the LRU scan is for.
> Yeah, except the LRU scan is not doing a very good job at that. It will
> ignore buffers with usage_count > 0, and it only scans
> bgwriter_lru_percent buffers ahead of the clock hand.
Whi
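Heikki's complaint about the LRU scan can be made concrete with a toy model. The C sketch below is illustrative only, not the real bgwriter code: it scans just `lru_percent` of the pool ahead of a stationary clock hand and skips buffers with usage_count > 0, so dirty buffers left behind by a burst (all at usage_count=1) are never cleaned while the system sits idle, and once the window is clean the same buffers are rescanned uselessly.

```c
#include <assert.h>

#define NBUFFERS 100

typedef struct
{
    int dirty;
    int usage_count;
} Buffer;

static Buffer buf[NBUFFERS];

/* helper to set every buffer's state (toy model only) */
void fill_pool(int d, int u)
{
    for (int i = 0; i < NBUFFERS; i++)
    {
        buf[i].dirty = d;
        buf[i].usage_count = u;
    }
}

/* Scan only lru_percent of the pool ahead of the clock hand,
 * skipping buffers with usage_count > 0.  If the hand is not
 * moving, every call looks at the same window. */
int lru_scan(int hand, double lru_percent)
{
    int to_scan = (int) (NBUFFERS * lru_percent / 100.0);
    int written = 0;

    for (int i = 0; i < to_scan; i++)
    {
        Buffer *b = &buf[(hand + i) % NBUFFERS];

        if (b->dirty && b->usage_count == 0)
        {
            b->dirty = 0;           /* "flush" the buffer */
            written++;
        }
    }
    return written;
}
```

In this model, a pool full of dirty usage_count=1 buffers yields zero writes no matter how many times the scan runs, which is exactly the burst-then-idle problem being discussed.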
Tom Lane wrote:
Anyway, if there are no XLOG records since the last checkpoint, there's
probably nothing in shared buffers that needs flushing. There might be
some dirty hint-bits, but the only reason to push those out is to make
some free buffers available, and doing that is not checkpoint's job.
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> Hmm. But if we're going to do that, we might as well have a checkpoint
>> for our troubles, no? The reason for the current design is the
>> assumption that a bgwriter_all scan is less burdensome than a
>> checkpoint, but that is no longer true given this rewrite.
Tom Lane wrote:
Greg Smith <[EMAIL PROTECTED]> writes:
The way transitions between completely idle and all-out bursts happen were
one problematic area I struggled with. Since the LRU point doesn't move
during the idle parts, and the lingering buffers have a usage_count>0, the
LRU scan won't touch them;
Tom Lane wrote:
Hmm. But if we're going to do that, we might as well have a checkpoint
for our troubles, no? The reason for the current design is the
assumption that a bgwriter_all scan is less burdensome than a
checkpoint, but that is no longer true given this rewrite.
Per comments in Create
Greg Smith <[EMAIL PROTECTED]> writes:
> The way transitions between completely idle and all-out bursts happen were
> one problematic area I struggled with. Since the LRU point doesn't move
> during the idle parts, and the lingering buffers have a usage_count>0, the
> LRU scan won't touch them;
On Mon, 25 Jun 2007, Heikki Linnakangas wrote:
Please describe the class of transactions and the service guarantees so
that we can reproduce that, and figure out what's the best solution.
I'm confident you're already moving in that direction by noticing how the
90th percentile numbers were ki
On Mon, 25 Jun 2007, Heikki Linnakangas wrote:
It only scans bgwriter_lru_percent buffers ahead of the clock hand. If the
hand isn't moving, it keeps scanning the same buffers over and over again.
You can crank it all the way up to 100%, though, in which case it would work,
but that starts to
On Mon, 25 Jun 2007, Heikki Linnakangas wrote:
Greg, is this the kind of workload you're having, or is there some other
scenario you're worried about?
The way transitions between completely idle and all-out bursts happen were
one problematic area I struggled with. Since the LRU point doesn't
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> Heikki Linnakangas <[EMAIL PROTECTED]> writes:
>>> If you have a system with a very bursty transaction rate, it's possible
>>> that when it's time for a checkpoint, there hasn't been any WAL logged
>>> activity since last checkpoint.
Tom Lane wrote:
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
On further thought, there is one workload where removing the non-LRU
part would be counterproductive:
If you have a system with a very bursty transaction rate, it's possible
that when it's time for a checkpoint, there hasn't been any WAL logged activity since last checkpoint.
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> On further thought, there is one workload where removing the non-LRU
> part would be counterproductive:
> If you have a system with a very bursty transaction rate, it's possible
> that when it's time for a checkpoint, there hasn't been any WAL logged activity since last checkpoint.
Tom Lane wrote:
I agree with removing the non-LRU
part of the bgwriter's write logic though; that should simplify matters
a bit and cut down the overall I/O volume.
On further thought, there is one workload where removing the non-LRU
part would be counterproductive:
If you have a system with a very bursty transaction rate, it's possible that when it's time for a checkpoint, there hasn't been any WAL logged activity since last checkpoint.
On Mon, 2007-06-25 at 12:56 +0200, Magnus Hagander wrote:
> Didn't we already add other features that make recovery much *faster* than
> before? In that case, are they fast enough to neutralise this increased
> time (a guesstimate, of course)?
>
> Or did I mess that up with stuff we added for 8.2
Magnus Hagander wrote:
On Mon, Jun 25, 2007 at 10:15:07AM +0100, Simon Riggs wrote:
As you say, we can put comments in the release notes to advise people of
50% increase in recovery time if the parameters stay the same. That
would be balanced by the comment that checkpoints are now considerably
On Mon, Jun 25, 2007 at 10:15:07AM +0100, Simon Riggs wrote:
> On Mon, 2007-06-25 at 01:33 -0400, Greg Smith wrote:
> > On Sun, 24 Jun 2007, Simon Riggs wrote:
> >
> > > Greg can't choose to use checkpoint_segments as the limit and then
> > > complain about unbounded recovery time, because that was clearly a conscious choice.
On Mon, 2007-06-25 at 01:33 -0400, Greg Smith wrote:
> On Sun, 24 Jun 2007, Simon Riggs wrote:
>
> > Greg can't choose to use checkpoint_segments as the limit and then
> > complain about unbounded recovery time, because that was clearly a
> > conscious choice.
>
> I'm complaining
I apologise
Greg Smith wrote:
LDC certainly makes things better in almost every case. My "allegiance"
comes from having seen a class of transactions where LDC made things
worse on a fast/overloaded system, in that it made some types of service
guarantees harder to meet, and I just don't know who else migh
On Mon, 25 Jun 2007, Tom Lane wrote:
I'm not sure why you hold such strong allegiance to the status quo. We
know that the status quo isn't working very well.
Don't get me wrong here; I am a big fan of this patch, think it's an
important step forward, and it's exactly the fact that I'm so she
On Sun, 24 Jun 2007, Simon Riggs wrote:
Greg can't choose to use checkpoint_segments as the limit and then
complain about unbounded recovery time, because that was clearly a
conscious choice.
I'm complaining only because everyone seems content to wander in a
direction where the multiplier on
Greg Smith <[EMAIL PROTECTED]> writes:
> I am not a fan of introducing a replacement feature based on what I
> consider too limited testing, and I don't feel this one has been beat on
> long yet enough to start pruning features that would allow better backward
> compatibility/transitioning. I t
On Sun, 24 Jun 2007, Simon Riggs wrote:
I can't see why anyone would want to turn off smoothing: If they are
doing many writes, then they will be affected by the sharp dive at
checkpoint, which happens *every* checkpoint.
There are service-level agreement situations where a short and sharp
dive
On Fri, 2007-06-22 at 16:57 -0400, Greg Smith wrote:
> If you're not, I think you should be. Keeping that replay interval
> time down was one of the reasons why the people I was working with
> were displeased with the implications of the very spread out style of
> some LDC tunings. They were alre
On Fri, 2007-06-22 at 16:21 -0400, Tom Lane wrote:
> Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> > 3. Recovery will take longer, because the distance last committed redo
> > ptr will lag behind more.
>
> True, you'd have to replay 1.5 checkpoint intervals on average instead
> of 0.5 (more or less, assuming checkpoints had been short).
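Tom's 0.5-vs-1.5 estimate follows from simple arithmetic: replay starts at the redo pointer of the last *completed* checkpoint, which trails that checkpoint's end by the length of its write phase, and on average half a checkpoint interval elapses before the crash. A back-of-envelope model (the model and function name are mine, not from the thread):

```c
#include <assert.h>

/* Average WAL to replay after a crash, measured in checkpoint
 * intervals.  write_phase_fraction is how much of the interval
 * the checkpoint's write phase is spread over: the redo pointer
 * lies that far behind the checkpoint's completion, and on
 * average another 0.5 intervals of WAL accumulate before the
 * crash.  Instantaneous writes give ~0.5; writes spread over a
 * whole interval give ~1.5.  Rough model, not an exact bound. */
double avg_replay_intervals(double write_phase_fraction)
{
    return write_phase_fraction + 0.5;
}
```

So a smoothing setting that spreads writes over, say, half the interval lands in between, at roughly one interval of replay on average; shortening checkpoint_timeout scales the absolute replay time back down.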
Simon Riggs wrote:
On Fri, 2007-06-22 at 22:19 +0100, Heikki Linnakangas wrote:
However, I think shortening the checkpoint interval is a perfectly valid
solution to that.
Agreed. That's what checkpoint_timeout is for. Greg can't choose to use
checkpoint_segments as the limit and then complain about unbounded recovery time, because that was clearly a conscious choice.
On Fri, 2007-06-22 at 22:19 +0100, Heikki Linnakangas wrote:
> However, I think shortening the checkpoint interval is a perfectly valid
> solution to that.
Agreed. That's what checkpoint_timeout is for. Greg can't choose to use
checkpoint_segments as the limit and then complain about unbounded recovery time, because that was clearly a conscious choice.
This message is going to come off as kind of angry, and I hope you don't
take that personally. I'm very frustrated with this whole area right now
but am unable to do anything to improve that situation.
On Fri, 22 Jun 2007, Tom Lane wrote:
If you've got specific evidence why any of these thing
Greg Smith wrote:
On Fri, 22 Jun 2007, Tom Lane wrote:
Greg had worried about being able to turn this behavior off, so we'd
still need at least a bool, and we might as well expose the fraction
instead. I agree with removing the non-LRU part of the bgwriter's
write logic though
If you accep
Greg Smith wrote:
True, you'd have to replay 1.5 checkpoint intervals on average instead
of 0.5 (more or less, assuming checkpoints had been short). I don't
think we're in the business of optimizing crash recovery time though.
If you're not, I think you should be. Keeping that replay interval time down was one of the reasons why the people I was working with were displeased with the implications of the very spread out style of some LDC tunings.
On Fri, 22 Jun 2007, Tom Lane wrote:
Greg had worried about being able to turn this behavior off, so we'd
still need at least a bool, and we might as well expose the fraction
instead. I agree with removing the non-LRU part of the bgwriter's write
logic though
If you accept that being able t
Greg Smith <[EMAIL PROTECTED]> writes:
> As the person who was complaining about corner cases I'm not in a position
> to talk more explicitly about, I can at least summarize my opinion of how
> I feel everyone should be thinking about this patch and you can take what
> you want from that.
Sorry
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> Ok, if we approach this from the idea that there will be *no* GUC
> variables at all to control this, and we remove the bgwriter_all_*
> settings as well, does anyone see a reason why that would be bad? Here's
> the ones mentioned this far:
> 1.
On Fri, 22 Jun 2007, Tom Lane wrote:
Yeah, I'm not sure that we've thought through the interactions with the
existing bgwriter behavior.
The entire background writer mess needs a rewrite, and the best way to
handle that is going to shift considerably with LDC applied.
As the person who was
Tom Lane wrote:
Maybe I misread the patch, but I thought that if someone requested an
immediate checkpoint, the checkpoint-in-progress would effectively flip
to immediate mode. So that could be handled by offering an immediate vs
extended checkpoint option in pg_start_backup. I'm not sure it's
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> I still think you've not demonstrated a need to expose this parameter.
> Greg Smith wanted to explicitly control the I/O rate, and let the
> checkpoint duration vary. I personally think that fixing the checkpoint
> duration is b
Tom Lane wrote:
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
Tom Lane wrote:
(BTW, the patch seems
a bit schizoid about whether checkpoint_rate is int or float.)
Yeah, I've gone back and forth on the data type. I wanted it to be a
float, but guc code doesn't let you specify a float in KB,
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> (BTW, the patch seems
>> a bit schizoid about whether checkpoint_rate is int or float.)
> Yeah, I've gone back and forth on the data type. I wanted it to be a
> float, but guc code doesn't let you specify a float in KB, so I swit
Tom Lane wrote:
> And checkpoint_rate really needs to be named checkpoint_min_rate, if
> it's going to be a minimum. However, I question whether we need it at
> all, because as the code stands, with the default BgWriterDelay you
> would have to increase checkpoint_rate to 4x its proposed default b
Tom Lane wrote:
1. checkpoint_rate is used thusly:
writes_per_nap = Min(1, checkpoint_rate / BgWriterDelay);
where writes_per_nap is the max number of dirty blocks to write before
taking a bgwriter nap. Now surely this is completely backward: if
BgWriterDelay is increased, the number o
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> So the question is, why in the heck would anyone want the behavior that
>> "checkpoints take exactly X time"??
> Because it's easier to tune. You don't need to know how much checkpoint
> I/O you can tolerate. The system will use
Tom Lane wrote:
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
The main tuning knob is checkpoint_smoothing, which is defined as a
fraction of the checkpoint interval (both checkpoint_timeout and
checkpoint_segments are taken into account). Normally, the write phase
of a checkpoint takes exact
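The smoothing behavior being described can be sketched as a pacing check: the checkpointer is "on schedule" when the fraction of buffers written keeps up with the elapsed fraction of the checkpoint interval (by time or by WAL segments, whichever is further along) divided by checkpoint_smoothing. The function below is an illustration of that idea under my reading of the description, not the patch's actual code:

```c
#include <assert.h>

/* buffers_done_frac: fraction of the checkpoint's dirty buffers
 *                    already written (0..1)
 * elapsed_time_frac: fraction of checkpoint_timeout elapsed
 * elapsed_xlog_frac: fraction of checkpoint_segments consumed
 * smoothing:         target fraction of the interval the write
 *                    phase should occupy (checkpoint_smoothing)
 *
 * Returns nonzero if writing is keeping pace; when it returns 0
 * the checkpointer should write more before napping. */
int on_schedule(double buffers_done_frac, double elapsed_time_frac,
                double elapsed_xlog_frac, double smoothing)
{
    /* progress is driven by whichever trigger is closer */
    double progress = (elapsed_time_frac > elapsed_xlog_frac)
                      ? elapsed_time_frac : elapsed_xlog_frac;

    return buffers_done_frac >= progress / smoothing;
}
```

With smoothing = 0.5, half the buffers must be written by the time a quarter of the interval has elapsed, so the write phase finishes at the halfway point and I/O is spread rather than dumped all at once.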
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> I don't think you understand how the settings work. Did you read the
> documentation? If you did, it's apparently not adequate.
I did read the documentation, and I'm not complaining that I don't
understand it. I'm complaining that I don't like the
Tom Lane wrote:
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
Tom Lane wrote:
I tend to agree with whoever said upthread that the combination of GUC
variables proposed here is confusing and ugly. It'd make more sense to
have min and max checkpoint rates in KB/s, with the max checkpoint rate
o
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> I tend to agree with whoever said upthread that the combination of GUC
>> variables proposed here is confusing and ugly. It'd make more sense to
>> have min and max checkpoint rates in KB/s, with the max checkpoint rate
>> only ho
Tom Lane wrote:
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
In fact, I think there's a small race condition in CVS HEAD:
Yeah, probably --- the original no-locking design didn't have any side
flags. The reason you need the lock is for a backend to be sure that
a newly-started checkpoint is
Tom Lane wrote:
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
I added a spinlock to protect the signaling fields between bgwriter and
backends. The current non-locking approach gets really difficult as the
patch adds two new flags, and both are more important than the existing
ckpt_time_warn flag.
ITAGAKI Takahiro wrote:
The only thing I don't understand is the naming of 'checkpoint_smoothing'.
Can users imagine the unit of 'smoothing' is a fraction?
You explain the paremeter with the word 'fraction'.
Why don't you simply name it 'checkpoint_fraction' ?
| Specifies the target length of ch
Heikki Linnakangas <[EMAIL PROTECTED]> wrote:
> Here's an updated WIP patch for load distributed checkpoints.
> Since last patch, I did some clean up and refactoring, and added a bunch
> of comments, and user documentation.
The only thing I don't understand is the naming of 'checkpoint_smoothing'.
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> I added a spinlock to protect the signaling fields between bgwriter and
> backends. The current non-locking approach gets really difficult as the
> patch adds two new flags, and both are more important than the existing
> ckpt_time_warn flag.
Tha
Here's an updated WIP patch for load distributed checkpoints.
I added a spinlock to protect the signaling fields between bgwriter and
backends. The current non-locking approach gets really difficult as the
patch adds two new flags, and both are more important than the existing
ckpt_time_warn flag.
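The locking change Heikki describes can be illustrated with a small model: a lock guards the request flags so a backend can set "checkpoint requested" and its modifiers together atomically, and an immediate request can upgrade an in-progress spread checkpoint. A pthread mutex stands in for the spinlock here, and the struct and flag names are illustrative, not the patch's actual identifiers:

```c
#include <assert.h>
#include <pthread.h>

/* Stand-in for the shared-memory signaling area between
 * backends and the bgwriter.  The lock lets flags be read and
 * updated as a unit, which gets hard to reason about with the
 * lock-free approach once several flags interact. */
typedef struct
{
    pthread_mutex_t lock;       /* plays the spinlock's role */
    int ckpt_requested;
    int ckpt_immediate;
} BgWriterShmem;

static BgWriterShmem shmem = { PTHREAD_MUTEX_INITIALIZER, 0, 0 };

/* backend side: request a checkpoint, optionally immediate */
void request_checkpoint(int immediate)
{
    pthread_mutex_lock(&shmem.lock);
    shmem.ckpt_requested = 1;
    /* an immediate request upgrades a spread one in progress */
    if (immediate)
        shmem.ckpt_immediate = 1;
    pthread_mutex_unlock(&shmem.lock);
}

/* bgwriter side: atomically consume any pending request */
int consume_request(int *immediate)
{
    int req;

    pthread_mutex_lock(&shmem.lock);
    req = shmem.ckpt_requested;
    *immediate = shmem.ckpt_immediate;
    shmem.ckpt_requested = 0;
    shmem.ckpt_immediate = 0;
    pthread_mutex_unlock(&shmem.lock);
    return req;
}
```

The design point is that both fields are read and cleared under one lock acquisition, so a backend can never observe a request flag without its matching modifier.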