On Wed, Apr 13, 2011 at 2:36 PM, Richard Yen <[email protected]> wrote:
> On Wed, Apr 13, 2011 at 11:01 AM, Vick Khera <[email protected]> wrote:
>>
>> On Wed, Apr 13, 2011 at 1:42 PM, Richard Yen <[email protected]>
>> wrote:
>>>
>>> Well, I noticed that when the log gets large and it's in the middle of a
>>> logswitch, load on the origin node will increase and subsequent to that, all
>>> subscriber nodes will lag up to 900sec.  This seems troublesome, considering
>>> that nodes in my cluster typically don't lag for more than 10sec--it's only
>>> during these logswitch events that they lag by so much.
>>
>> I never noticed that, even when I had my db's on spinning media.  It was
>> never correlated with log switch.  The only times I got lag was when I had a
>> *lot* of update/insert activity in a very very short period.
>> Might I suggest that the cause/effect is the other way around?  Perhaps
>> you are just hitting your I/O throughput limit for your hardware.
>
> That sounds like a possibility to me, but as far as I understand it, when a
> subscriber is lagging, events in sl_event (and therefore sl_log_*) are not
> being processed.  Wouldn't the slave lag block a logswitch from finishing?
>  Or perhaps I don't fully understand the way SYNCs are processed and
> purged...

Yep, any time there is lag on *any* node, that will start to prevent
logswitches from completing.

The completion always waits on having confirmations back from all the
other nodes, so, if you've got an 8h subscription running, it won't
matter how often you try to switch logs - once that subscription
starts to be processed, the proposals to truncate will all tend to
stall until that subscription completes.
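
To make that rule concrete, here is a toy model of it in Python. This
is only a sketch of the behaviour described above, not Slony's actual
code: the function name, the node ids, and the event numbers are all
made up for illustration.

```python
def can_finish_logswitch(last_event_in_old_log, confirmed_by_node):
    """Toy model (hypothetical, not Slony code): the old log table may be
    truncated only once every node has confirmed events up to the last
    event stored in it."""
    return all(seq >= last_event_in_old_log
               for seq in confirmed_by_node.values())


# Nodes 2 and 3 are caught up; node 4 is still busy with a long subscription.
confirmed = {2: 5000, 3: 5000, 4: 1200}
print(can_finish_logswitch(4800, confirmed))  # False: node 4 holds it up

# Once node 4 catches up, the switch can complete.
confirmed[4] = 5000
print(can_finish_logswitch(4800, confirmed))  # True
```

One slow node is enough to hold the truncate back, however many other
nodes are current.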

There is merit to keeping the time interval controlled by
"cleanup_interval" pretty short; that's what determines how often slon
considers logswitching.  But there's little point to setting it
*really* short when lags will prevent the switch from completing.  I'd
think it futile to consider values less than a couple of minutes,
myself.
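
A rough way to see why, again as a simplified Python sketch rather
than anything taken from slon itself: assume the cleanup thread wakes
every cleanup_interval seconds and can only finish the switch on the
first wake-up after the lag has cleared. The function name and the
sample numbers are hypothetical.

```python
import math


def switch_completes_at(lag_clears_at, cleanup_interval):
    """Hypothetical model: return the time (in seconds) of the first
    cleanup wake-up at or after the moment the lag clears."""
    ticks = max(1, math.ceil(lag_clears_at / cleanup_interval))
    return ticks * cleanup_interval


# Assume a subscription that keeps a node lagging for 8 hours and 47 seconds.
lag_clears_at = 8 * 3600 + 47

for interval in (10, 120, 600):
    extra_wait = switch_completes_at(lag_clears_at, interval) - lag_clears_at
    print(f"cleanup_interval={interval}s -> extra wait {extra_wait}s")
```

In this model the extra wait is never more than one interval, so
shrinking the interval from minutes to seconds saves very little next
to an hours-long lag.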

It's possible that the fix for bug #167
<http://www.slony.info/bugzilla/show_bug.cgi?id=167> would address the
load you're seeing; the queries that access sl_log_* presently have
the tendency to do Seq Scans when it ought to be possible to use
indices.  (I'll not re-explain what Jan has described in the bug.)

Jan seems keen to backpatch that to 2.0. Steve and I weren't
enthralled with the idea, basically because we don't think the fix has
seen enough usage yet.  But if someone *did* do some testing of 167 on
2.0, that would definitely answer our concern.
_______________________________________________
Slony1-general mailing list
[email protected]
http://lists.slony.info/mailman/listinfo/slony1-general
