> From: Steve Singer <[email protected]>
> On 11-09-08 11:43 AM, Glyn Astill wrote: 
>> 
>>       SELECT st_origin, st_received, st_lag_num_events,
>>              round(extract(epoch from st_lag_time))
>>       FROM "<my_replication_cluster>".sl_status;
>> 
>>  A graph for the weeks leading up to and after the upgrade is attached.  I
>>  upgraded on the night of the 25th/26th and ignoring any other downtime
>>  where I was obviously fiddling with things, you can see the syncs going
>>  out after that date.  As you can imagine, I'm massively embarrassed that
>>  it took me 3 months to notice it happening.
>> 
> 
> st_lag_time is a measure of the difference between now() and the last
> unconfirmed event.  The pg_dump locks sl_event, which prevents the SYNCs
> from being created, so there might not be any unconfirmed events to be
> measured by this check.
> 
> 
> Sometime between 2.0.4 and 2.0.6 we fixed a bug that prevented SYNC events
> from being generated from pure slaves. I suspect your check is now measuring
> the other half of replication (if you do your select from sl_status you
> should see at least two rows; it isn't clear if you're graphing both of them
> or just one).
> 
> If now()-st_last_event_ts gets too high it means that SYNC events are not
> being generated.  You might want to alert on both SYNC events not being
> generated and events not being confirmed.
> 
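
For the archives, a rough sketch of a check that alerts on both conditions Steve describes. It is only an illustration: the thresholds, the function name find_alerts and the alert labels are made up for the example, and the query assumes sl_status exposes st_last_event_ts and st_lag_time as discussed above. Rows are expected in the column order of STATUS_SQL.

```python
# Query to run against the Slony schema; "<my_replication_cluster>" is the
# same placeholder used earlier in this thread.
STATUS_SQL = """
SELECT st_origin, st_received,
       extract(epoch from now() - st_last_event_ts) AS sync_age_s,
       extract(epoch from st_lag_time) AS lag_s
FROM "<my_replication_cluster>".sl_status;
"""


def find_alerts(rows, max_sync_age_s=300, max_lag_s=300):
    """Return (origin, received, reason) tuples for rows over a threshold.

    Both thresholds are hypothetical defaults, in seconds.
    """
    alerts = []
    for origin, received, sync_age_s, lag_s in rows:
        # Last event on the origin is too old: SYNCs are not being generated.
        if sync_age_s is not None and sync_age_s > max_sync_age_s:
            alerts.append((origin, received, "sync_not_generated"))
        # Oldest unconfirmed event is too old: events are not being confirmed.
        if lag_s is not None and lag_s > max_lag_s:
            alerts.append((origin, received, "events_not_confirmed"))
    return alerts
```

Feeding it every row returned by sl_status, not just one, covers both halves of replication.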

Okay, you know better than I do.  However, I'm positive that when we were on
1.2 and I was in overnight, our slaves were up to date whilst the backups were
running.  It's only circumstantial, of course, but I'm pretty sure I'd have
noticed in 3 years if not, as I'd query those slaves all the time.

I've excluded the Slony schema from the dump now, so we're all good anyway.
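
In case it helps anyone else, this is roughly how the exclusion might look in a backup script, using pg_dump's --exclude-schema option. The helper name build_dump_command and its arguments are hypothetical, not our actual script.

```python
def build_dump_command(dbname, slony_schema, outfile):
    """Build a pg_dump argument list that skips the Slony schema.

    slony_schema is passed to --exclude-schema as a pattern; a mixed-case
    schema name would need double quotes inside the pattern.
    """
    return [
        "pg_dump",
        "--exclude-schema=" + slony_schema,  # leave the Slony schema out of the dump
        "--file=" + outfile,
        dbname,
    ]
```

The resulting list can be handed to subprocess.run(cmd, check=True) so a failed dump raises an error.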

Thanks
Glyn

_______________________________________________
Slony1-general mailing list
[email protected]
http://lists.slony.info/mailman/listinfo/slony1-general
