> From: Steve Singer <[email protected]>
> On 11-09-08 11:43 AM, Glyn Astill wrote:
>>
>> SELECT st_origin, st_received, st_lag_num_events,
>>        round(extract(epoch from st_lag_time))
>> FROM "<my_replication_cluster>".sl_status;
>>
>> A graph for the weeks leading up to and after the upgrade is attached.
>> I upgraded on the night of the 25th/26th and ignoring any other
>> downtime where I was obviously fiddling with things, you can see the
>> syncs going out after that date. As you can imagine, I'm massively
>> embarrassed that it took me 3 months to notice it happening.
>>
>
> st_lag_time is a measure of the difference between now() and the last
> unconfirmed event. The pg_dump locks sl_event which prevents the SYNCs
> from being created so there might not be any unconfirmed events to be
> measured by this check.
>
> Sometime between 2.0.4 and 2.0.6 we fixed a bug that prevented SYNC
> events from being generated from pure slaves. I suspect your check is
> now measuring the other half of replication (if you do your select from
> sl_status you should see at least two rows, it isn't clear if you're
> graphing both of them or just one).
>
> If now()-st_last_event_ts gets too high it means that SYNC events are
> not being generated. You might want to alert on both SYNC events not
> being generated and events not being confirmed.
>
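
A minimal sketch of the two alerts suggested above, in Python. The column names st_last_event_ts and st_lag_time come from the thread. The function name check_row, the derived fields event_age_s and confirm_lag_s, and the 300-second thresholds are assumptions for illustration only. The query string is shown for context and is not executed here.

```python
# Query text only (not executed in this sketch). The schema name is the
# same placeholder used earlier in the thread.
STATUS_QUERY = """
SELECT st_origin, st_received,
       extract(epoch from now() - st_last_event_ts) AS event_age_s,
       extract(epoch from st_lag_time) AS confirm_lag_s
FROM "<my_replication_cluster>".sl_status;
"""


def check_row(event_age_s, confirm_lag_s,
              max_event_age_s=300.0, max_confirm_lag_s=300.0):
    """Return the alerts raised by one sl_status row.

    event_age_s   -- seconds since the last event was generated
    confirm_lag_s -- seconds of lag on the last unconfirmed event
    The thresholds are hypothetical defaults, not recommended values.
    """
    alerts = []
    if event_age_s > max_event_age_s:
        alerts.append("SYNC events not being generated")
    if confirm_lag_s > max_confirm_lag_s:
        alerts.append("events not being confirmed")
    return alerts
```

Each row returned by the query would be passed through check_row, so that a node which has stopped generating SYNCs is reported separately from one whose events are simply not being confirmed.
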
Okay, you know better than me. However, I'm positive that when we were on
1.2 and I was in overnight, our slaves were up to date whilst the backups
were running. It's only circumstantial of course, but I'm pretty sure I'd
have noticed in 3 years if not, as I'd query those slaves all the time.

I've excluded the Slony schema from the dump now, so we're all good
anyway.

Thanks
Glyn
_______________________________________________
Slony1-general mailing list
[email protected]
http://lists.slony.info/mailman/listinfo/slony1-general
