On Tue, Jun 11, 2019 at 3:44 PM Tory M Blue <tmb...@gmail.com> wrote:
> On Tue, Jun 11, 2019 at 12:26 PM Christopher Browne <cbbro...@afilias.info> wrote:
>
>> On Tue, Jun 11, 2019 at 1:30 PM Tory M Blue <tmb...@gmail.com> wrote:
>>
>>> I'm wondering if there is anything I can do to prevent what appears to
>>> be Slony replication backing up (not replicating) during a pg_dump. I'm
>>> using -N slonschema, so I'm wondering if there are any other techniques
>>> to prevent Slony replication delay during pg_dumps?
>>
>> Hmm. I remember the "big deal" being to ensure that the Slony schema was
>> left out, as that would definitely lead to locking that would cause
>> replication to fall behind.
>>
>> It's possible that something new has come along that locks in a
>> troublesome way.
>>
>> I suggest that you take a look in pg_catalog.pg_locks to see what objects
>> Slony is waiting on, thus, looking for:
>>
>>   select * from pg_catalog.pg_locks where not granted;
>>
>> Some further work needs to be done to trace that to exactly which objects
>> are locked.
>
> Ya, will take a look. I also thought that bypassing the Slony schema would
> solve it, but apparently not; we can have 2-10 million rows in sl_log
> before it clears, and it's apparent that replication is not happening, so
> I'll poke around some more. Glad to see folks are still around :)

Well, growth of sl_log during the backup is fine, and not an indication of anything untoward. Since pg_dump opened a transaction when it began, and that transaction has not completed, Slony will not trim anything out of sl_log for the duration of the backup. That's normal, routine stuff. Nothing's wrong (not on the basis of that, anyway).

On the origin node, you can query the view sl_status, which will show how far behind replication appears to be. (The view is available on all nodes; it does not return terribly interesting results on nodes other than the origin.)
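As an aside on the "further work" of tracing ungranted locks back to objects: a join along these lines is one way to do it. This is a sketch, not from the thread; it covers relation-level locks only (other locktypes carry a NULL relation), and assumes a server new enough to have pg_stat_activity.query (9.2+).

```sql
-- Sketch: map ungranted lock requests to relation names and the
-- queries waiting on them.  Relation-level locks only; transactionid,
-- advisory, etc. will show a NULL relation.
select l.locktype,
       n.nspname || '.' || c.relname as relation,
       l.mode,
       a.pid,
       a.query
from pg_catalog.pg_locks l
left join pg_catalog.pg_class c on c.oid = l.relation
left join pg_catalog.pg_namespace n on n.oid = c.relnamespace
join pg_catalog.pg_stat_activity a on a.pid = l.pid
where not l.granted;
```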
Here's an example of replication being very far behind:

  usaorigin=# select * from _usa.sl_status limit 3;
  -[ RECORD 1 ]-------------+------------------------------
  st_origin                 | 1
  st_received               | 3
  st_last_event             | 5000002124
  st_last_event_ts          | 2019-04-10 20:02:43.833414+00
  st_last_received          | 5000002124
  st_last_received_ts       | 2019-04-10 20:02:43.885806+00
  st_last_received_event_ts | 2019-04-10 20:02:43.833414+00
  st_lag_num_events         | 0
  st_lag_time               | 62 days 20:23:14.029816
  -[ RECORD 2 ]-------------+------------------------------
  st_origin                 | 1
  st_received               | 4
  st_last_event             | 5000002124
  st_last_event_ts          | 2019-04-10 20:02:43.833414+00
  st_last_received          | 5000002122
  st_last_received_ts       | 2019-04-10 20:02:33.870577+00
  st_last_received_event_ts | 2019-04-10 20:02:23.76373+00
  st_lag_num_events         | 2
  st_lag_time               | 62 days 20:23:34.0995
  -[ RECORD 3 ]-------------+------------------------------
  st_origin                 | 1
  st_received               | 5
  st_last_event             | 5000002124
  st_last_event_ts          | 2019-04-10 20:02:43.833414+00
  st_last_received          | 5000002122
  st_last_received_ts       | 2019-04-10 20:02:33.898605+00
  st_last_received_event_ts | 2019-04-10 20:02:23.76373+00
  st_lag_num_events         | 2
  st_lag_time               | 62 days 20:23:34.0995

That is a replication cluster where I shut down the slon processes back in April :-)  It's only behind by a few events, for the self-same reason :-)

If the last event received coincides with the time at which you started the pg_dump run, well, that sure sounds like a smoking gun.

If, on the other hand, SYNC events are being processed, just more slowly than you'd like, then perhaps the problem is that the I/O induced by pg_dump is delaying replication, and that's more a hardware problem than a software one.

And if replication is only a few seconds behind while the system is busy, as indicated by st_lag_time being a few seconds, then it's possible that there is no problem at all.
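A narrower query against the same view makes that diagnosis easier to repeat while the dump is running. This is a sketch; "_usa" is the cluster schema from the example above, so substitute your own cluster name:

```sql
-- Sketch: watch per-subscriber lag on the origin during pg_dump.
-- Run it a few times: if st_last_received keeps advancing, SYNCs are
-- flowing and the lag is likely I/O pressure from the dump; if it is
-- frozen at the dump's start time, replication has actually stalled.
select st_received,
       st_last_received,
       st_last_received_ts,
       st_lag_num_events,
       st_lag_time
from _usa.sl_status
order by st_received;
```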
_______________________________________________
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general