The disableNode() in the logs makes it look like someone did a DROP NODE.

If the only issue is that you're missing active paths in sl_path, you can
add/update the paths with slonik.

On 06/27/2017 11:59 AM, Tignor, Tom wrote:
> Hello Slony-I community,
>
> Hoping someone can advise on a strange and serious problem.
> We performed a slony service failover yesterday. For the first time
> ever, our slony service FAILOVER op errored out. We recently expanded
> our cluster to 7 consumers from a single provider. There are no load
> issues during normal operations. As the error output below shows,
> though, our node 4 and node 5 consumers never got the events they
> needed. Here’s where it gets weird: closer inspection has shown that
> node 2->4 and node 2->5 path data went missing out of the service at
> some point. It seems clear that’s the main issue, but in spite of that,
> both node 4 and node 5 continued to find and process node 2 SYNC events
> for a full week! The logs show this happened in spite of multiple restarts.
>
> How can this happen? If missing path data stymies the failover, wouldn’t
> it also prevent normal SYNC processing?
>
> In the case where a failover is begun with inadequate path data, what’s
> the best resolution? Can path data be quickly applied to allow failover
> to succeed?
>
> Thanks in advance for any insights.
>
> ---- failover error ----
>
> /tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: NOTICE: calling restart node 1
> /tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:55: 2017-06-26 18:33:02
> executing preFailover(1,1) on 2
> executing preFailover(1,1) on 3
> executing preFailover(1,1) on 4
> executing preFailover(1,1) on 5
> executing preFailover(1,1) on 6
> executing preFailover(1,1) on 7
> executing preFailover(1,1) on 8
> NOTICE: executing "_ams_cluster".failedNode2 on node 2
> /tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: waiting for event (2,5000061664).
> node 8 only on event 5000061654, node 4 only on event 5000061654, node 5 only on event 5000061655, node 3 only on event 5000061662, node 6 only on event 5000061654, node 7 only on event 5000061656
> /tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: waiting for event (2,5000061664). node 4 only on event 5000061657, node 5 only on event 5000061663, node 3 only on event 5000061663, node 6 only on event 5000061663
> /tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: waiting for event (2,5000061664). node 4 only on event 5000061663, node 5 only on event 5000061663, node 6 only on event 5000061663
> /tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: waiting for event (2,5000061664). node 4 only on event 5000061663, node 5 only on event 5000061663
> /tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: waiting for event (2,5000061664). node 4 only on event 5000061663, node 5 only on event 5000061663
> /tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: waiting for event (2,5000061664). node 4 only on event 5000061663, node 5 only on event 5000061663
> /tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: waiting for event (2,5000061664). node 4 only on event 5000061663, node 5 only on event 5000061663
> /tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: waiting for event (2,5000061664). node 4 only on event 5000061663, node 5 only on event 5000061663
> /tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: waiting for event (2,5000061664). node 4 only on event 5000061663, node 5 only on event 5000061663
> /tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: waiting for event (2,5000061664). node 4 only on event 5000061663, node 5 only on event 5000061663
> /tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: waiting for event (2,5000061664).
> node 4 only on event 5000061663, node 5 only on event 5000061663
> /tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: waiting for event (2,5000061664). node 4 only on event 5000061663, node 5 only on event 5000061663
> /tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: waiting for event (2,5000061664). node 4 only on event 5000061663, node 5 only on event 5000061663
> /tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: waiting for event (2,5000061664). node 4 only on event 5000061663, node 5 only on event 5000061663
>
> ---- node 4 log archive ----
>
> bos-mpt5c:odin-9353 ttignor$ egrep 'disableNode: no_id=2|storePath: pa_server=2 pa_client=4|restart notification' prod4/node4-pathconfig.out
> 2017-06-15 15:14:00 UTC [5688] INFO localListenThread: got restart notification
> 2017-06-15 15:14:10 UTC [8431] CONFIG storePath: pa_server=2 pa_client=4 pa_conninfo="dbname=ams
> 2017-06-15 15:53:00 UTC [8431] INFO localListenThread: got restart notification
> 2017-06-15 15:53:10 UTC [23701] CONFIG storePath: pa_server=2 pa_client=4 pa_conninfo="dbname=ams
> 2017-06-16 17:29:13 UTC [10253] CONFIG storePath: pa_server=2 pa_client=4 pa_conninfo="dbname=ams
> 2017-06-16 20:43:42 UTC [2707] CONFIG storePath: pa_server=2 pa_client=4 pa_conninfo="dbname=ams
> 2017-06-19 15:11:45 UTC [2707] CONFIG disableNode: no_id=2
> 2017-06-19 15:11:45 UTC [2707] INFO localListenThread: got restart notification
> 2017-06-20 18:40:15 UTC [31224] INFO localListenThread: got restart notification
> 2017-06-21 14:31:42 UTC [6253] INFO localListenThread: got restart notification
> 2017-06-21 14:35:26 UTC [32367] INFO localListenThread: got restart notification
> 2017-06-26 18:21:25 UTC [9278] INFO localListenThread: got restart notification
> 2017-06-26 18:33:04 UTC [28839] INFO localListenThread: got restart notification
> 2017-06-26 18:33:30 UTC [1785] INFO localListenThread: got restart
> notification
> bos-mpt5c:odin-9353 ttignor$
>
> ---- node 5 log archive ----
>
> bos-mpt5c:odin-9353 ttignor$ egrep 'disableNode: no_id=2|storePath: pa_server=2 pa_client=5|restart notification' prod5/node5-pathconfig.out
> 2017-06-15 15:13:56 UTC [20700] INFO localListenThread: got restart notification
> 2017-06-15 15:14:06 UTC [20374] CONFIG storePath: pa_server=2 pa_client=5 pa_conninfo="dbname=ams
> 2017-06-15 15:53:01 UTC [20374] INFO localListenThread: got restart notification
> 2017-06-15 15:53:11 UTC [2859] CONFIG storePath: pa_server=2 pa_client=5 pa_conninfo="dbname=ams
> 2017-06-16 17:28:19 UTC [2859] INFO localListenThread: got restart notification
> 2017-06-16 17:28:29 UTC [10753] CONFIG storePath: pa_server=2 pa_client=5 pa_conninfo="dbname=ams
> 2017-06-19 15:11:40 UTC [10753] CONFIG disableNode: no_id=2
> 2017-06-19 15:11:40 UTC [10753] INFO localListenThread: got restart notification
> 2017-06-20 18:40:11 UTC [450] INFO localListenThread: got restart notification
> 2017-06-21 14:31:41 UTC [22300] INFO localListenThread: got restart notification
> 2017-06-21 14:35:28 UTC [26777] INFO localListenThread: got restart notification
> 2017-06-26 18:21:27 UTC [28366] INFO localListenThread: got restart notification
> 2017-06-26 18:33:04 UTC [29345] INFO localListenThread: got restart notification
> 2017-06-26 18:33:27 UTC [1299] INFO localListenThread: got restart notification
> bos-mpt5c:odin-9353 ttignor$
>
> Tom ☺
>
> _______________________________________________
> Slony1-general mailing list
> Slony1-general@lists.slony.info
> http://lists.slony.info/mailman/listinfo/slony1-general
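
On adding or updating the missing paths with slonik, here is a minimal
sketch of what such a script could look like. It assumes the cluster is
named ams_cluster (inferred from the "_ams_cluster" schema in the failover
output). The host and user values are placeholders, not taken from this
thread; only dbname=ams appears in the logs. Use whatever conninfo strings
the surviving rows in sl_path use.

```
# Sketch only: cluster name is inferred, hosts and users are placeholders.
cluster name = ams_cluster;

# Admin connections slonik uses to reach the nodes involved.
node 2 admin conninfo = 'dbname=ams host=NODE2_HOST user=SLONY_USER';
node 4 admin conninfo = 'dbname=ams host=NODE4_HOST user=SLONY_USER';
node 5 admin conninfo = 'dbname=ams host=NODE5_HOST user=SLONY_USER';

# Re-create the two paths reported missing: how nodes 4 and 5 reach node 2.
store path (server = 2, client = 4,
            conninfo = 'dbname=ams host=NODE2_HOST user=SLONY_USER');
store path (server = 2, client = 5,
            conninfo = 'dbname=ams host=NODE2_HOST user=SLONY_USER');
```

Before running anything like this, it is probably worth checking what is
actually stored on nodes 4 and 5, for example with
select pa_server, pa_client, pa_conninfo from "_ams_cluster".sl_path where pa_server = 2;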