On 06/27/2017 11:59 AM, Tignor, Tom wrote:

The disableNode() in the logs makes it look like someone did a DROP NODE.

If the only issue is that you're missing active paths in sl_path, you can
add/update the paths with slonik.
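For example, a minimal slonik sketch that re-adds the 2->4 and 2->5 paths
might look like the following. The cluster name is inferred from the
"_ams_cluster" schema in the failover output. The host values in every
conninfo string are hypothetical placeholders, because the conninfo strings
in the quoted logs are truncated, so substitute your real ones. STORE PATH
updates the existing sl_path row for a server/client pair or inserts one if
it is missing. This only covers the case where paths are the sole problem;
it assumes nodes 4 and 5 still know node 2 as a node.

```slonik
# Sketch only: restore the missing node 2 -> node 4/5 paths.
# Cluster name inferred from the "_ams_cluster" schema.
# All host values below are hypothetical placeholders.
cluster name = ams_cluster;

node 2 admin conninfo = 'dbname=ams host=node2.example';
node 4 admin conninfo = 'dbname=ams host=node4.example';
node 5 admin conninfo = 'dbname=ams host=node5.example';

# Tell nodes 4 and 5 how to connect to node 2.
# STORE PATH inserts the sl_path row, or updates it if it exists.
store path (server = 2, client = 4, conninfo = 'dbname=ams host=node2.example');
store path (server = 2, client = 5, conninfo = 'dbname=ams host=node2.example');
```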




> Hello Slony-I community,
>
> Hoping someone can advise on a strange and serious problem.
> We performed a slony service failover yesterday. For the first time
> ever, our slony service FAILOVER op errored out. We recently expanded
> our cluster to 7 consumers from a single provider. There are no load
> issues during normal operations. As the error output below shows,
> though, our node 4 and node 5 consumers never got the events they
> needed. Here's where it gets weird: closer inspection has shown that
> the node 2->4 and node 2->5 path data went missing from the service at
> some point. That seems clearly to be the main issue, yet both node 4
> and node 5 continued to find and process node 2 SYNC events for a full
> week! The logs show this continued across multiple restarts.
>
> How can this happen? If missing path data stymies the failover, wouldn’t
> it also prevent normal SYNC processing?
>
> In the case where a failover is begun with inadequate path data, what’s
> the best resolution? Can path data be quickly applied to allow failover
> to succeed?
>
>              Thanks in advance for any insights.
>
> ---- failover error ----
>
> /tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: NOTICE:
> calling restart node 1
>
> /tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:55:
> 2017-06-26 18:33:02
>
> executing preFailover(1,1) on 2
>
> executing preFailover(1,1) on 3
>
> executing preFailover(1,1) on 4
>
> executing preFailover(1,1) on 5
>
> executing preFailover(1,1) on 6
>
> executing preFailover(1,1) on 7
>
> executing preFailover(1,1) on 8
>
> NOTICE: executing "_ams_cluster".failedNode2 on node 2
>
> /tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: waiting
> for event (2,5000061664).  node 8 only on event 5000061654, node 4 only
> on event 5000061654, node 5 only on event 5000061655, node 3 only on
> event 5000061662, node 6 only on event 5000061654, node 7 only on
> event 5000061656
>
> /tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: waiting
> for event (2,5000061664).  node 4 only on event 5000061657, node 5 only
> on event 5000061663, node 3 only on event 5000061663, node 6 only on
> event 5000061663
>
> /tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: waiting
> for event (2,5000061664).  node 4 only on event 5000061663, node 5 only
> on event 5000061663, node 6 only on event 5000061663
>
> /tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: waiting
> for event (2,5000061664).  node 4 only on event 5000061663, node 5 only
> on event 5000061663
>
> /tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: waiting
> for event (2,5000061664).  node 4 only on event 5000061663, node 5 only
> on event 5000061663
>
> /tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: waiting
> for event (2,5000061664).  node 4 only on event 5000061663, node 5 only
> on event 5000061663
>
> /tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: waiting
> for event (2,5000061664).  node 4 only on event 5000061663, node 5 only
> on event 5000061663
>
> /tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: waiting
> for event (2,5000061664).  node 4 only on event 5000061663, node 5 only
> on event 5000061663
>
> /tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: waiting
> for event (2,5000061664).  node 4 only on event 5000061663, node 5 only
> on event 5000061663
>
> /tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: waiting
> for event (2,5000061664).  node 4 only on event 5000061663, node 5 only
> on event 5000061663
>
> /tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: waiting
> for event (2,5000061664).  node 4 only on event 5000061663, node 5 only
> on event 5000061663
>
> /tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: waiting
> for event (2,5000061664).  node 4 only on event 5000061663, node 5 only
> on event 5000061663
>
> /tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: waiting
> for event (2,5000061664).  node 4 only on event 5000061663, node 5 only
> on event 5000061663
>
> /tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: waiting
> for event (2,5000061664).  node 4 only on event 5000061663, node 5 only
> on event 5000061663
>
> ---- node 4 log archive ----
>
> bos-mpt5c:odin-9353 ttignor$ egrep 'disableNode: no_id=2|storePath: pa_server=2 pa_client=4|restart notification' prod4/node4-pathconfig.out
>
> 2017-06-15 15:14:00 UTC [5688] INFO   localListenThread: got restart
> notification
>
> 2017-06-15 15:14:10 UTC [8431] CONFIG storePath: pa_server=2 pa_client=4
> pa_conninfo="dbname=ams
>
> 2017-06-15 15:53:00 UTC [8431] INFO   localListenThread: got restart
> notification
>
> 2017-06-15 15:53:10 UTC [23701] CONFIG storePath: pa_server=2
> pa_client=4 pa_conninfo="dbname=ams
>
> 2017-06-16 17:29:13 UTC [10253] CONFIG storePath: pa_server=2
> pa_client=4 pa_conninfo="dbname=ams
>
> 2017-06-16 20:43:42 UTC [2707] CONFIG storePath: pa_server=2 pa_client=4
> pa_conninfo="dbname=ams
>
> 2017-06-19 15:11:45 UTC [2707] CONFIG disableNode: no_id=2
>
> 2017-06-19 15:11:45 UTC [2707] INFO   localListenThread: got restart
> notification
>
> 2017-06-20 18:40:15 UTC [31224] INFO   localListenThread: got restart
> notification
>
> 2017-06-21 14:31:42 UTC [6253] INFO   localListenThread: got restart
> notification
>
> 2017-06-21 14:35:26 UTC [32367] INFO   localListenThread: got restart
> notification
>
> 2017-06-26 18:21:25 UTC [9278] INFO   localListenThread: got restart
> notification
>
> 2017-06-26 18:33:04 UTC [28839] INFO   localListenThread: got restart
> notification
>
> 2017-06-26 18:33:30 UTC [1785] INFO   localListenThread: got restart
> notification
>
> bos-mpt5c:odin-9353 ttignor$
>
> ---- node 5 log archive ----
>
> bos-mpt5c:odin-9353 ttignor$ egrep 'disableNode: no_id=2|storePath: pa_server=2 pa_client=5|restart notification' prod5/node5-pathconfig.out
>
> 2017-06-15 15:13:56 UTC [20700] INFO   localListenThread: got restart
> notification
>
> 2017-06-15 15:14:06 UTC [20374] CONFIG storePath: pa_server=2
> pa_client=5 pa_conninfo="dbname=ams
>
> 2017-06-15 15:53:01 UTC [20374] INFO   localListenThread: got restart
> notification
>
> 2017-06-15 15:53:11 UTC [2859] CONFIG storePath: pa_server=2 pa_client=5
> pa_conninfo="dbname=ams
>
> 2017-06-16 17:28:19 UTC [2859] INFO   localListenThread: got restart
> notification
>
> 2017-06-16 17:28:29 UTC [10753] CONFIG storePath: pa_server=2
> pa_client=5 pa_conninfo="dbname=ams
>
> 2017-06-19 15:11:40 UTC [10753] CONFIG disableNode: no_id=2
>
> 2017-06-19 15:11:40 UTC [10753] INFO   localListenThread: got restart
> notification
>
> 2017-06-20 18:40:11 UTC [450] INFO   localListenThread: got restart
> notification
>
> 2017-06-21 14:31:41 UTC [22300] INFO   localListenThread: got restart
> notification
>
> 2017-06-21 14:35:28 UTC [26777] INFO   localListenThread: got restart
> notification
>
> 2017-06-26 18:21:27 UTC [28366] INFO   localListenThread: got restart
> notification
>
> 2017-06-26 18:33:04 UTC [29345] INFO   localListenThread: got restart
> notification
>
> 2017-06-26 18:33:27 UTC [1299] INFO   localListenThread: got restart
> notification
>
> bos-mpt5c:odin-9353 ttignor$
>
>              Tom ☺
>
>
>
> _______________________________________________
> Slony1-general mailing list
> Slony1-general@lists.slony.info
> http://lists.slony.info/mailman/listinfo/slony1-general
>
