On 07/12/2016 08:23 AM, Steve Singer wrote:
> On 07/08/2016 03:27 PM, Tignor, Tom wrote:
>>                   Hello slony group,
>>
>>                   I’m testing now with slony1-2.2.4. I have just recently
>> produced an error which effectively stops slon processing on some node A
>> due to some node B being dropped. The event reproduces only
>> infrequently. As some will know, a slon daemon for a given node which
>> becomes aware its node has been dropped will respond by dropping its
>> cluster schema. There appears to be a race condition between the node B
>> schema drop and the (surviving) node A receipt of the disableNode (drop
>> node) event. If the former occurs before the latter, all the remote
>> worker threads on node A enter an error state. See the log samples
>> below. I resolved this the first time by deleting all the recent
>> non-SYNC events from the sl_event tables, and more recently with a
>> simple node A slon restart.
>>
>>                   Please advise if there is any ticket I should provide
>> this info to, or if I should create a new one. Thanks.
>>
>
> The Slony bug tracker is at
> http://bugs.slony.info/bugzilla/
>
>
> I assume you're saying that when the slon restarts it keeps hitting this
> error and keeps restarting.
>

Any more hints on how you reproduce this would be helpful.
I've been trying to reproduce this with no luck.
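
In the meantime, when a node gets stuck like this it may be worth looking
at which non-SYNC events are still sitting in sl_event on the surviving
node. A hypothetical diagnostic query (cluster schema name taken from your
logs, adjust to your setup) would be something like:

-- list queued non-SYNC events on the surviving node
select ev_origin, ev_seqno, ev_type, ev_timestamp
  from "_ams_cluster".sl_event
 where ev_type <> 'SYNC'
 order by ev_origin, ev_seqno;

That is roughly the set of rows the "delete recent non-SYNC events"
workaround you described would have been clearing.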

At the time you issue the drop node, are all other nodes caught up to
the dropped node (and the event node) with respect to configuration events?

I.e. if you do

sync(id=$NODE_ABOUT_TO_DROP);
wait for event(wait on=$NODE_ABOUT_TO_DROP, origin=$NODE_ABOUT_TO_DROP, confirmed=all);

sync(id=$EVENT_NODE_FOR_DROP);
wait for event(wait on=$EVENT_NODE_FOR_DROP, origin=$EVENT_NODE_FOR_DROP, confirmed=all);


(notice that I am NOT passing any timeout to the wait for).
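
Put together, the full drop sequence I have in mind would look roughly like
the sketch below. The cluster name is taken from your logs and I am assuming
node 999999 is the one being dropped with node 3 as the event node; the
conninfo strings are placeholders, not your real ones:

cluster name = ams_cluster;
node 1 admin conninfo = 'dbname=ams host=host1.example user=ams_slony';
node 3 admin conninfo = 'dbname=ams host=host3.example user=ams_slony';
node 999999 admin conninfo = 'dbname=ams host=host999999.example user=ams_slony';

# make sure every node has caught up on configuration events
# from the node being dropped and from the event node
sync (id = 999999);
wait for event (origin = 999999, wait on = 999999, confirmed = all);

sync (id = 3);
wait for event (origin = 3, wait on = 3, confirmed = all);

# only then issue the drop, and wait for it to propagate
drop node (id = 999999, event node = 3);
wait for event (origin = 3, wait on = 3, confirmed = all);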

>
>
>> ---- node 1 log ----
>>
>> 2016-07-08 18:06:31 UTC [30382] INFO   remoteWorkerThread_999999: SYNC
>> 5000000008 done in 0.002 seconds
>>
>> 2016-07-08 18:06:33 UTC [30382] INFO   remoteWorkerThread_999999: SYNC
>> 5000000009 done in 0.002 seconds
>>
>> 2016-07-08 18:06:33 UTC [30382] INFO   remoteWorkerThread_2: SYNC
>> 5000017869 done in 0.002 seconds
>>
>> 2016-07-08 18:06:33 UTC [30382] INFO   remoteWorkerThread_3: SYNC
>> 5000018148 done in 0.004 seconds
>>
>> 2016-07-08 18:06:45 UTC [30382] CONFIG remoteWorkerThread_2: update
>> provider configuration
>>
>> 2016-07-08 18:06:45 UTC [30382] ERROR  remoteWorkerThread_3: "select
>> last_value from "_ams_cluster".sl_log_status" PGRES_FATAL_ERROR ERROR:
>> schema "_ams_cluster" does not exist
>>
>> LINE 1: select last_value from "_ams_cluster".sl_log_status
>>
>>                                  ^
>>
>> 2016-07-08 18:06:45 UTC [30382] ERROR  remoteWorkerThread_3: SYNC aborted
>>
>> 2016-07-08 18:06:45 UTC [30382] CONFIG version for "dbname=ams
>>
>>         host=198.18.102.45
>>
>>         user=ams_slony
>>
>>         sslmode=verify-ca
>>
>>         sslcert=/usr/local/akamai/.ams_certs/complete-ams_slony.crt
>>
>>         sslkey=/usr/local/akamai/.ams_certs/ams_slony.private_key
>>
>>         sslrootcert=/usr/local/akamai/etc/ssl_ca/canonical_ca_roots.pem"
>> is 90119
>>
>> 2016-07-08 18:06:45 UTC [30382] ERROR  remoteWorkerThread_2: "select
>> last_value from "_ams_cluster".sl_log_status" PGRES_FATAL_ERROR ERROR:
>> schema "_ams_cluster" does not exist
>>
>> LINE 1: select last_value from "_ams_cluster".sl_log_status
>>
>>                                  ^
>>
>> 2016-07-08 18:06:45 UTC [30382] ERROR  remoteWorkerThread_2: SYNC aborted
>>
>> 2016-07-08 18:06:45 UTC [30382] ERROR  remoteListenThread_999999:
>> "select ev_origin, ev_seqno, ev_timestamp, ev_snapshot,
>> "pg_catalog".txid_snapshot_xmin(ev_snapshot),
>> "pg_catalog".txid_snapshot_xmax(ev_snapshot), ev_type,
>> ev_data1, ev_data2, ev_data3, ev_data4, ev_data5, ev_data6,
>> ev_data7, ev_data8 from "_ams_cluster".sl_event e
>> where (e.ev_origin = '999999' and e.ev_seqno > '5000000009') or
>> (e.ev_origin = '2' and e.ev_seqno > '5000017870') or
>> (e.ev_origin = '3' and e.ev_seqno > '5000018151')
>> order by e.ev_origin, e.ev_seqno limit 40" - ERROR:
>> schema "_ams_cluster" does not exist
>>
>> LINE 1: ...v_data5, ev_data6,        ev_data7, ev_data8 from "_ams_clus...
>>
>>                                                                ^
>>
>> 2016-07-08 18:06:55 UTC [30382] ERROR  remoteWorkerThread_3: "start
>> transaction; set enable_seqscan = off; set enable_indexscan = on; "
>> PGRES_FATAL_ERROR ERROR:  current transaction is aborted, commands
>> ignored until end of transaction block
>>
>> 2016-07-08 18:06:55 UTC [30382] ERROR  remoteWorkerThread_3: SYNC aborted
>>
>> 2016-07-08 18:06:55 UTC [30382] ERROR  remoteWorkerThread_2: "start
>> transaction; set enable_seqscan = off; set enable_indexscan = on; "
>> PGRES_FATAL_ERROR ERROR:  current transaction is aborted, commands
>> ignored until end of transaction block
>>
>> 2016-07-08 18:06:55 UTC [30382] ERROR  remoteWorkerThread_2: SYNC aborted
>>
>> ----
>>
>> ---- node 999999 log ----
>>
>> 2016-07-08 18:06:44 UTC [558] INFO   remoteWorkerThread_1: SYNC
>> 5000081216 done in 0.004 seconds
>>
>> 2016-07-08 18:06:44 UTC [558] INFO   remoteWorkerThread_2: SYNC
>> 5000017870 done in 0.004 seconds
>>
>> 2016-07-08 18:06:44 UTC [558] INFO   remoteWorkerThread_3: SYNC
>> 5000018150 done in 0.004 seconds
>>
>> 2016-07-08 18:06:44 UTC [558] INFO   remoteWorkerThread_1: SYNC
>> 5000081217 done in 0.003 seconds
>>
>> 2016-07-08 18:06:44 UTC [558] WARN   remoteWorkerThread_3: got DROP NODE
>> for local node ID
>>
>> NOTICE:  Slony-I: Please drop schema "_ams_cluster"
>>
>> NOTICE:  drop cascades to 171 other objects
>>
>> DETAIL:  drop cascades to table _ams_cluster.sl_node
>>
>> drop cascades to table _ams_cluster.sl_nodelock
>>
>> drop cascades to table _ams_cluster.sl_set
>>
>> drop cascades to table _ams_cluster.sl_setsync
>>
>> drop cascades to table _ams_cluster.sl_table
>>
>> ----
>>
>>               Tom J
>>
>>
>>
>
>

_______________________________________________
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general
