Hello slony group,
I’m testing with slony1-2.2.4 and have just recently
reproduced an error which effectively stops slon processing on some node A after
some node B is dropped. The event reproduces only infrequently. As some will
know, a slon daemon which becomes aware that its own node has been
dropped responds by dropping its cluster schema. There appears to be a race
condition between the node B schema drop and the (surviving) node A receipt of
the disableNode (drop node) event: if the former occurs before the latter, all
the remote worker threads on node A enter an error state. See the log samples
below. I resolved this the first time by deleting all the recent non-SYNC
events from the sl_event tables, and more recently with a simple restart of the
node A slon.
Please advise whether there is an existing ticket I should add this info
to, or whether I should create a new one. Thanks.
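For reference, the first workaround looked roughly like the sketch below. This is illustrative only: the cluster name matches my setup, and the seqno cutoff is a made-up example, not the exact value I used — pick it per incident from the log output.

```sql
-- Hypothetical sketch of the sl_event cleanup workaround, run on node A.
-- Removes recently queued non-SYNC events so the remote worker threads
-- stop trying to replay events that reference the dropped schema.
BEGIN;
DELETE FROM "_ams_cluster".sl_event
 WHERE ev_type <> 'SYNC'
   AND ev_seqno > 5000000000;  -- example cutoff; choose per incident
COMMIT;
```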
---- node 1 log ----
2016-07-08 18:06:31 UTC [30382] INFO remoteWorkerThread_999999: SYNC 5000000008 done in 0.002 seconds
2016-07-08 18:06:33 UTC [30382] INFO remoteWorkerThread_999999: SYNC 5000000009 done in 0.002 seconds
2016-07-08 18:06:33 UTC [30382] INFO remoteWorkerThread_2: SYNC 5000017869 done in 0.002 seconds
2016-07-08 18:06:33 UTC [30382] INFO remoteWorkerThread_3: SYNC 5000018148 done in 0.004 seconds
2016-07-08 18:06:45 UTC [30382] CONFIG remoteWorkerThread_2: update provider configuration
2016-07-08 18:06:45 UTC [30382] ERROR remoteWorkerThread_3: "select last_value from "_ams_cluster".sl_log_status" PGRES_FATAL_ERROR ERROR: schema "_ams_cluster" does not exist
LINE 1: select last_value from "_ams_cluster".sl_log_status
                               ^
2016-07-08 18:06:45 UTC [30382] ERROR remoteWorkerThread_3: SYNC aborted
2016-07-08 18:06:45 UTC [30382] CONFIG version for "dbname=ams
host=198.18.102.45
user=ams_slony
sslmode=verify-ca
sslcert=/usr/local/akamai/.ams_certs/complete-ams_slony.crt
sslkey=/usr/local/akamai/.ams_certs/ams_slony.private_key
sslrootcert=/usr/local/akamai/etc/ssl_ca/canonical_ca_roots.pem" is 90119
2016-07-08 18:06:45 UTC [30382] ERROR remoteWorkerThread_2: "select last_value from "_ams_cluster".sl_log_status" PGRES_FATAL_ERROR ERROR: schema "_ams_cluster" does not exist
LINE 1: select last_value from "_ams_cluster".sl_log_status
                               ^
2016-07-08 18:06:45 UTC [30382] ERROR remoteWorkerThread_2: SYNC aborted
2016-07-08 18:06:45 UTC [30382] ERROR remoteListenThread_999999: "select ev_origin, ev_seqno, ev_timestamp, ev_snapshot, "pg_catalog".txid_snapshot_xmin(ev_snapshot), "pg_catalog".txid_snapshot_xmax(ev_snapshot), ev_type, ev_data1, ev_data2, ev_data3, ev_data4, ev_data5, ev_data6, ev_data7, ev_data8 from "_ams_cluster".sl_event e where (e.ev_origin = '999999' and e.ev_seqno > '5000000009') or (e.ev_origin = '2' and e.ev_seqno > '5000017870') or (e.ev_origin = '3' and e.ev_seqno > '5000018151') order by e.ev_origin, e.ev_seqno limit 40" - ERROR: schema "_ams_cluster" does not exist
LINE 1: ...v_data5, ev_data6, ev_data7, ev_data8 from "_ams_clus...
                                                      ^
2016-07-08 18:06:55 UTC [30382] ERROR remoteWorkerThread_3: "start transaction; set enable_seqscan = off; set enable_indexscan = on; " PGRES_FATAL_ERROR ERROR: current transaction is aborted, commands ignored until end of transaction block
2016-07-08 18:06:55 UTC [30382] ERROR remoteWorkerThread_3: SYNC aborted
2016-07-08 18:06:55 UTC [30382] ERROR remoteWorkerThread_2: "start transaction; set enable_seqscan = off; set enable_indexscan = on; " PGRES_FATAL_ERROR ERROR: current transaction is aborted, commands ignored until end of transaction block
2016-07-08 18:06:55 UTC [30382] ERROR remoteWorkerThread_2: SYNC aborted
----
---- node 999999 log ----
2016-07-08 18:06:44 UTC [558] INFO remoteWorkerThread_1: SYNC 5000081216 done in 0.004 seconds
2016-07-08 18:06:44 UTC [558] INFO remoteWorkerThread_2: SYNC 5000017870 done in 0.004 seconds
2016-07-08 18:06:44 UTC [558] INFO remoteWorkerThread_3: SYNC 5000018150 done in 0.004 seconds
2016-07-08 18:06:44 UTC [558] INFO remoteWorkerThread_1: SYNC 5000081217 done in 0.003 seconds
2016-07-08 18:06:44 UTC [558] WARN remoteWorkerThread_3: got DROP NODE for local node ID
NOTICE: Slony-I: Please drop schema "_ams_cluster"
NOTICE: drop cascades to 171 other objects
DETAIL: drop cascades to table _ams_cluster.sl_node
drop cascades to table _ams_cluster.sl_nodelock
drop cascades to table _ams_cluster.sl_set
drop cascades to table _ams_cluster.sl_setsync
drop cascades to table _ams_cluster.sl_table
----
Tom ☺
_______________________________________________
Slony1-general mailing list
[email protected]
http://lists.slony.info/mailman/listinfo/slony1-general