Hello guys, I have a setup that has already stopped on me 3 times in the last 6 months. Each time it replicates properly for 2-3 months and then just stops. It has been stopped since January 11, 2016. The only way I can get replication back is to set everything up again from scratch. I'm wondering if anyone has an idea what is causing the stoppage. I'm running 64-bit Slony 2.2.4.
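For what it's worth, this is roughly how I check whether the replica is falling behind before it stops completely. It's just a sketch against Slony's standard sl_status monitoring view; the schema name is my cluster name (slony_Securithor2) with the leading underscore that Slony adds, so adjust if yours differs:

    -- run on the origin (node 1); shows how many events and how much time node 2 is behind
    SELECT st_origin, st_received, st_lag_num_events, st_lag_time
      FROM _slony_Securithor2.sl_status;

While things are healthy the lag stays small; once replication stops, both numbers just keep climbing.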
Currently, when I run slon on the replicated machine, I get the following:

C:\Program Files\PostgreSQL\9.3\bin>slon slony_Securithor2 "dbname = Securithor2 user = slonyuser password = securiTHOR971 port = 6234"
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: slon version 2.2.4 starting up
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Integer option vac_frequency = 3
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Integer option log_level = 0
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Integer option sync_interval = 2000
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Integer option sync_interval_timeout = 10000
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Integer option sync_group_maxsize = 20
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Integer option quit_sync_provider = 0
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Integer option remote_listen_timeout = 300
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Integer option monitor_interval = 500
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Integer option explain_interval = 0
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Integer option tcp_keepalive_idle = 0
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Integer option tcp_keepalive_interval = 0
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Integer option tcp_keepalive_count = 0
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Integer option apply_cache_size = 100
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Boolean option log_pid = 0
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Boolean option log_timestamp = 1
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Boolean option tcp_keepalive = 1
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Boolean option monitor_threads = 1
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Real option real_placeholder = 0.000000
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: String option cluster_name = slony_Securithor2
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: String option conn_info = dbname = Securithor2 user = slonyuser password = securiTHOR971 port = 6234
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: String option pid_file = [NULL]
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: String option log_timestamp_format = %Y-%m-%d %H:%M:%S %Z
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: String option archive_dir = [NULL]
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: String option sql_on_connection = [NULL]
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: String option lag_interval = [NULL]
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: String option command_on_logarchive = [NULL]
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: String option cleanup_interval = 10 minutes
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: local node id = 2
2016-01-28 17:41:00 Amér. du Sud occid. INFO main: main process started
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: launching sched_start_mainloop
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: loading current cluster configuration
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG storeNode: no_id=1 no_comment='Master Node'
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG storePath: pa_server=1 pa_client=2 pa_conninfo="dbname=Securithor2 host=192.168.1.50 user=slonyuser password = securiTHOR971 port = 6234" pa_connretry=10
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG storeListen: li_origin=1 li_receiver=2 li_provider=1
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG storeSet: set_id=1 set_origin=1 set_comment='All tables and sequences'
2016-01-28 17:41:00 Amér. du Sud occid. WARN remoteWorker_wakeup: node 1 - no worker thread
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG storeSubscribe: sub_set=1 sub_provider=1 sub_forward='f'
2016-01-28 17:41:00 Amér. du Sud occid. WARN remoteWorker_wakeup: node 1 - no worker thread
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG enableSubscription: sub_set=1
2016-01-28 17:41:00 Amér. du Sud occid. WARN remoteWorker_wakeup: node 1 - no worker thread
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: last local event sequence = 5000462590
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: configuration complete - starting threads
2016-01-28 17:41:00 Amér. du Sud occid. INFO localListenThread: thread starts
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG version for "dbname = Securithor2 user = slonyuser password = securiTHOR971 port = 6234" is 90310
NOTICE: Slony-I: cleanup stale sl_nodelock entry for pid=5188
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG enableNode: no_id=1
2016-01-28 17:41:00 Amér. du Sud occid. INFO remoteWorkerThread_1: thread starts
2016-01-28 17:41:00 Amér. du Sud occid. INFO remoteListenThread_1: thread starts
2016-01-28 17:41:00 Amér. du Sud occid. INFO main: running scheduler mainloop
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG cleanupThread: thread starts
2016-01-28 17:41:00 Amér. du Sud occid. INFO syncThread: thread starts
2016-01-28 17:41:00 Amér. du Sud occid. INFO monitorThread: thread starts
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG version for "dbname = Securithor2 user = slonyuser password = securiTHOR971 port = 6234" is 90310
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG remoteWorkerThread_1: update provider configuration
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG remoteWorkerThread_1: added active set 1 to provider 1
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG version for "dbname=Securithor2 host=192.168.1.50 user=slonyuser password = securiTHOR971 port = 6234" is 90306
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG version for "dbname = Securithor2 user = slonyuser password = securiTHOR971 port = 6234" is 90310
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG cleanupThread: bias = 60
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG version for "dbname = Securithor2 user = slonyuser password = securiTHOR971 port = 6234" is 90310
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG version for "dbname = Securithor2 user = slonyuser password = securiTHOR971 port = 6234" is 90310
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG version for "dbname=Securithor2 host=192.168.1.50 user=slonyuser password = securiTHOR971 port = 6234" is 90306
2016-01-28 17:41:00 Amér. du Sud occid. INFO remoteWorkerThread_1: syncing set 1 with 59 table(s) from provider 1

It gets stuck at "syncing set 1 with 59 table(s) from provider 1" (the last line) forever, with occasional messages about cleaning (the cleanup thread, I think). Checking the Postgres logs, I see lots of:

2016-01-28 17:33:07 AST LOG: n'a pas pu recevoir les données du client : unrecognized winsock error 10061

which translates to:

2016-01-28 17:33:07 AST LOG: could not receive data from the client : unrecognized winsock error 10061

I'm able to connect to the main db from the replicated machine with no problem (see the quick checks below). I have no idea what causes this error 10061. Any ideas? Appreciate the help.
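While slon sits on that last "syncing set 1" line, this is roughly what I look at on the master to see whether the subscriber's connection is doing anything at all. It's just a sketch against the standard pg_stat_activity view as it looks in 9.3, nothing Slony-specific:

    -- run on the master (192.168.1.50); lists the slonyuser sessions and what they are running
    SELECT pid, state, waiting, query_start, left(query, 80) AS current_query
      FROM pg_stat_activity
     WHERE usename = 'slonyuser';

I'm assuming the initial copy of the 59 tables would show up there as a long-running COPY from the subscriber; a session that has simply disappeared would fit the connection errors in the Postgres log.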
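And when I say I can connect to the main db from the replicated machine, I mean a plain psql test like this one goes through fine (a sketch using the same conninfo as the store path; psql prompts for the password):

    C:\Program Files\PostgreSQL\9.3\bin>psql "dbname=Securithor2 host=192.168.1.50 user=slonyuser port=6234" -c "SELECT version();"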
_______________________________________________ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general