On 01/28/2016 10:57 PM, Sung Hsin Lei wrote:
> Hello guys,
>
> So I have this setup that has already stopped on me 3 times in the last
> 6 months. Each time it would replicate properly for 2-3 months and then
> just stop. It has been stopped since January 11, 2016. The only way I
> can get replication back is to set everything up from scratch. I'm
> wondering if anyone has an idea about what is causing the stoppage. I'm
> running 64-bit Slony 2.2.4.
>
> Currently, when I run slon on the replicated machine, I get the following:
>
> C:\Program Files\PostgreSQL\9.3\bin>slon slony_Securithor2 "dbname = Securithor2 user = slonyuser password = securiTHOR971 port = 6234"
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: slon version 2.2.4 starting up
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Integer option vac_frequency = 3
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Integer option log_level = 0
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Integer option sync_interval = 2000
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Integer option sync_interval_timeout = 10000
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Integer option sync_group_maxsize = 20
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Integer option quit_sync_provider = 0
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Integer option remote_listen_timeout = 300
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Integer option monitor_interval = 500
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Integer option explain_interval = 0
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Integer option tcp_keepalive_idle = 0
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Integer option tcp_keepalive_interval = 0
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Integer option tcp_keepalive_count = 0
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Integer option apply_cache_size = 100
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Boolean option log_pid = 0
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Boolean option log_timestamp = 1
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Boolean option tcp_keepalive = 1
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Boolean option monitor_threads = 1
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Real option real_placeholder = 0.000000
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: String option cluster_name = slony_Securithor2
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: String option conn_info = dbname = Securithor2  user = slonyuser password = securiTHOR971 port = 6234
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: String option pid_file = [NULL]
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: String option log_timestamp_format = %Y-%m-%d %H:%M:%S %Z
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: String option archive_dir = [NULL]
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: String option sql_on_connection = [NULL]
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: String option lag_interval = [NULL]
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: String option command_on_logarchive = [NULL]
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: String option cleanup_interval = 10 minutes
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: local node id = 2
> 2016-01-28 17:41:00 Amér. du Sud occid. INFO   main: main process started
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: launching sched_start_mainloop
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: loading current cluster configuration
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG storeNode: no_id=1 no_comment='Master Node'
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG storePath: pa_server=1 pa_client=2 pa_conninfo="dbname=Securithor2 host=192.168.1.50 user=slonyuser password = securiTHOR971  port = 6234" pa_connretry=10
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG storeListen: li_origin=1 li_receiver=2 li_provider=1
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG storeSet: set_id=1 set_origin=1 set_comment='All tables and sequences'
> 2016-01-28 17:41:00 Amér. du Sud occid. WARN   remoteWorker_wakeup: node 1 - no worker thread
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG storeSubscribe: sub_set=1 sub_provider=1 sub_forward='f'
> 2016-01-28 17:41:00 Amér. du Sud occid. WARN   remoteWorker_wakeup: node 1 - no worker thread
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG enableSubscription: sub_set=1
> 2016-01-28 17:41:00 Amér. du Sud occid. WARN   remoteWorker_wakeup: node 1 - no worker thread
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: last local event sequence = 5000462590
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: configuration complete - starting threads
> 2016-01-28 17:41:00 Amér. du Sud occid. INFO   localListenThread: thread starts
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG version for "dbname = Securithor2 user = slonyuser password = securiTHOR971 port = 6234" is 90310
> NOTICE:  Slony-I: cleanup stale sl_nodelock entry for pid=5188
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG enableNode: no_id=1
> 2016-01-28 17:41:00 Amér. du Sud occid. INFO   remoteWorkerThread_1: thread starts
> 2016-01-28 17:41:00 Amér. du Sud occid. INFO   remoteListenThread_1: thread starts
> 2016-01-28 17:41:00 Amér. du Sud occid. INFO   main: running scheduler mainloop
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG cleanupThread: thread starts
> 2016-01-28 17:41:00 Amér. du Sud occid. INFO   syncThread: thread starts
> 2016-01-28 17:41:00 Amér. du Sud occid. INFO   monitorThread: thread starts
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG version for "dbname = Securithor2 user = slonyuser password = securiTHOR971 port = 6234" is 90310
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG remoteWorkerThread_1: update provider configuration
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG remoteWorkerThread_1: added active set 1 to provider 1
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG version for "dbname=Securithor2 host=192.168.1.50 user=slonyuser password = securiTHOR971  port = 6234" is 90306
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG version for "dbname = Securithor2 user = slonyuser password = securiTHOR971 port = 6234" is 90310
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG cleanupThread: bias = 60
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG version for "dbname = Securithor2 user = slonyuser password = securiTHOR971 port = 6234" is 90310
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG version for "dbname = Securithor2 user = slonyuser password = securiTHOR971 port = 6234" is 90310
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG version for "dbname=Securithor2 host=192.168.1.50 user=slonyuser password = securiTHOR971  port = 6234" is 90306
> 2016-01-28 17:41:00 Amér. du Sud occid. INFO   remoteWorkerThread_1: syncing set 1 with 59 table(s) from provider 1
>
> It gets stuck forever at "syncing set 1 with 59 table(s) from provider
> 1" (the last line), with occasional messages about cleaning (from
> cleanupThread, I think).
>
>
> Checking the postgres logs, I see lots of:
>
> 2016-01-28 17:33:07 AST LOG:  n'a pas pu recevoir les données du client : unrecognized winsock error 10061
>
> Which translates to:
>
> 2016-01-28 17:33:07 AST LOG:  could not receive data from the client: unrecognized winsock error 10061
>
> I'm able to connect to the main db from the replicated machine with no
> problem. I have no idea what is causing this error 10061.

Winsock error 10061 is WSAECONNREFUSED

     Connection refused.

     No connection could be made because the target computer actively
     refused it. This usually results from trying to connect to a
     service that is inactive on the foreign host—that is, one with no
     server application running.

This might be a firewall issue. Can you use a network sniffer to find
out what is happening at the TCP/IP level between the two machines?
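For reference, "connection refused" is easy to reproduce locally: connect to a TCP port with no listener and the OS rejects it immediately. A minimal Python sketch (the errno is 10061/WSAECONNREFUSED on Windows, 111 on Linux):

```python
import socket

# Grab a free TCP port, then close the socket so nothing is listening there.
probe = socket.socket()
probe.bind(("127.0.0.1", 0))
port = probe.getsockname()[1]
probe.close()

# Connecting to the dead port is "actively refused" by the OS -- the same
# condition the Winsock 10061 in the PostgreSQL log reports.
try:
    socket.create_connection(("127.0.0.1", port), timeout=2)
    print("connected")
except ConnectionRefusedError as exc:
    # exc.errno is 10061 (WSAECONNREFUSED) on Windows, 111 on Linux.
    print("refused")
```

If a firewall is silently dropping packets instead, you typically get a timeout rather than an immediate refusal, so the distinction is worth checking.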


Regards, Jan

-- 
Jan Wieck
Senior Software Engineer
http://slony.info
_______________________________________________
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general
