It replicated for 2 months with the firewall and anti-virus on. Just in case, I turned the firewall and anti-virus off, and it is still not replicating. I used Wireshark to examine the packets and did not see anything suspicious. Using pgAdmin, I am able to connect to the main server from the replicated server and vice versa, so the connection seems to be accepted.
When a connection cannot be established or is rejected, the slon log usually gives an error. In my case, it is just stuck on "INFO remoteWorkerThread_1: syncing set 1 with 59 table(s) from provider 1" with no errors. Actually, it gets stuck for several minutes and then does some cleanup operations. Below are the last few lines of the slon log. The machine's Windows locale is French, so the log mixed English and French; the French parts are translated here:

2016-01-31 19:44:24 Amér. du Sud occid. INFO   remoteWorkerThread_1: syncing set 1 with 59 table(s) from provider 1
NOTICE:  Slony-I: cleanup stale sl_nodelock entry for pid=5388
CONTEXT:  SQL statement « SELECT "_slony_Securithor2".cleanupNodelock() »
        PL/pgSQL function "_slony_Securithor2".cleanupevent(interval), line 82 at PERFORM
NOTICE:  Slony-I: cleanup stale sl_nodelock entry for pid=1176
CONTEXT:  SQL statement « SELECT "_slony_Securithor2".cleanupNodelock() »
        PL/pgSQL function "_slony_Securithor2".cleanupevent(interval), line 82 at PERFORM
NOTICE:  Slony-I: log switch to sl_log_1 complete - truncate sl_log_2
CONTEXT:  PL/pgSQL function "_slony_Securithor2".cleanupevent(interval), line 95 at assignment
2016-01-31 19:54:24 Amér. du Sud occid. INFO   cleanupThread: 0.062 seconds for cleanupEvent()
NOTICE:  Slony-I: Logswitch to sl_log_2 initiated
CONTEXT:  SQL statement « SELECT "_slony_Securithor2".logswitch_start() »
        PL/pgSQL function "_slony_Securithor2".cleanupevent(interval), line 97 at PERFORM
2016-01-31 20:04:25 Amér. du Sud occid. INFO   cleanupThread: 0.000 seconds for cleanupEvent()

What would cause no replication yet no errors in the logs? Thanks.

On Fri, Jan 29, 2016 at 9:50 AM, Jan Wieck <j...@wi3ck.info> wrote:

> On 01/28/2016 10:57 PM, Sung Hsin Lei wrote:
>
>> Hello guys,
>>
>> So I have this setup that has already stopped on me 3 times in the last
>> 6 months. Each time it would replicate properly for 2-3 months and then
>> it would just stop. It has been stopped since January 11, 2016.
>> The only way I can get replication back is to set everything up from
>> scratch. I'm wondering if anyone has an idea on the issue causing the
>> stoppage. I'm running 64-bit Slony 2.2.4.
>>
>> Currently, when I run slon on the replicated machine, I get the following:
>>
>> C:\Program Files\PostgreSQL\9.3\bin>slon slony_Securithor2 "dbname = Securithor2
>> user = slonyuser password = securiTHOR971 port = 6234"
>> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: slon version 2.2.4 starting up
>> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Integer option vac_frequency = 3
>> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Integer option log_level = 0
>> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Integer option sync_interval = 2000
>> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Integer option sync_interval_timeout = 10000
>> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Integer option sync_group_maxsize = 20
>> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Integer option quit_sync_provider = 0
>> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Integer option remote_listen_timeout = 300
>> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Integer option monitor_interval = 500
>> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Integer option explain_interval = 0
>> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Integer option tcp_keepalive_idle = 0
>> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Integer option tcp_keepalive_interval = 0
>> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Integer option tcp_keepalive_count = 0
>> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Integer option apply_cache_size = 100
>> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Boolean option log_pid = 0
>> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Boolean option log_timestamp = 1
>> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Boolean option tcp_keepalive = 1
>> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Boolean option monitor_threads = 1
>> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Real option real_placeholder = 0.000000
>> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: String option cluster_name = slony_Securithor2
>> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: String option conn_info = dbname = Securithor2 user = slonyuser password = securiTHOR971 port = 6234
>> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: String option pid_file = [NULL]
>> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: String option log_timestamp_format = %Y-%m-%d %H:%M:%S %Z
>> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: String option archive_dir = [NULL]
>> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: String option sql_on_connection = [NULL]
>> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: String option lag_interval = [NULL]
>> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: String option command_on_logarchive = [NULL]
>> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: String option cleanup_interval = 10 minutes
>> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: local node id = 2
>> 2016-01-28 17:41:00 Amér. du Sud occid. INFO   main: main process started
>> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: launching sched_start_mainloop
>> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: loading current cluster configuration
>> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG storeNode: no_id=1 no_comment='Master Node'
>> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG storePath: pa_server=1 pa_client=2 pa_conninfo="dbname=Securithor2 host=192.168.1.50 user=slonyuser password = securiTHOR971 port = 6234" pa_connretry=10
>> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG storeListen: li_origin=1 li_receiver=2 li_provider=1
>> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG storeSet: set_id=1 set_origin=1 set_comment='All tables and sequences'
>> 2016-01-28 17:41:00 Amér. du Sud occid. WARN   remoteWorker_wakeup: node 1 - no worker thread
>> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG storeSubscribe: sub_set=1 sub_provider=1 sub_forward='f'
>> 2016-01-28 17:41:00 Amér. du Sud occid. WARN   remoteWorker_wakeup: node 1 - no worker thread
>> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG enableSubscription: sub_set=1
>> 2016-01-28 17:41:00 Amér. du Sud occid. WARN   remoteWorker_wakeup: node 1 - no worker thread
>> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: last local event sequence = 5000462590
>> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: configuration complete - starting threads
>> 2016-01-28 17:41:00 Amér. du Sud occid. INFO   localListenThread: thread starts
>> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG version for "dbname = Securithor2 user = slonyuser password = securiTHOR971 port = 6234" is 90310
>> NOTICE:  Slony-I: cleanup stale sl_nodelock entry for pid=5188
>> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG enableNode: no_id=1
>> 2016-01-28 17:41:00 Amér. du Sud occid. INFO   remoteWorkerThread_1: thread starts
>> 2016-01-28 17:41:00 Amér. du Sud occid. INFO   remoteListenThread_1: thread starts
>> 2016-01-28 17:41:00 Amér. du Sud occid. INFO   main: running scheduler mainloop
>> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG cleanupThread: thread starts
>> 2016-01-28 17:41:00 Amér. du Sud occid. INFO   syncThread: thread starts
>> 2016-01-28 17:41:00 Amér. du Sud occid. INFO   monitorThread: thread starts
>> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG version for "dbname = Securithor2 user = slonyuser password = securiTHOR971 port = 6234" is 90310
>> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG remoteWorkerThread_1: update provider configuration
>> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG remoteWorkerThread_1: added active set 1 to provider 1
>> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG version for "dbname=Securithor2 host=192.168.1.50 user=slonyuser password = securiTHOR971 port = 6234" is 90306
>> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG version for "dbname = Securithor2 user = slonyuser password = securiTHOR971 port = 6234" is 90310
>> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG cleanupThread: bias = 60
>> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG version for "dbname = Securithor2 user = slonyuser password = securiTHOR971 port = 6234" is 90310
>> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG version for "dbname = Securithor2 user = slonyuser password = securiTHOR971 port = 6234" is 90310
>> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG version for "dbname=Securithor2 host=192.168.1.50 user=slonyuser password = securiTHOR971 port = 6234" is 90306
>> 2016-01-28 17:41:00 Amér. du Sud occid. INFO   remoteWorkerThread_1: syncing set 1 with 59 table(s) from provider 1
>>
>> It gets stuck at "syncing set 1 with 59 table(s) from provider 1" (the
>> last line) forever, with occasional messages that say something about
>> cleanup (cleanupThread, I think).
>>
>> Checking the postgres logs, I see lots of:
>>
>> 2016-01-28 17:33:07 AST LOG: n'a pas pu recevoir les données du client : unrecognized winsock error 10061
>>
>> which translates to:
>>
>> 2016-01-28 17:33:07 AST LOG: could not receive data from the client: unrecognized winsock error 10061
>>
>> I'm able to connect to the main db from the replicated machine with no
>> problem. I have no idea what causes this error 10061.
>>

> Winsock error 10061 is WSAECONNREFUSED:
>
>     Connection refused.
>     No connection could be made because the target computer actively
>     refused it. This usually results from trying to connect to a
>     service that is inactive on the foreign host—that is, one with no
>     server application running.
>
> This might be a firewall issue. Can you use some network sniffer to find
> out what is happening on the TCP/IP level between the two machines?
>
> Regards, Jan
>
> --
> Jan Wieck
> Senior Software Engineer
> http://slony.info
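[Editor's note: before reaching for a full packet capture, a quick TCP probe from the subscriber toward the provider can distinguish an active "connection refused" (winsock 10061 / WSAECONNREFUSED, an RST from the peer) from a silent firewall drop (timeout). A minimal sketch in Python; the host and port in the comment are the ones from the pa_conninfo in the logs above, not a prescription.]

```python
import socket

def probe(host: str, port: int, timeout: float = 5.0) -> str:
    """Attempt a plain TCP connection and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"        # three-way handshake completed
    except ConnectionRefusedError:
        return "refused"         # peer sent RST: this is what winsock reports as 10061
    except socket.timeout:
        return "timeout"         # no answer at all: typical of a silently dropping firewall
    except OSError as exc:
        return f"error: {exc}"   # unreachable network, name resolution failure, etc.

# Example (values taken from the slon logs above):
#   probe("192.168.1.50", 6234)
```

A "refused" result matches the 10061 entries in the PostgreSQL log and suggests nothing is listening on that port (or something is actively rejecting the connection), whereas "timeout" would point at dropped packets.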
_______________________________________________
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general