Hello guys, I have a setup that has already stopped on me 3 times in the last 6 months. Each time it replicates properly for 2-3 months and then just stops. It has been stopped since January 11, 2016. The only way I can get replication back is to set everything up again from scratch. I'm wondering if anyone has an idea what is causing the stoppage. I'm running 64-bit Slony 2.2.4.
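For what it's worth, this is roughly how I check whether the replica is falling behind before it stops completely. It's just a sketch against Slony's standard sl_status monitoring view; the schema name is my cluster name (slony_Securithor2) with the leading underscore that Slony adds, so adjust if yours differs:

    -- run on the origin (node 1); shows how many events and how much time node 2 is behind
    SELECT st_origin, st_received, st_lag_num_events, st_lag_time
      FROM _slony_Securithor2.sl_status;

While things are healthy the lag stays small; once replication stops, both numbers just keep climbing.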
Currently, when I run slon on the replicated machine, I get the following:

C:\Program Files\PostgreSQL\9.3\bin>slon slony_Securithor2 "dbname = Securithor2 user = slonyuser password = securiTHOR971 port = 6234"
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: slon version 2.2.4 starting up
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Integer option vac_frequency = 3
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Integer option log_level = 0
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Integer option sync_interval = 2000
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Integer option sync_interval_timeout = 10000
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Integer option sync_group_maxsize = 20
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Integer option quit_sync_provider = 0
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Integer option remote_listen_timeout = 300
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Integer option monitor_interval = 500
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Integer option explain_interval = 0
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Integer option tcp_keepalive_idle = 0
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Integer option tcp_keepalive_interval = 0
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Integer option tcp_keepalive_count = 0
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Integer option apply_cache_size = 100
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Boolean option log_pid = 0
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Boolean option log_timestamp = 1
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Boolean option tcp_keepalive = 1
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Boolean option monitor_threads = 1
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: Real option real_placeholder = 0.000000
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: String option cluster_name = slony_Securithor2
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: String option conn_info = dbname = Securithor2 user = slonyuser password = securiTHOR971 port = 6234
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: String option pid_file = [NULL]
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: String option log_timestamp_format = %Y-%m-%d %H:%M:%S %Z
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: String option archive_dir = [NULL]
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: String option sql_on_connection = [NULL]
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: String option lag_interval = [NULL]
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: String option command_on_logarchive = [NULL]
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: String option cleanup_interval = 10 minutes
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: local node id = 2
2016-01-28 17:41:00 Amér. du Sud occid. INFO main: main process started
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: launching sched_start_mainloop
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: loading current cluster configuration
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG storeNode: no_id=1 no_comment='Master Node'
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG storePath: pa_server=1 pa_client=2 pa_conninfo="dbname=Securithor2 host=192.168.1.50 user=slonyuser password = securiTHOR971 port = 6234" pa_connretry=10
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG storeListen: li_origin=1 li_receiver=2 li_provider=1
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG storeSet: set_id=1 set_origin=1 set_comment='All tables and sequences'
2016-01-28 17:41:00 Amér. du Sud occid. WARN remoteWorker_wakeup: node 1 - no worker thread
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG storeSubscribe: sub_set=1 sub_provider=1 sub_forward='f'
2016-01-28 17:41:00 Amér. du Sud occid. WARN remoteWorker_wakeup: node 1 - no worker thread
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG enableSubscription: sub_set=1
2016-01-28 17:41:00 Amér. du Sud occid. WARN remoteWorker_wakeup: node 1 - no worker thread
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: last local event sequence = 5000462590
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: configuration complete - starting threads
2016-01-28 17:41:00 Amér. du Sud occid. INFO localListenThread: thread starts
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG version for "dbname = Securithor2 user = slonyuser password = securiTHOR971 port = 6234" is 90310
NOTICE: Slony-I: cleanup stale sl_nodelock entry for pid=5188
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG enableNode: no_id=1
2016-01-28 17:41:00 Amér. du Sud occid. INFO remoteWorkerThread_1: thread starts
2016-01-28 17:41:00 Amér. du Sud occid. INFO remoteListenThread_1: thread starts
2016-01-28 17:41:00 Amér. du Sud occid. INFO main: running scheduler mainloop
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG cleanupThread: thread starts
2016-01-28 17:41:00 Amér. du Sud occid. INFO syncThread: thread starts
2016-01-28 17:41:00 Amér. du Sud occid. INFO monitorThread: thread starts
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG version for "dbname = Securithor2 user = slonyuser password = securiTHOR971 port = 6234" is 90310
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG remoteWorkerThread_1: update provider configuration
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG remoteWorkerThread_1: added active set 1 to provider 1
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG version for "dbname=Securithor2 host=192.168.1.50 user=slonyuser password = securiTHOR971 port = 6234" is 90306
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG version for "dbname = Securithor2 user = slonyuser password = securiTHOR971 port = 6234" is 90310
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG cleanupThread: bias = 60
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG version for "dbname = Securithor2 user = slonyuser password = securiTHOR971 port = 6234" is 90310
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG version for "dbname = Securithor2 user = slonyuser password = securiTHOR971 port = 6234" is 90310
2016-01-28 17:41:00 Amér. du Sud occid. CONFIG version for "dbname=Securithor2 host=192.168.1.50 user=slonyuser password = securiTHOR971 port = 6234" is 90306
2016-01-28 17:41:00 Amér. du Sud occid. INFO remoteWorkerThread_1: syncing set 1 with 59 table(s) from provider 1

It gets stuck at "syncing set 1 with 59 table(s) from provider 1" (the last line) forever, with occasional messages about cleaning (the cleanup thread, I think). Checking the Postgres logs, I see lots of:

2016-01-28 17:33:07 AST LOG: n'a pas pu recevoir les données du client : unrecognized winsock error 10061

which translates to:

2016-01-28 17:33:07 AST LOG: could not receive data from the client : unrecognized winsock error 10061

I'm able to connect to the main db from the replicated machine with no problem (see the quick checks below). I have no idea what causes this error 10061. Any ideas? Appreciate the help.
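While slon sits on that last "syncing set 1" line, this is roughly what I look at on the master to see whether the subscriber's connection is doing anything at all. It's just a sketch against the standard pg_stat_activity view as it looks in 9.3, nothing Slony-specific:

    -- run on the master (192.168.1.50); lists the slonyuser sessions and what they are running
    SELECT pid, state, waiting, query_start, left(query, 80) AS current_query
      FROM pg_stat_activity
     WHERE usename = 'slonyuser';

I'm assuming the initial copy of the 59 tables would show up there as a long-running COPY from the subscriber; a session that has simply disappeared would fit the connection errors in the Postgres log.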
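And when I say I can connect to the main db from the replicated machine, I mean a plain psql test like this one goes through fine (a sketch using the same conninfo as the store path; psql prompts for the password):

    C:\Program Files\PostgreSQL\9.3\bin>psql "dbname=Securithor2 host=192.168.1.50 user=slonyuser port=6234" -c "SELECT version();"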
_______________________________________________ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general