Hi All,
We're now seeing the following message:
ERROR remoteListenThread_1: timeout for event selection
At the same time, the following appears on standard error:
sched_mainloop: select(): Bad file descriptor
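For context, select() generally reports "Bad file descriptor" (EBADF) when one of the descriptors handed to it has already been closed. Whether that is what happens inside slon here is only an assumption. The following is a minimal Python sketch of the general behaviour, not Slony-I code; the helper name select_on_closed_fd is made up for illustration.

```python
import errno
import os
import select


def select_on_closed_fd():
    """Return the errno select() reports when given a closed descriptor."""
    read_end, write_end = os.pipe()
    os.close(write_end)
    os.close(read_end)  # read_end is now stale, but we still pass it on
    try:
        # Zero timeout: we only care whether the call itself is rejected.
        select.select([read_end], [], [], 0)
    except OSError as exc:
        return exc.errno
    return None


if __name__ == "__main__":
    code = select_on_closed_fd()
    # Print the symbolic errno name if one is known.
    print(errno.errorcode.get(code, code))
```

If slon behaves the same way, the thing to look for would be a connection or pipe descriptor that one thread closes while the scheduler still has it registered.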
This looks very similar to the problem described in a post from last year:
http://gborg.postgresql.org/pipermail/slony1-general/2004-October/000747.html
In a reply, Jan suggested that this locking problem would be fixed in 1.0.3.
We're running 1.1.0. Can anybody comment on whether this really
has been fixed? Or should we be looking at the suggested patch?
Matthew.
Oct 25 17:11:55 radius2 slon_radius2[33490]: [28-1] 2005-10-25 17:11:55
EST [33490] ERROR remoteListenThread_2: timeout for event selection
Oct 25 17:17:49 radius2 slon_radius2[33490]: [29-1] 2005-10-25 17:17:49
EST [33490] ERROR remoteListenThread_2: timeout for event selection
Oct 25 17:23:34 radius2 slon_radius2[33490]: [30-1] 2005-10-25 17:23:34
EST [33490] ERROR remoteListenThread_2: timeout for event selection
Obviously, at this stage, replication fails.
So far, our investigation has found that:
* It appears to fail only on clusters with more than one slave.
* It isn't periodic as far as we can tell (it can take a week or two to
fail).
* We haven't had it fail under load (it's currently failing when
completely idle -- no changes being submitted to the master).
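One way to check how the errors are spaced is to pull the slon timestamps out of the syslog lines and compute the gaps between them. This is a rough sketch that assumes the log format shown in the excerpt above; TIMEOUT_RE, timeout_gaps and SAMPLE are names invented here.

```python
import re
from datetime import datetime

# Matches the slon timestamp followed by the timeout error. \s+ tolerates
# the line wrap between the timestamp and the timezone in the excerpt.
TIMEOUT_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+\w+ \[\d+\] "
    r"ERROR\s+remoteListenThread_\d+: timeout for event selection"
)


def timeout_gaps(log_text):
    """Return the seconds between consecutive timeout errors in log_text."""
    stamps = [
        datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        for m in TIMEOUT_RE.finditer(log_text)
    ]
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]


# The three log entries quoted in this message.
SAMPLE = """\
Oct 25 17:11:55 radius2 slon_radius2[33490]: [28-1] 2005-10-25 17:11:55
EST [33490] ERROR remoteListenThread_2: timeout for event selection
Oct 25 17:17:49 radius2 slon_radius2[33490]: [29-1] 2005-10-25 17:17:49
EST [33490] ERROR remoteListenThread_2: timeout for event selection
Oct 25 17:23:34 radius2 slon_radius2[33490]: [30-1] 2005-10-25 17:23:34
EST [33490] ERROR remoteListenThread_2: timeout for event selection
"""

if __name__ == "__main__":
    print(timeout_gaps(SAMPLE))
```

For the three entries quoted here the gaps come out as 354 and 345 seconds, so once the failure has started the errors repeat roughly every six minutes.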
All nodes are running slony1-1.1.0 on FreeBSD 4.11.
Any suggestions on what might be causing this, or where I should look
for more useful debugging information?
--
Matthew Horoschun
Internet Development
Telstra Internet Direct
Ph: +61 2 6208 1929
Fax: +61 2 6248 6165
[EMAIL PROTECTED]