On 6-Feb-07, at 1:37 PM, Dan Falconer wrote:
...
> I'm not trying to blame-shift here, but it really seems like the lag is
> generated from Slony itself. There's a stale connection, initiated by Slony
> (that's the only thing that connects on the private 192.168.1.x network),
> which sits idle in transaction. On the master, the query shows as "fetch 100
> from LOG;" and shows on the slave as "<IDLE> in transaction".

Just yesterday, our replication was behind by 18 million rows in sl_log_1 and 8 million rows in sl_log_2. This is on Slony-I 1.2.6 and PostgreSQL 8.2.1. A slony postgres process was using 100% of one core on the master db. Running netstat -a on the master showed a non-slony postgres connection in a CLOSE_WAIT state, corresponding to a backend sitting in "<IDLE> in transaction". Once I got rid of the hung connection, slony caught up within 40 minutes. I suppose this might be similar to what occurs when there is a long-running query.
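For what it's worth, something along these lines is how the hung backend can be spotted from a script rather than by eyeballing pg_stat_activity by hand. This is only a sketch: the column names (procpid, current_query) match the 8.2 catalog, the connection string is a made-up placeholder, and current_query for other users' sessions is only visible when you connect as a superuser.

import psycopg2

# Placeholder connection string -- substitute real credentials.
DSN = "dbname=mydb user=postgres host=127.0.0.1"

conn = psycopg2.connect(DSN)
cur = conn.cursor()
# PostgreSQL 8.2 pg_stat_activity uses procpid / current_query,
# and marks hung transactions with the literal '<IDLE> in transaction'.
cur.execute("""
    SELECT procpid, usename, client_addr, query_start
      FROM pg_stat_activity
     WHERE current_query = '<IDLE> in transaction'
     ORDER BY query_start
""")
for procpid, usename, client_addr, query_start in cur.fetchall():
    print("pid %s (user %s, client %s) idle in transaction since %s"
          % (procpid, usename, client_addr, query_start))
cur.close()
conn.close()

Once the offending procpid is known, the backend can be killed from the shell on the master; pg_terminate_backend only shows up in later releases, so on 8.2 it's a plain kill of the pid.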
I'm not sure if this is pertinent to your problem, but I thought I'd forward my experience in case it helps, since the problems seem similar. I am going to add a watchdog that checks for network connections stuck in a CLOSE_WAIT state, to hopefully catch this in the future; a rough sketch of the check I have in mind is at the end of this mail.

Brian Wipf <[EMAIL PROTECTED]>
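The watchdog would be something along these lines. It is only a sketch and hasn't been tested in production yet; it assumes postgres is listening on the default port 5432 and that netstat on this box prints the connection state in the last column, so adjust to taste.

import subprocess

PG_PORT = ":5432"   # default postgres port; change if yours differs

def close_wait_count():
    # netstat -an prints one socket per line with the state in the last column
    output = subprocess.Popen(["netstat", "-an"],
                              stdout=subprocess.PIPE).communicate()[0]
    count = 0
    for line in output.decode("ascii", "replace").splitlines():
        fields = line.split()
        if fields and fields[-1] == "CLOSE_WAIT" and PG_PORT in line:
            count += 1
    return count

if __name__ == "__main__":
    n = close_wait_count()
    if n:
        print("WARNING: %d postgres connection(s) stuck in CLOSE_WAIT" % n)

Run from cron every few minutes and wired up to mail its output somewhere, that should catch a stuck connection long before sl_log_1 piles up the way it did yesterday.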
