On 6/22/07, Shaun Thomas <[EMAIL PROTECTED]> wrote:
> Howdy folks,
>
> We're in the middle of a migration / upgrade, and I've got a giant slony
> set in place. I get no errors on anything, and syncing starts up
> just great.  But something seems to be weird here:
>
> 2007-06-21 19:08:44 CDT FATAL  cleanupThread: "delete
> from "_replication".sl_log_1 where log_origin = '10' and log_xid
> < '757377'; delete from "_replication".sl_log_2 where log_origin = '10'
> and log_xid < '757377'; delete from "_replication".sl_seqlog where
> seql_origin = '10' and seql_ev_seqno < '2';
> select "_replication".logswitch_finish(); " - server closed the
> connection unexpectedly
>
> After it copies a huge amount, say 15-17GB of our 40-45GB total, the
> pace slows from about 300MB per minute to 5MB / minute, then to almost
> nothing.  The remote system we're mirroring to has an idle disconnect
> which is likely killing the connection in question, causing a giant
> rollback of current progress.  The FATAL error above tells me it's
> doing a log switch on Node 10, which makes no sense, since Node 10 is a
> slave and should have no events.  This is also the same error I get
> every single time, even though the log_xid number itself may change.
>
> So my questions:
>
> 1. Why is log switching on node 10, instead of node 1, which is
> providing the data?
>
> 2. Why is this mysterious log switch stalling the data copy, so our idle
> timer slaughters the initial table COPY commands mid-progress?

Why do you have an "idle timer" running during a subscribe? Killing
slons during subscribe is an outright bad idea.

> 3. Is there some way the initial copy can *not* be an "all or nothing"
> proposition?  45GB seems an awfully huge first bite, and it seems
> unfair that not a single error or disconnect may occur during the
> entire process of copying that much data.  Checkpoints?  Something?
> Maybe a configuration for a heartbeat, anything I missed?

Slony doesn't replicate databases. It replicates sets of tables and
sequences. To smooth your initial subscribe, have you considered
breaking it into a number of small sets (say with a single table and
related sequences in each set) and subscribing them piece by piece?

Otherwise, the answer is no.
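A minimal sketch of that piece-by-piece approach, written in Python so it can generate one Slonik script per table. The cluster name `replication` follows from the `_replication` schema in the log above, and nodes 1 (origin) and 10 (subscriber) come from the thread; the table and sequence names, the conninfo strings, and the starting ID of 100 are hypothetical. Because a table can belong to only one set, this assumes the tables have first been taken out of the existing large set (for example by unsubscribing node 10 and dropping that set). The idea is to run each script with `slonik` and let that subscription finish before starting the next, so a dropped connection costs only one table's copy.

```python
from dataclasses import dataclass, field

# Values taken from the thread: the "_replication" schema in the log implies
# cluster name "replication"; node 1 is the origin and node 10 the subscriber.
CLUSTER = "replication"
ORIGIN = 1
RECEIVER = 10


@dataclass
class TableSpec:
    """A table plus the sequences that should travel with it (hypothetical)."""
    table: str                                   # fully qualified, e.g. "public.orders"
    sequences: list[str] = field(default_factory=list)


def preamble(conninfo: dict[int, str]) -> list[str]:
    """Cluster name and admin conninfo lines every slonik script needs."""
    lines = [f"cluster name = {CLUSTER};"]
    for node in sorted(conninfo):
        lines.append(f"node {node} admin conninfo = '{conninfo[node]}';")
    return lines


def per_table_scripts(specs: list[TableSpec], conninfo: dict[int, str],
                      first_id: int = 100) -> list[str]:
    """Build one slonik script per table: create a one-table set, add the
    table and its sequences, then subscribe node RECEIVER to it."""
    scripts = []
    next_seq_id = first_id                       # sequence ids must be unique too
    for offset, spec in enumerate(specs):
        set_id = first_id + offset               # also reused as the table id
        lines = preamble(conninfo)
        lines.append(f"create set (id = {set_id}, origin = {ORIGIN}, "
                     f"comment = '{spec.table}');")
        lines.append(f"set add table (set id = {set_id}, origin = {ORIGIN}, "
                     f"id = {set_id}, fully qualified name = '{spec.table}', "
                     f"comment = '{spec.table}');")
        for seq in spec.sequences:
            lines.append(f"set add sequence (set id = {set_id}, origin = {ORIGIN}, "
                         f"id = {next_seq_id}, fully qualified name = '{seq}', "
                         f"comment = '{seq}');")
            next_seq_id += 1
        lines.append(f"subscribe set (id = {set_id}, provider = {ORIGIN}, "
                     f"receiver = {RECEIVER}, forward = no);")
        scripts.append("\n".join(lines) + "\n")
    return scripts


if __name__ == "__main__":
    # Hypothetical connection strings and tables.
    conninfo = {1: "dbname=prod host=origin", 10: "dbname=prod host=mirror"}
    specs = [TableSpec("public.orders", ["public.orders_id_seq"]),
             TableSpec("public.customers")]
    for i, script in enumerate(per_table_scripts(specs, conninfo)):
        print(f"-- script {i}\n{script}")
```

Once every small set is subscribed, Slonik's MERGE SET can fold them back into fewer sets if that is easier to manage; that step is left out of the sketch.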

Also, you will want to VACUUM (if not TRUNCATE) those tables to get
those dead rows out before restarting your slon to try again.
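A small sketch of that cleanup step, again in Python; it only builds the SQL strings for a hypothetical list of subscriber tables. `TRUNCATE` discards the partially copied data and its storage at once, while plain `VACUUM` marks the dead rows' space as reusable. `VACUUM` cannot run inside a transaction block, so each statement should be executed on its own (for example from psql in its default autocommit mode).

```python
def quote_ident(name: str) -> str:
    """Double-quote a PostgreSQL identifier, doubling any embedded quotes.
    Quoting preserves case, so pass names exactly as stored (usually lowercase)."""
    return '"' + name.replace('"', '""') + '"'


def qualified(table: str) -> str:
    """Quote 'schema.table' (or a bare table name) part by part."""
    return ".".join(quote_ident(part) for part in table.split(".", 1))


def cleanup_statements(tables: list[str], truncate: bool = False) -> list[str]:
    """Return one statement per table to clear dead rows left by an aborted
    initial COPY on the subscriber, before slon is restarted."""
    verb = "TRUNCATE TABLE" if truncate else "VACUUM"
    return [f"{verb} {qualified(t)};" for t in tables]


if __name__ == "__main__":
    # Hypothetical subscriber tables; run each printed line separately.
    for stmt in cleanup_statements(["public.orders", "public.customers"]):
        print(stmt)
```

Note that `TRUNCATE` refuses a table that other tables reference by foreign key unless those tables are truncated in the same statement, so the one-statement-per-table form suits the `VACUUM` case best.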

> 4. Is it possible to somehow... bootstrap the mirror?  Make an exact
> data copy of the current database and have slony only copy updates
> after a certain point?  I mean, I could probably do a dump/restore and
> let slony keep everything up to date, before our systems launch the
> nightly insert jobs.

No, not unless you intend to use log-shipping replication.

> 5. Something else I didn't consider?
>
> Thanks in advance.  This is driving me nuts and I've scanned through
> various documentation without much luck.  We're working with our vendor
> to temporarily disable the idle kickoff, but there's a chance that may
> not be the issue, considering that the weird error I pasted always has
> the same contents; I'd think the error would be different if it were
> just an idle disconnect.

Mixing vendors and replication is like putting de lime in de coconut.
Good luck with that.

Andrew
_______________________________________________
Slony1-general mailing list
[email protected]
http://lists.slony.info/mailman/listinfo/slony1-general
