On 6/22/2007 6:07 PM, Shaun Thomas wrote:
> On Friday 22 June 2007 04:46:56 pm Christopher Browne wrote:
>> That'll take place routinely on all nodes; slony
>> needs to switch between sl_log_1 and sl_log_2 periodically,
>> and does so on every node.
> I was wondering about that. I was mostly suspicious because it copies
> like mad until about 15-17G, and then slows to a crawl before erupting
> with those errors and aborting the sync. The logswitch was just where
> it kept dying, so I kept it as a possible candidate.
Any error on any DB connection will usually cause slon to restart
everything internally. Do you by any chance run the slon serving the
node behind the firewall from the outside? In that case you will have
trouble, because the cleanup thread is idle for most of its life, and by
the time it tries to do its work, your firewall has already pulled the
plug. That is never going to work. You will have to run the slon on the
remote site.
The slowdown you experience might be related to how slony does the
initial copy. Per table, it disables index maintenance, copies the data,
then re-enables the indexes and issues a REINDEX for that table. So if
at that point it is copying medium-sized tables with many indexes, this
is exactly what I'd expect to happen.
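Roughly, the per-table sequence Jan describes looks like this (a sketch
only, not the actual Slony-I source; the table name is made up for
illustration):

```sql
-- Sketch of Slony-I's initial COPY for one table (hypothetical table name).
-- Suppress index maintenance during the bulk load:
UPDATE pg_catalog.pg_class SET relhasindex = false
 WHERE oid = 'public.big_table'::regclass;

COPY public.big_table FROM stdin;  -- bulk-load the subscriber's copy

-- Turn index maintenance back on and rebuild all indexes in one pass:
UPDATE pg_catalog.pg_class SET relhasindex = true
 WHERE oid = 'public.big_table'::regclass;
REINDEX TABLE public.big_table;
```

The REINDEX at the end is why a large, heavily indexed table can sit
apparently idle for a long time after its data has finished copying.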
>> There's something odd about the problem with logswitch_finish();
>> can you check your logs to see if the DBMS saw a Signal 11 or
>> such at 19:08:44?
> I didn't see any Sig-11, but postgres *did* whine about the client
> unexpectedly closing the connection. But considering the slony client
> was just as confused about the connection drop, that's what made me
> think of the Savvis firewall getting pissy.
Slon itself never does a proper PQfinish() call to close the
connections. So whenever slon is stopped or internally restarted, you
will see those messages about clients disconnecting. However, the
firewall is still the source of your troubles. It is just plain wrong to
kill a perfectly healthy TCP connection just because it wasn't used for
a while. You might want to try using the tcp_keepalives_* config options
in the postgresql.conf file of the poor server behind that stupid flamewall.
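Something along these lines in postgresql.conf should keep the firewall
from seeing the connection as idle (the values are only examples; pick
an idle time comfortably shorter than whatever timeout the firewall
enforces):

```
# postgresql.conf on the server behind the firewall
tcp_keepalives_idle = 300      # seconds of idleness before the first probe
tcp_keepalives_interval = 60   # seconds between unanswered probes
tcp_keepalives_count = 5       # unanswered probes before giving up
```

Note these settings only affect connections where the server is the TCP
endpoint sending the keepalives, and they depend on OS support.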
>> If you have multiple tables, you could set up a replication set per
>> table, and subscribe one table at a time. In practice, you probably
>> have five tables that are bigger than all the others put together;
> That's exactly the case. Not counting indexes, I have one 5GB table
> with 44M rows, a 4GB table with 50M rows, and a 2.5GB table with 10M
> rows. If I put those in their own sets, the rest would have no problem
> being in the remainder.
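The set-per-big-table approach Christopher suggests would look roughly
like this in slonik (the IDs, node numbers, and table name are all
illustrative, and the cluster/conninfo preamble is omitted):

```
# slonik sketch -- one replication set per large table
create set (id = 2, origin = 1, comment = 'big table, subscribed alone');
set add table (set id = 2, origin = 1, id = 10,
               fully qualified name = 'public.big_table_1');
subscribe set (id = 2, provider = 1, receiver = 2, forward = no);
```

Once each set's initial copy has completed, the sets can be combined
again with slonik's merge set command so they move in lockstep afterwards.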
> But I wonder about something - does slony turn off indexes to
> facilitate the data copy, and then re-enable them so they're all
> created after the copy is done? If that's the case, that could be my
> problem. 17G is about the size of all the tables with no indexes,
> fully vacuumed. If slony is waiting around forever for the indexes to
> finish generating before committing and declaring the initial copy
> successful, that could account for my idle time.
It does that table by table.
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== [EMAIL PROTECTED] #
_______________________________________________
Slony1-general mailing list
[email protected]
http://lists.slony.info/mailman/listinfo/slony1-general