On 6/22/2007 6:07 PM, Shaun Thomas wrote:
> On Friday 22 June 2007 04:46:56 pm Christopher Browne wrote:
>> That'll take place routinely on all nodes; slony
>> needs to switch between sl_log_1 and sl_log_2 periodically,
>> and does so on every node.
> I was wondering about that. I was mostly suspicious because it copies
> like mad until about 15-17G, and then slows to a crawl before erupting
> with those errors and aborting the sync. The logswitch was just where
> it kept dying, so I kept it as a possible candidate.
Any error on any DB connection will usually cause slon to restart
everything internally. Do you by any chance run the slon serving the
node behind the firewall from the outside? In that case you will have
trouble, because the cleanup thread is idle for most of its life, and by
the time it tries to do its work, your firewall has already pulled the
plug. That is never going to work. You will have to run the slon on the
remote site.
The slowdown you experience might be related to how slony does the
initial copy. Per table, it disables index maintenance, copies the data,
then re-enables the indexes and issues a REINDEX for that table. So if
at that point it is copying medium-sized tables with many indexes, this
is exactly what I'd expect to happen.
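Roughly, the per-table sequence Jan describes looks like this (a sketch
only, not the actual Slony-I source; the table name is made up for
illustration):

```sql
-- Sketch of Slony-I's initial COPY for one table (hypothetical table name).
-- Suppress index maintenance during the bulk load:
UPDATE pg_catalog.pg_class SET relhasindex = false
 WHERE oid = 'public.big_table'::regclass;

COPY public.big_table FROM stdin;  -- bulk-load the subscriber's copy

-- Turn index maintenance back on and rebuild all indexes in one pass:
UPDATE pg_catalog.pg_class SET relhasindex = true
 WHERE oid = 'public.big_table'::regclass;
REINDEX TABLE public.big_table;
```

The REINDEX at the end is why a large, heavily indexed table can sit
apparently idle for a long time after its data has finished copying.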
>> There's something odd about the problem with logswitch_finish();
>> can you check your logs to see if the DBMS saw a Signal 11 or
>> such at 19:08:44?
> I didn't see any Sig-11, but postgres *did* whine about the client
> unexpectedly closing the connection. But considering the slony client
> was just as confused about the connection drop, that's what made me
> think of the Savvis firewall getting pissy.
Slon itself never does a proper PQfinish() call to close the
connections. So whenever slon is stopped or internally restarted, you
will see those messages about clients disconnecting. However, the
firewall is still the source of your troubles. It is just plain wrong to
kill a perfectly healthy TCP connection just because it wasn't used for
a while. You might want to try using the tcp_keepalives_* config options
in the postgresql.conf file of the poor server behind that stupid flamewall.
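Something along these lines in postgresql.conf should keep the firewall
from seeing the connection as idle (the values are only examples; pick
an idle time comfortably shorter than whatever timeout the firewall
enforces):

```
# postgresql.conf on the server behind the firewall
tcp_keepalives_idle = 300      # seconds of idleness before the first probe
tcp_keepalives_interval = 60   # seconds between unanswered probes
tcp_keepalives_count = 5       # unanswered probes before giving up
```

Note these settings only affect connections where the server is the TCP
endpoint sending the keepalives, and they depend on OS support.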
>> If you have multiple tables, you could set up a replication set per
>> table, and subscribe one table at a time. In practice, you probably
>> have five tables that are bigger than all the others put together;
> That's exactly the case. Not counting indexes, I have one 5GB table
> with 44M rows, a 4GB table with 50M rows, and a 2.5GB table with 10M
> rows. If I put those in their own sets, the rest would have no problem
> being in the remainder.
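The set-per-big-table approach Christopher suggests would look roughly
like this in slonik (the IDs, node numbers, and table name are all
illustrative, and the cluster/conninfo preamble is omitted):

```
# slonik sketch -- one replication set per large table
create set (id = 2, origin = 1, comment = 'big table, subscribed alone');
set add table (set id = 2, origin = 1, id = 10,
               fully qualified name = 'public.big_table_1');
subscribe set (id = 2, provider = 1, receiver = 2, forward = no);
```

Once each set's initial copy has completed, the sets can be combined
again with slonik's merge set command so they move in lockstep afterwards.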
> But I wonder about something - does slony turn off indexes to
> facilitate the data copy, and then re-enable them so they're all
> created after the copy is done? If that's the case, that could be my
> problem. 17G is about the size of all the tables with no indexes,
> fully vacuumed. If slony is waiting around forever for the indexes to
> finish generating before committing and declaring the initial copy
> successful, that could account for my idle time.
It does that table by table.
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== [EMAIL PROTECTED] #
_______________________________________________
Slony1-general mailing list
[email protected]
http://lists.slony.info/mailman/listinfo/slony1-general