Karl Lehenbauer wrote:
> Well, I did not know that the message was normal while replication was 
> starting.  When I was having the problem, like 10 hours after replicating, 
> one of the sl_log tables had millions of rows and a slony postgres process 
> was continuously at 80% cpu.  I acknowledge it may have been an overreaction 
> to tear the cluster down, but after losing the cluster due to a crash, I don't 
> know, it's not very hard to drop, and if we start having problems it kills the 
> site pretty fast.
> 
> Anyway after regenerating the cluster with a different schema name, not that 
> that had anything to do with it, and waiting, slony has caught up and is now 
> properly truncating / flipping the sl_log_* tables.
> 
> Also while this was going on we tried a switchover, which gave us a config 
> error for node -1 (!), and we tried a failover but it just hung.  Now that 
> things seem to be working better, maybe the switchover will go better during 
> tonight's maintenance window.
> 

Trying a switchover while a subscription is in progress is probably not 
the greatest test (unless your goal is to go out and search for 
problems).  A MOVE SET shouldn't complete until after the subscription 
process is finished (so it will appear to hang).  I'm not sure what a 
FAILOVER while the subscription process is in progress will do; I'll try 
to write up a test case to see what it actually does.
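For what it's worth, one way to check whether a subscription has actually 
completed before attempting a MOVE SET is to look at sl_subscribe on the 
subscriber.  This is a rough sketch; the schema name _replication is a 
placeholder for your cluster name, and I'm assuming the sub_active flag 
behaves as described:

```sql
-- Sketch: replace _replication with _<your cluster name>.
-- A subscription row whose sub_active flag is true should indicate
-- that the initial COPY for that set has finished on that receiver.
SELECT sub_set, sub_provider, sub_receiver, sub_active
  FROM _replication.sl_subscribe;
```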

In some of my testing I seem to hit the node -1 issue sometimes with 
cascaded replicas as well.  I've opened some bugs to track it but have 
yet to figure out exactly what is going on.
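As a side note, when the sl_log tables keep growing it can help to check 
which of the two conditions Jan describes below is holding up the log 
switch.  A rough sketch (the _replication schema name is a placeholder 
for your cluster, and the pg_stat_activity columns shown are the 
pre-9.2 names):

```sql
-- Sketch: replace _replication with _<your cluster name>.
-- How much is still waiting in each log segment:
SELECT 'sl_log_1' AS segment, count(*) FROM _replication.sl_log_1
UNION ALL
SELECT 'sl_log_2', count(*) FROM _replication.sl_log_2;

-- Long-running transactions that can keep the old segment open:
SELECT procpid, xact_start, current_query
  FROM pg_stat_activity
 WHERE xact_start < now() - interval '1 hour';
```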



> 
> On Jun 3, 2010, at 10:04 AM, Jan Wieck wrote:
> 
>> On 6/3/2010 9:20 AM, Karl Lehenbauer wrote:
>>> It does finish, at least I think it does.  I will look very carefully this 
>>> time to make sure (takes a few hours).
>>> This morning I tore the cluster down and recreated it with a different 
>>> schema name, thinking that might work.  I saw the sl_log_1-not-truncated 
>>> messages and figured it was hosed, so seeing your message that the 
>>> logswitch can't happen while there is data that needs to be replicated was 
>>> heartening.  (It might be good to make a note in the FAQ that those 
>>> messages are normal when doing the initial subscribe.)
>> I am not sure where the idea comes from that tearing down the whole 
>> cluster is a good response to this issue.
>>
>> The reason Slony-I cannot finish the log switch is that either
>>
>> 1) there are still log rows in the segment that need to be replicated.
>>
>> or
>>
>> 2) transactions that were in progress when the logswitch started are
>>   still in progress.
>>
>> In case 2) it is possible that such a transaction did create new log 
>> rows, which will of course need to be replicated once the transaction 
>> commits.
>>
>>
>> Jan
>>
>> -- 
>> Anyone who trades liberty for security deserves neither
>> liberty nor security. -- Benjamin Franklin
> 


-- 
Steve Singer
Afilias Canada
Data Services Developer
416-673-1142
_______________________________________________
Slony1-general mailing list
[email protected]
http://lists.slony.info/mailman/listinfo/slony1-general
