Karl Lehenbauer wrote:
> Well, I did not know that the message was normal while replication was
> starting. When I was having the problem, like 10 hours after replicating,
> one of the sl_log tables had millions of rows and a slony postgres process
> was continuously at 80% cpu. I acknowledge it may have been an overreaction
> to tear the cluster down but after losing the cluster due to a crash, I don't
> know, it's not very hard to drop and if we start having problems it kills the
> site pretty fast.
>
> Anyway after regenerating the cluster with a different schema name, not that
> that had anything to do with it, and waiting, slony has caught up and is now
> properly truncating / flipping the sl_log_* tables.
>
> Also while this was going on we tried a switchover, which gave us a config
> error for node -1 (!), and we tried a failover but it just hung. Now that
> things seem to be working better, maybe the switchover will go better during
> tonight's maintenance window.
>

Trying a switchover while a subscription is in progress is probably not
the greatest test (unless your goal is to go out and search for problems).
A MOVE SET shouldn't complete until after the subscription process is
finished (so it will appear to hang). I'm not sure what a FAILOVER will do
while the subscription process is in progress; I'll try to write up a test
case to see what it actually does.

In some of my testing I also sometimes hit the node -1 issue with cascaded
replicas. I've opened some bugs to track it but have yet to figure out
exactly what is going on.

>
> On Jun 3, 2010, at 10:04 AM, Jan Wieck wrote:
>
>> On 6/3/2010 9:20 AM, Karl Lehenbauer wrote:
>>> It does finish, at least I think it does. I will look very carefully this
>>> time to make sure (takes a few hours).
>>> This morning I tore the cluster down and recreated it with a different
>>> schema name, thinking that might work. I saw the sl_log_1-not-truncated
>>> messages and figured it was hosed, so seeing your message that the
>>> logswitch can't happen while there is data that needs to be replicated was
>>> heartening. (It might be good to make a note in the FAQ that those
>>> messages are normal when doing the initial subscribe.)
>> I am not sure where that idea, that tearing down the whole cluster, is a
>> good response to this issue.
>>
>> The reason, why Slony-I cannot finish the log switch, is because of either
>>
>> 1) there are still log rows in the segment that need to be replicated.
>>
>> or
>>
>> 2) transactions that were in progress when the logswitch started are
>> still in progress.
>>
>> In case 2) it is possible that such transaction actually did create new log
>> rows, which once the transaction commits of course will need to be
>> replicated.
>>
>>
>> Jan
>>
>> --
>> Anyone who trades liberty for security deserves neither
>> liberty nor security.
>> -- Benjamin Franklin
>

--
Steve Singer
Afilias Canada
Data Services Developer
416-673-1142
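
For anyone reading this thread later, the two conditions Jan lists can be written down as a small toy model. This is only a sketch in Python to make the reasoning concrete. It is not Slony-I code, and every name in it (LogSegment, blocking_reasons, can_finish_logswitch, commit) is made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LogSegment:
    """Toy stand-in for the old sl_log_* table (hypothetical, not Slony-I code)."""
    # Log rows in this segment that subscribers have not yet received.
    unreplicated_rows: int = 0
    # IDs of transactions that were already open when the log switch started.
    open_txns_at_switch: set = field(default_factory=set)


def blocking_reasons(seg):
    """Return the reasons the old segment cannot be truncated yet."""
    reasons = []
    if seg.unreplicated_rows > 0:
        reasons.append("rows still need to be replicated")
    if seg.open_txns_at_switch:
        reasons.append("transactions from before the switch are still in progress")
    return reasons


def can_finish_logswitch(seg):
    """The switch can finish only when neither condition holds."""
    return not blocking_reasons(seg)


def commit(seg, txn_id, rows_written):
    """A pre-switch transaction commits; its rows now need replicating (case 2)."""
    seg.open_txns_at_switch.discard(txn_id)
    seg.unreplicated_rows += rows_written


if __name__ == "__main__":
    seg = LogSegment(unreplicated_rows=0, open_txns_at_switch={101})
    print(blocking_reasons(seg))      # blocked by the open transaction
    commit(seg, 101, rows_written=5)
    print(blocking_reasons(seg))      # now blocked by its unreplicated rows
    seg.unreplicated_rows = 0         # pretend the subscriber caught up
    print(can_finish_logswitch(seg))
```

The point of the model is the second step: a transaction that was open at switch time can still add rows to the old segment when it commits, so the "not truncated" messages can persist for a while without anything being broken.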
