Jeff Frost wrote:
> Karl Denninger wrote:
>>>   
>> But they should have switched the master to Node #4 when the "move set"
>> command was executed.  When they reconnect they should be doing so to
>> Node #4, not Node #2 - IF they saw the "move set" command (and it
>> appears they did.)
>>
>> Further, I ran the change in the paths on that node - that is,
>> locally to that machine.  No difference.
> When you indicate that you ran the store path on that node, can you be
> specific about what you did?
>
>
>>>> I'm wondering what happened here.  It is almost as if the "move set"
>>>> never executed on the other subscribers - an impossibility, no?  They
>>>> WERE all replicating and current just before the shutdown - I checked
>>>> them all.  How does that happen under these circumstances?
>>>>
>>>> Is there a better way for the future?  I'm back up now, but the entire
>>>> point of this exercise was to AVOID having to copy the entire database
>>>> over - while I avoided any material downtime for my users, I was left
>>>> EXPOSED to a failure for the copy period, which was kinda nasty.
>>>>
>>>> Thoughts appreciated.
>>>>
>>>>     
>>>
>>> Probably the way to avoid it would have been to issue the store path
>>> changes before switching the ports.  But, if you forget to do it in the
>>> future, you can fix it afterwards by going bare metal and updating the
>>> paths in the _tickerform.sl_path table on the nodes that don't have the
>>> correct information.
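For the archives, the by-hand fix described above would look something like the following.  This is only a sketch: the cluster name "tickerform" is taken from the table name mentioned, and the node number and conninfo string are placeholders you would adjust for your setup.

```sql
-- Run with psql against each node whose sl_path rows are stale.
-- Slony-I keeps one row per server/client pair; pa_conninfo is the
-- libpq connection string the local slon uses to reach that server.
UPDATE _tickerform.sl_path
   SET pa_conninfo = 'host=newhost port=5432 dbname=ticker user=slony'
 WHERE pa_server = 4;   -- the node whose address/port changed
```

You would typically restart the slon daemon for that node afterwards so it reconnects using the corrected conninfo rather than the cached one.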
>>>
>>>   
>> I still don't understand why the node change wasn't picked up by
>> these slaves when the move set executed; I would have expected each
>> of them to then treat Node #4 as the master, and although it showed
>> up on the "wrong" IP address, a "store path" should have fixed that.
>>
>> It APPEARS that it was looking for the old master on Node #2....
>> implying (I think) that it never saw the move set.
>>
>> Or am I misunderstanding how the internals work here?
>>
>
> I don't think the problem is that it didn't see the move set, I think
> the problem is that it didn't get the store path commands because it
> didn't connect to the 'new' master after you changed the ports out
> from under it.  I don't think slony is well designed for having the
> paths changed out from under it and you'll likely have to fix them by
> hand when you do this.
>
> I'm pretty sure what happened (and hopefully someone will correct me
> if I'm wrong) is even though you ran the slonik store path command on
> the broken node, slonik connected to the new master, updated the
> master's DB with the store path info and put this event in the log to
> propagate out to the slaves.  Unfortunately, because the broken slave
> still had the old path in the sl_path table, it didn't know how to
> connect to the new master and therefore never received the new path
> information. 
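To make the ordering concrete, here is a minimal slonik sketch of the "paths first, then the move" sequence described above.  It is illustrative only: it assumes set 1 moving from node 2 to node 4, and all conninfo strings are placeholders.

```
# Illustrative slonik script - adjust names, node IDs, and conninfo.
cluster name = tickerform;
node 2 admin conninfo = 'host=oldhost port=5432 dbname=ticker user=slony';
node 4 admin conninfo = 'host=newhost port=5432 dbname=ticker user=slony';

# Tell the other nodes how to reach node 4 at its new address BEFORE
# switching the ports, so the event can still propagate everywhere.
store path (server = 4, client = 2,
            conninfo = 'host=newhost port=5432 dbname=ticker user=slony');

# Only then hand the set over to the new origin.
lock set (id = 1, origin = 2);
move set (id = 1, old origin = 2, new origin = 4);
```

The key point is that the "store path" event has to reach every subscriber while the old paths still work; once the ports are swapped, a node holding stale paths can no longer hear about the correction.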
>
But the log says it DID receive the new path information - when I
executed the "store paths" on the client the log file for slon on that
client immediately reflected that the path configuration had been
changed.  So clearly, it saw it on the local host.

I have since dropped the old database (which was running overnight as a
"safety") using "drop node", and of note, that DID drop the schema as the
replication was torn down.

It's pretty clear to me that something went wrong during the move set -
but exactly what and why, I can't reproduce at the present time.

I'll have to see if I can set up a "sandbox" and try this in an isolated
environment to see if I can figure out why it happened and hopefully
prevent myself from getting bit like this again.



_______________________________________________
Slony1-general mailing list
[email protected]
http://lists.slony.info/mailman/listinfo/slony1-general
