Alexey,

Thanks for your feedback. I acknowledge that, in theory, a node may be brought online at a time when the nodes already running are not fully synchronised, so the newly started node cannot know which data set to pull. In addition to the example you give (lost interconnection), I can also foresee difficulties when several nodes start at the same time. However, I do not see how arbitrarily designating one node as "seed" helps to resolve either of these situations unless the seed node has more (or better) information than the others.

I am trying to design a multi-node solution that is scalable. I want to be able to add and remove nodes according to current load, and also to take a node offline, do some maintenance, then bring it back online. In my scenario, the probability of a node being taken offline for maintenance during the year is 99.9%, whereas I would put the probability of a partial loss of LAN connectivity (causing the split-brain issue) at less than 0.01%.

If possible, I would really like to see an option added to the usrloc module to override the "seed" node concept: something that allows any node (including the seed) to attempt to pull registration details from another node on startup. In my scenario, a newly started node with no usrloc data is a major problem, because it could take 40 minutes to get close to having a full set of registration data. I would prefer to take the risk of it pulling data from the wrong node rather than it not attempting to synchronise at all.

Happy New Year to all.

John Quick
Smartvox Limited

> Hi John,
>
> What follows is just my opinion, and I have not explored the OpenSIPS source code for syncing data.
>
> The problem is a little bit deeper. As we have a cluster, we potentially have split-brain.
> We can disable the seed node entirely and just let nodes work after a disaster/restart. But that means we can't guarantee consistency of data, so nodes must show this with a <Not in sync> state.
>
> Usually clusters rely on a quorum for trust. But for OpenSIPS I think this approach is too expensive. And of course a quorum needs a minimum of 3 hosts.
> With 2 hosts, after losing and restoring the interconnection it is impossible to say which host has consistent data. That's why OpenSIPS uses the seed node as an artificial trust point. I think the <seed> node doesn't solve syncing problems, but it simplifies the overall work.
>
> Let's imagine 3 nodes: A, B, C. A is Active. A and B have lost their interconnection. C is down. Then C comes up and has 2 hosts to sync from. But A already has 200 phones re-registered for some reason. So we have 200 conflicts (on node B the same phones are still in memory). Where to sync from? A <seed> host will answer this question in 2 cases (A or B). Of course, if C is the <seed>, it will just be happy from the start. And I actually don't know what happens if we now run <ul_cluster_sync> on C. Will it get all the contacts from A and B or not?
>
> We operate on specific data, which is temporary, so the syncing policy can be more relaxed. Maybe it's a good idea to somehow tie the <seed> node to the Active role in the cluster. But again, if the Active node restarts and is still Active, we will have a problem.
>
> -----
> Alexey Vasilyev
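
To make the A/B/C example concrete, here is a minimal Python sketch of the conflict it describes. It is only an illustration and says nothing about how OpenSIPS or the usrloc module actually behaves: the `Contact` record, the `find_conflicts` and `merge_latest` helpers, the figure of 1000 phones and the "most recent registration wins" policy are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contact:
    uri: str            # contact URI the phone registered (made-up values)
    registered_at: int  # time of the last REGISTER, in seconds (made-up values)


def find_conflicts(a, b):
    """Return the AoRs held by both nodes whose contact records differ."""
    return sorted(aor for aor in a.keys() & b.keys() if a[aor] != b[aor])


def merge_latest(a, b):
    """Hypothetical policy: for each AoR keep the most recent registration."""
    merged = dict(a)
    for aor, contact in b.items():
        current = merged.get(aor)
        if current is None or contact.registered_at > current.registered_at:
            merged[aor] = contact
    return merged


# 1000 phones known to both A and B before the interconnection was lost.
node_b = {
    f"sip:user{i}@example.com": Contact(f"sip:user{i}@192.0.2.{i % 250 + 1}", 1000)
    for i in range(1000)
}
node_a = dict(node_b)

# 200 phones re-register on A only, while A and B cannot see each other.
for i in range(200):
    aor = f"sip:user{i}@example.com"
    node_a[aor] = Contact(node_a[aor].uri, 2000)

conflicts = find_conflicts(node_a, node_b)

# C starts empty and pulls from both nodes instead of trusting a single seed.
node_c = merge_latest(node_a, node_b)

print(len(conflicts))  # 200
print(all(node_c[aor].registered_at == 2000 for aor in conflicts))  # True
```

Under this assumed policy the order in which C pulls from A and B does not matter, because each conflicting record is resolved by its own timestamp rather than by which node is marked as seed. Whether real registration data carries enough information to resolve conflicts this way is exactly the open question in the thread.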