Hi! Nice to hear. What could be "interesting" is how stable the corosync communication is over a WAN-type link. If it's not stable, the cluster could try to fence nodes rather frequently. OK, you disabled fencing; maybe it works without it. Did you tune the parameters?
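By "parameters" I mainly mean the totem timeouts in corosync.conf. Purely as an
illustration of the kind of tuning sometimes done for higher-latency links (the
values below are invented for the example, not recommendations for your
cluster):

totem {
        # time in ms the token may be missing before a loss is declared;
        # WAN links usually need this well above the LAN default
        token: 10000
        # retransmit attempts before the token is considered lost
        token_retransmits_before_loss_const: 10
        # membership consensus timeout; must stay larger than token
        consensus: 12000
}

If you did change anything along those lines, it would be interesting to hear
which values actually survive your WAN.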
Regards,
Ulrich

>>> Antony Stone <antony.st...@ha.open.source.it> wrote on 05.08.2021 at 14:44 in message <202108051444.39919.antony.st...@ha.open.source.it>:
> On Thursday 05 August 2021 at 10:51:37, Antony Stone wrote:
>
>> On Thursday 05 August 2021 at 07:48:37, Ulrich Windl wrote:
>> >
>> > Have you ever tried to find out why this happens? (Talking about logs)
>>
>> Not in detail, no, but just in case there's a chance of getting this
>> working as suggested simply using location constraints, I shall look
>> further.
>
> I now have a working solution - thank you to everyone who has helped.
>
> The answer to the problem above was simple - with a 6-node cluster, 3 votes
> is not quorum.
>
> I added a 7th node (in "city C") and adjusted the location constraints to
> ensure that cluster A resources run in city A, cluster B resources run in
> city B, and the "anywhere" resource runs in either city A or city B.
>
> I've even added a colocation constraint to ensure that the "anywhere"
> resource runs on the same machine in either city A or city B as is running
> the local resources there (which wasn't a strict requirement, but is very
> useful).
>
> For anyone interested in the detail of how to do this (without needing
> booth), here is my cluster.conf file, as in "crm configure load replace
> cluster.conf":
>
> --------
> node tom attributes site=cityA
> node dick attributes site=cityA
> node harry attributes site=cityA
>
> node fred attributes site=cityB
> node george attributes site=cityB
> node ron attributes site=cityB
>
> primitive A-float IPaddr2 params ip=192.168.32.250 cidr_netmask=24 meta migration-threshold=3 failure-timeout=60 op monitor interval=5 timeout=20 on-fail=restart
> primitive B-float IPaddr2 params ip=192.168.42.250 cidr_netmask=24 meta migration-threshold=3 failure-timeout=60 op monitor interval=5 timeout=20 on-fail=restart
> primitive Asterisk asterisk meta migration-threshold=3 failure-timeout=60 op monitor interval=5 timeout=20 on-fail=restart
>
> group GroupA A-float meta resource-stickiness=100
> group GroupB B-float meta resource-stickiness=100
> group Anywhere Asterisk meta resource-stickiness=100
>
> location pref_A GroupA rule -inf: site ne cityA
> location pref_B GroupB rule -inf: site ne cityB
> location no_pref Anywhere rule -inf: site ne cityA and site ne cityB
>
> colocation Ast 100: Anywhere [ GroupA GroupB ]
>
> property cib-bootstrap-options: stonith-enabled=no no-quorum-policy=stop start-failure-is-fatal=false cluster-recheck-interval=60s
> --------
>
> Of course, the group definitions are not needed for single resources, but I
> shall in practice be using multiple resources which do need groups, so I
> wanted to ensure I was creating something which would work with that.
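(Just to spell out the vote arithmetic behind "3 votes is not quorum" above --
this is only the standard majority calculation, not anything taken from the
logs:

    quorum = floor(total_votes / 2) + 1
    6 nodes: floor(6/2) + 1 = 4   -> a lone 3-node city is not quorate
    7 nodes: floor(7/2) + 1 = 4   -> a whole 3-node city can fail and the
                                     remaining 4 votes are still a majority

which is why the 7th node in city C lets either city go down completely while
the surviving partition keeps quorum.)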
> I have tested it by:
>
> - bringing up one node at a time: as soon as any 4 nodes are running, all
>   possible resources are running
>
> - bringing up 5 or more nodes: all resources run
>
> - taking down one node at a time to a maximum of three nodes offline: if at
>   least one node in a given city is running, the resources at that city are
>   running
>
> - turning off (using "halt", so that corosync dies nicely) all three nodes
>   in a city simultaneously: that city's resources stop running, the other
>   city continues working, as well as the "anywhere" resource
>
> - causing a network failure at one city (so it simply disappears without
>   stopping corosync neatly): the other city continues its resources (plus
>   the "anywhere" resource), the isolated city stops
>
> For me, this is the solution I wanted, and in fact it's even slightly better
> than the previous two isolated 3-node clusters I had, because I can now have
> resources running on a single active node in cityA (provided it can see at
> least 3 other nodes in cityB or cityC), which wasn't possible before.
>
> Once again, thanks to everyone who has helped me to achieve this result :)
>
> Antony.
>
> --
> "The future is already here. It's just not evenly distributed yet."
>
>  - William Gibson
>
> Please reply to the list;
> please *don't* CC me.

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/