LOL, somehow I clicked on an ancient message in my list folder ... well the advice stands if anyone has a similar issue ;)
I plead a migraine; they make me miss little details like dates ...

On Thu, 2020-07-02 at 09:45 -0500, Ken Gaillot wrote:
> On Thu, 2018-09-06 at 00:59 +0000, Jeffrey Westgate wrote:
> > Greetings from a confused user;
> >
> > We are running pacemaker as part of a load-balanced cluster of two
> > members, both VMware VMs, with both acting as stepping-stones to
> > our DNS recursive resolvers (RR). Simple use - the /etc/resolv.conf
> > on the *NIX boxes points at both IPs, and the cluster forwards to
> > one of multiple RRs for DNS resolution.
>
> I'm not sure about your specific issue, but generally it's a bad idea
> to round-robin DNS servers due to TTL/caching issues. The client
> should know it's contacting the same server at the same IP each time,
> to have a correct idea of how long entries can be cached.
>
> My personal preferred HA approach for DNS is:
>
> * Put the DNS servers in containers or VMs that are the pacemaker
> resources, each bound to a specific floating IP (even better, make
> the container a bundle, or the VM a guest node, to run the DNS server
> as a resource inside it for monitoring/restarting purposes)
>
> * List the floating IPs as multiple DNS servers on the client side
> (whether static like resolv.conf or via DHCP). This is for resolvers;
> you could do the same for domain servers by listing them as multiple
> NS records for the domains.
>
> > Today, for an as-yet undetermined reason, one of the two members
> > started failing to connect to the RRs. Intermittently. And quite
> > annoyingly, as this has affected data center operations. No matter
> > what we've tried, one member fails intermittently, the other is
> > fine. And we've tried -
> > - reboot of the affected member - it came back up clean and fine,
> > but the issue remained.
> > - fail the cluster, moving both IPs to the second member server;
> > failover was successful, problem remained.
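The floating-IP approach Ken describes above might be set up along these lines — a minimal sketch assuming pcs and the standard ocf:heartbeat:IPaddr2 agent; all resource names, addresses, and the assumed dns-server-1/-2 resources are placeholders, not anything from the original thread:

```shell
# One cluster-managed floating IP per DNS server (addresses are examples):
pcs resource create dns-ip-1 ocf:heartbeat:IPaddr2 \
    ip=192.0.2.53 cidr_netmask=24 op monitor interval=10s
pcs resource create dns-ip-2 ocf:heartbeat:IPaddr2 \
    ip=192.0.2.54 cidr_netmask=24 op monitor interval=10s

# Keep each floating IP on the same node as its DNS server resource
# (dns-server-1/-2 assumed to exist already, e.g. as bundles or guest nodes):
pcs constraint colocation add dns-ip-1 with dns-server-1 INFINITY
pcs constraint colocation add dns-ip-2 with dns-server-2 INFINITY

# Client side, /etc/resolv.conf then lists both floating IPs:
#   nameserver 192.0.2.53
#   nameserver 192.0.2.54
```

Because each DNS server follows its own IP around the cluster, clients always reach the same logical server at the same address, which avoids the round-robin caching problem Ken mentions.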
> > -- this moved the entire cluster to a different VM on a different
> > VMware host server, so different NIC, etc.
> > - failed the cluster back to the original server; both IPs appeared
> > on the 'suspect' VM, and the problem remained
> > - restored the cluster; both IPs are on the proper VMs, but the one
> > still fails intermittently while the second just chugs along.
>
> Sounds networking related ... could something else on the network be
> claiming that IP? Or something wrong with the switch?
>
> > Any ideas what could be causing this? Is this something that could
> > be caused by the cluster config? Anybody ever seen anything
> > similar?
> >
> > Our current unsustainable workaround is to remove the IP for the
> > affected member from the *NIX resolv.conf file.
> >
> > I appreciate any reasonable suggestions. (I am not the creator of
> > the cluster, just the guy trying to figure it out. Unfortunately
> > the creator and my mentor is dearly departed and, in times like
> > this, sorely missed.)
>
> My condolences ...
>
> > Any replies will be read and responded to early tomorrow AM.
> > Thanks for understanding.
> > --
> > Jeff Westgate
> --
> Ken Gaillot <kgail...@redhat.com>

_______________________________________________
Manage your subscription: https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
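To chase Ken's suspicion that something else on the network is claiming the IP, one quick check is ARP-level duplicate address detection with iputils arping — a diagnostic sketch; the interface and address below are placeholders for the cluster's NIC and floating IP:

```shell
# -D: duplicate address detection mode; probes are sent without claiming
# the address, and any ARP reply means another host is answering for it.
# Run this on a node while the floating IP is NOT assigned there.
arping -D -I eth0 -c 3 192.0.2.53 \
    && echo "no conflict detected" \
    || echo "another host is answering for 192.0.2.53"
```

Comparing the MAC address in any reply against the cluster nodes' NICs would show whether a stray VM, a stale VMware clone, or a switch misconfiguration is holding the address.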