Hi everyone,

I've spent the last week setting up a small cluster of four nodes and am experiencing intermittent problems with the Apache/load-balancing side of things. Sometimes it works perfectly, and then five minutes later it's broken!
I'm learning as I go along here and have now decided it's time to swallow my pride and ask for help!

The cluster consists of 2 Apache+MySQL nodes (server01 and server02) and 2 UltraMonkey + DRBD nodes (data01 and data02). I have 3 virtual IPs: 1 for the DRBD NFS share and 2 for Apache. I have tried to set it up so that data01 is the default master for DRBD and data02 is the default master for ldirectord, and each service should return to its original server after a failed server recovers.

DRBD seems to work perfectly: the master successfully swaps between nodes when I stop heartbeat on each machine in turn, and both of the Apache nodes can access the NFS share via the VIP with no problem.

Pointing a web browser at one of the VIPs meant for Apache is a different matter! Usually it all works fine for a short while when I first boot up all the machines, and requests are successfully round-robined between the two nodes. But then it suddenly stops working, even though an "ip addr sh eth0" still shows the correct VIPs bound to each machine. Occasionally I can get it working again temporarily by randomly stopping and starting heartbeat on data01 and data02; otherwise I have to reboot everything.

I will include my configurations below, and if anyone can offer any help or advice before I tear any more of my hair out, it will be hugely appreciated!
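For anyone who wants to repeat by hand the health check that ldirectord.cf (below) asks for, here is a minimal sketch in Python. It assumes the check amounts to fetching ldirector.html from each real server and looking for the text "Test Page". The helper names (check_url, body_ok, probe) are hypothetical and are not part of ldirectord, and the plain substring match is a simplification of whatever matching ldirectord really does. It only tests Apache on the real servers directly, not the forwarding path through the VIPs.

```python
import urllib.request

# Values copied from ldirectord.cf; the helpers below are illustrative only.
REAL_SERVERS = ["192.168.0.101", "192.168.0.102"]
REQUEST_PATH = "ldirector.html"   # request="ldirector.html"
EXPECTED_TEXT = "Test Page"       # receive="Test Page"


def check_url(host, port=80, path=REQUEST_PATH):
    """Build the URL that an HTTP check against one real server would fetch."""
    return f"http://{host}:{port}/{path.lstrip('/')}"


def body_ok(body, expected=EXPECTED_TEXT):
    """Simplified match: plain substring test on the response body."""
    return expected in body


def probe(host, timeout=10):
    """Fetch the test page from one real server; True if the text is found.

    timeout=10 mirrors checktimeout=10 in ldirectord.cf.
    """
    try:
        with urllib.request.urlopen(check_url(host), timeout=timeout) as resp:
            return body_ok(resp.read().decode("utf-8", errors="replace"))
    except OSError:
        # Connection refused, timeouts and HTTP error statuses end up here.
        return False
```

Calling probe(host) for each entry in REAL_SERVERS from data01 or data02 while the VIPs are misbehaving would show whether the real servers themselves still answer the check.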
Many thanks,
Simon

Configs:

ha.cf:

logfacility local0
bcast eth0
node data01 data02
keepalive 2
deadtime 10
mcast eth0 225.0.0.1 694 1 0
respawn hacluster /usr/lib/heartbeat/ipfail
apiauth ipfail gid=haclient uid=hacluster

haresources:

data01 \
        IPaddr::192.168.0.120/24/eth0 \
        drbddisk::r0 Filesystem::/dev/drbd0::/data::xfs \
        nfs-kernel-server
data02 \
        ldirectord::ldirectord.cf \
        LVSSyncDaemonSwap::master \
        IPaddr2::192.168.0.100/24/eth0/192.168.0.255 \
        IPaddr2::192.168.0.110/24/eth0/192.168.0.255

ldirectord.cf:

checktimeout=10
checkinterval=2
autoreload=no
logfile="local0"
quiescent=yes

virtual=192.168.0.100:80
        real=192.168.0.101:80 gate
        real=192.168.0.102:80 gate
        fallback=127.0.0.1:80 gate
        service=http
        request="ldirector.html"
        receive="Test Page"
        scheduler=rr
        persistent=1800
        protocol=tcp
        checktype=negotiate

virtual=192.168.0.100:443
        real=192.168.0.101:443 gate
        real=192.168.0.102:443 gate
        fallback=127.0.0.1:443 gate
        service=https
        request="ldirector.html"
        receive="Test Page"
        scheduler=rr
        persistent=1800
        protocol=tcp
        checktype=negotiate

virtual=192.168.0.110:80
        real=192.168.0.101:80 gate
        real=192.168.0.102:80 gate
        fallback=127.0.0.1:80 gate
        service=http
        request="ldirector.html"
        receive="Test Page"
        scheduler=rr
        persistent=1800
        protocol=tcp
        checktype=negotiate

virtual=192.168.0.110:443
        real=192.168.0.101:443 gate
        real=192.168.0.102:443 gate
        fallback=127.0.0.1:443 gate
        service=https
        request="ldirector.html"
        receive="Test Page"
        scheduler=rr
        persistent=1800
        protocol=tcp
        checktype=negotiate

/etc/network/interfaces on the Apache nodes:

auto lo
iface lo inet loopback

allow-hotplug eth0
iface eth0 inet static
        address 192.168.0.101 (102 on other apache node)
        netmask 255.255.255.0
        gateway 192.168.0.1

auto lo:0
iface lo:0 inet static
        address 192.168.0.100
        netmask 255.255.255.255
        pre-up sysctl -p > /dev/null

auto lo:1
iface lo:1 inet static
        address 192.168.0.110
        netmask 255.255.255.0
        pre-up sysctl -p > /dev/null

--
View this message in context:
http://www.nabble.com/Configuring-Heartbeat-with-DRBD-and-UltraMonkey-tp19002486p19002486.html
Sent from the Linux-HA mailing list archive at Nabble.com.

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems