Hi, I have had a problem for a week now and can't find a solution by myself. Any help would be really appreciated! I have been using corosync / pacemaker for 3 or 4 years and everything has worked well, for both failover and load balancing.
I have a shared IP across 3 servers, and needed to remove one of them for an upgrade. After I removed that server from the cluster, I started getting random failures when accessing the shared IP. At first I thought some packets were still trying to go to the old server, so I put it back into the cluster. I can reach it, but the random failures are still there :-/

My test is just a curl http://my_ip (ssh behaves the same way, it randomly fails to connect). A ping doesn't lose any packets. I can reach each of the three servers, but sometimes a request hangs and ends in a timeout. With tcpdump I can see the packet arriving and being retransmitted, but nothing responds.

How can I diagnose this? I think about one request in five fails. I don't see any messages from the firewall or in /var/log/messages, nothing. It is as if the switch decided to drop random packets. I don't see any counters increasing on the network interfaces. I have checked the iptables settings, rechecked the logs, rechecked all the firewalls... Where are these packets going??

I tried with another, new IP and the same problem happens. I tried IPs on two different subnets (10.xxx and an external IP) and got the same result. I have no problem with a virtual IP in failover mode.

If anyone has any clue...
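To put a number on the "one in five" estimate, the manual curl test could be replaced by something like the sketch below. It is not the test described above, only an illustration: the function name `probe` and its parameters are made up for this example, and it only checks whether a TCP connection to the shared IP can be opened within a timeout, which is the step that appears to hang.

```python
import socket
import time


def probe(host, port, attempts=50, timeout=3.0, pause=0.0):
    """Open `attempts` TCP connections and return the indexes of the failed ones.

    `probe` and its parameters are hypothetical names for this sketch.
    """
    failed = []
    for i in range(attempts):
        try:
            # Each attempt is a fresh connection, so the OS normally picks a new
            # source port every time, just like repeated curl runs.
            with socket.create_connection((host, port), timeout=timeout):
                pass
        except OSError:
            # Covers timeouts as well as refused or unreachable connections.
            failed.append(i)
        time.sleep(pause)
    return failed


# Example use (replace the address with the shared IP; 80 is assumed for HTTP):
# failed = probe("my_ip", 80, attempts=100)
# print(f"{len(failed)} of 100 attempts failed: {failed}")
```

Running it from several client machines and comparing the failure counts would show whether the failures follow the client or are spread evenly.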