I would be more concerned about future failures being handled properly. If you were able to take out all networks from all nodes at the same time, you have a SPOF. If this was a one-time maintenance upgrade to your network gear and not a normal event, setting VCS to not respond to network events means that future cable or port issues will not be handled. If it is a common occurrence for all networks to be lost, perhaps you need to address the network issues :-)
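If it really was a one-off maintenance window, a temporary group freeze may be a better fit than permanently relaxing fault handling, since normal behaviour returns as soon as you unfreeze. A rough, untested sketch, borrowing the group name app_grp from Paul's main.cf below:

# hagrp -freeze app_grp
    (carry out the network maintenance; VCS keeps monitoring the
    resources but takes no offline or failover action while frozen)
# hagrp -unfreeze app_grp

If the freeze needs to survive a node reboot, "hagrp -freeze app_grp -persistent" should do it, wrapped in "haconf -makerw" and "haconf -dump -makero" so the change can be written to main.cf.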
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of DeMontier, Frank
Sent: Monday, October 20, 2008 11:10 AM
To: Paul Robertson; veritas-ha@mailman.eng.auburn.edu
Subject: Re: [Veritas-ha] IPMultiNICB, mpathd and network outages

FaultPropagation=0 should do it.

Buddy DeMontier
State Street Global Advisors
Infrastructure Technical Services
Boston, MA 02111

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Paul Robertson
Sent: Monday, October 20, 2008 10:37 AM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] IPMultiNICB, mpathd and network outages

We recently experienced a Cisco network issue which prevented all nodes in that subnet from accessing the default gateway for about a minute. The Solaris nodes which run probe-based IPMP reported that all interfaces had failed because they were unable to ping the default gateway; however, they came back within seconds once the network issue was resolved. Fine.

Unfortunately, our VCS nodes initiated an offline of the service group after the IPMultiNICB resources detected the IPMP fault. Since the service group offline/online takes several minutes, the outage on these nodes was more painful. Furthermore, since the peer cluster nodes in the same subnet were also experiencing the same mpathd fault, there would have been little advantage to failing over the service group to another node.

We would like to find a way to configure VCS so that the service group does not offline (and any dependent resources within the service group are not offlined) in the event of an mpathd (i.e. IPMultiNICB) failure. In looking through the documentation, it seems that the closest we can come is to increase the IPMultiNICB ToleranceLimit from "1" to a huge value:

# hatype -modify IPMultiNICB ToleranceLimit 9999

This should achieve our desired goal, but I can't help thinking that it's an ugly hack, and that there must be a better way. Any suggestions are appreciated.

Cheers,
Paul

P.S. A snippet of the main.cf file is listed below:

group multinicbsg (
    SystemList = { app04 = 1, app05 = 2 }
    Parallel = 1
    )

    MultiNICB multinicb (
        UseMpathd = 1
        MpathdCommand = "/usr/lib/inet/in.mpathd -a"
        Device = { ce0 = 0, ce4 = 2 }
        DefaultRouter = "192.168.9.1"
        )

    Phantom phantomb (
        )

    phantomb requires multinicb

group app_grp (
    SystemList = { app04 = 0, app05 = 0 }
    )

    IPMultiNICB app_ip (
        BaseResName = multinicb
        Address = "192.168.9.34"
        NetMask = "255.255.255.0"
        )

    Proxy appmnic_proxy (
        TargetResName = multinicb
        )

    (various other resources, including some that depend on app_ip, excluded for brevity)

    app_ip requires appmnic_proxy

_______________________________________________
Veritas-ha maillist  -  Veritas-ha@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-ha