I would be more concerned about future failures being handled properly.
If you were able to take out all networks on all nodes at the same
time, you have a SPOF. If this was a one-time maintenance upgrade to
your network gear and not a normal event, setting VCS to ignore
network events means that future cable or port issues will not be
handled.
If it is a common occurrence for all networks to be lost, perhaps you
need to address the network issues :-)



-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
DeMontier, Frank
Sent: Monday, October 20, 2008 11:10 AM
To: Paul Robertson; veritas-ha@mailman.eng.auburn.edu
Subject: Re: [Veritas-ha] IPMultiNICB, mpathd and network outages

FaultPropagation=0 should do it.
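
(For reference: FaultPropagation is a service group attribute, so a
minimal sketch of applying it, assuming the group name app_grp from
the snippet further down, would be:

 # haconf -makerw
 # hagrp -modify app_grp FaultPropagation 0
 # haconf -dump -makero

With FaultPropagation set to 0, VCS still marks the faulted resource
FAULTED but does not offline or fail over the rest of the group.)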

Buddy DeMontier
State Street Global Advisors
Infrastructure Technical Services
Boston Ma 02111

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Paul
Robertson
Sent: Monday, October 20, 2008 10:37 AM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] IPMultiNICB, mpathd and network outages

We recently experienced a Cisco network issue which prevented all
nodes in that subnet from accessing the default gateway for about a
minute.

The Solaris nodes running probe-based IPMP reported that all
interfaces had failed because they could not ping the default
gateway; the interfaces recovered within seconds once the network
issue was resolved. Fine.
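
Incidentally, the probe failure window is tunable in in.mpathd; a
sketch of the relevant knob in /etc/default/mpathd (10000 ms is the
default; raising it would let IPMP ride out short gateway outages, at
the cost of slower detection of real failures):

 #
 # Failure detection time for NICs, in milliseconds.
 FAILURE_DETECTION_TIME=10000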

Unfortunately, our VCS nodes initiated an offline of the service group
after the IPMultiNICB resources detected the IPMP fault. Since the
service group offline/online takes several minutes, the outage on
these nodes was more painful. Furthermore, since the peer cluster
nodes in the same subnet were also experiencing the same mpathd fault,
there would have been little advantage to failing over the service
group to another node.

We would like to find a way to configure VCS so that the service group
does not offline (and any dependent resources within the service group
are not offlined) in the event of an mpathd (i.e. IPMultiNICB)
failure. In looking through the documentation, it seems that the
closest we can come is to increase the IPMultiNICB ToleranceLimit from
"1" to a huge value:

 # hatype -modify IPMultiNICB ToleranceLimit 9999

This should achieve our desired goal, but I can't help thinking that
it's an ugly hack, and that there must be a better way. Any
suggestions are appreciated.
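
A per-resource variation of the same idea, assuming the resource name
app_ip from the snippet below, would be to override the static
attribute on the one resource instead of changing the type-wide
default:

 # haconf -makerw
 # hares -override app_ip ToleranceLimit
 # hares -modify app_ip ToleranceLimit 9999
 # haconf -dump -makero

That at least limits the change to a single resource, though it is
arguably the same hack.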

Cheers,

Paul

P.S. A snippet of the main.cf file is listed below:


 group multinicbsg (
   SystemList = { app04 = 1, app05 = 2 }
   Parallel = 1
   )

   MultiNICB multinicb (
           UseMpathd = 1
           MpathdCommand = "/usr/lib/inet/in.mpathd -a"
           Device = { ce0 = 0, ce4 = 2 }
           DefaultRouter = "192.168.9.1"
           )

   Phantom phantomb (
           )

   phantomb requires multinicb

 group app_grp (
   SystemList = { app04 = 0, app05 = 0 }
   )

   IPMultiNICB app_ip (
           BaseResName = multinicb
           Address = "192.168.9.34"
           NetMask = "255.255.255.0"
           )

   Proxy appmnic_proxy (
           TargetResName = multinicb
           )

   (various other resources, including some that depend on app_ip,
   excluded for brevity)

   app_ip requires appmnic_proxy
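
(To sanity-check what the cluster will actually do, the attributes
discussed above can be inspected with, for example:

 # hares -display app_ip -attribute ToleranceLimit
 # hagrp -display app_grp -attribute FaultPropagation
)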
_______________________________________________
Veritas-ha maillist  -  Veritas-ha@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-ha
