Yes, this was the catch! Thanks to Manuel, Tom, William and Philippe!
We had a /dev/mapper (multipath) device configured and were also missing
the DiskReservation agent. Once we got the DiskReservation agent
configured and switched to a single path, things started to work. It was
the multipath setup that hosed this entire activity.
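For anyone hitting the same wall, the working dependency ends up looking
roughly like the main.cf fragment below. This is a minimal sketch, not
our exact config: the node names, resource names, and /dev/sdb are
placeholders, and orahome is the VG from this thread. The key points are
that the LVMVolumeGroup resource requires the DiskReservation resource,
and that the reserved disk is a single-path device, not a /dev/mapper
multipath map:

    group test (
        SystemList = { node1 = 0, node2 = 1 }
        )

        DiskReservation res_reserve (
            Disks = { "/dev/sdb" }   // single-path device, not /dev/mapper/*
            )

        LVMVolumeGroup res_vg (
            VolumeGroup = orahome
            )

        res_vg requires res_reserve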
You folks are awesome.. thanks a ton!

Best regards,
Gak

On Thu, Jan 28, 2010 at 4:53 AM, Philippe Belliard
<philippe_belli...@symantec.com> wrote:

> Hello,
>
> First, it is required to use the DiskReservation VCS agent with Linux
> LVM (LVM and LVM2). But be warned: the DiskReservation agent does not
> support multipathing on the disks used by LVM and LVM2. See the VCS
> Bundled Agents Reference Guide for the DiskReservation and
> LVMVolumeGroup configuration. If you have multipathing, then you are
> required to use VxVM.
>
> Second, I remember that in 5.0MP2 a bug exists in the LVMVolumeGroup
> agent's monitor script, in the test that checks whether the VG is
> running via 'vgdisplay vgname' (depending on the LVM version, it
> returns "not found" or "doesn't exist").
>
> Read the Symantec technote http://support.veritas.com/docs/318254 for
> the explanation.
>
> I hope this helps.
>
> Best regards,
> Philippe
>
> ######################## technote 318254 ########################
>
> Document ID: 318254
>
> The LVMVolumeGroup agent shipped with VCS 5.0MP2 triggers a fake
> Concurrency Violation
>
> Details:
>
> The LVMVolumeGroup agent shipped with VCS 5.0MP2 for Linux reports an
> LVM volume group resource as being "online" even if it is imported on
> another node. This incorrect behaviour of the LVMVolumeGroup monitor
> script causes a Concurrency Violation to occur and results in the
> associated service group becoming unusable. This is because of the
> LVMVolumeGroup agent's reliance on the DiskReservation agent, which is
> itself responsible for reserving the disks used by the LVM VG. It is
> the DiskReservation agent that makes the VG accessible for
> import/activation on only one node, so the other nodes cannot see the
> VG or its state.
>
> Description:
>
> The LVMVolumeGroup agent's monitor script is supposed to return the
> same exit code whether the VG is offline or does not exist. Here is
> the part of the monitor script responsible for this check:
>
>     <snip>
>     GREP=grep
>     <snip>
>     echo ${disp_out} | ${GREP} "doesn't exist" >/dev/null
>     if [ $? -eq 0 ]; then
>         exit ${VCSAG_RES_OFFLINE}
>     fi
>
> where "disp_out" is:
>
>     disp_out=`${VGDISPLAY} ${VolumeGroup} 2>&1`
>
> The GREP check is supposed to be TRUE if the LVM VG is imported on
> another node or if it does not exist. Since the VG is not visible, the
> vgdisplay command returns the same message as in the case of the VG
> not existing.
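To see why this check can never fire on LVM2, the string comparison is
easy to reproduce by hand. A minimal sketch, assuming an LVM2 host
('vgcluster' is just the example VG name the technote uses):

    # LVM2 says "not found", not "doesn't exist", for a VG it cannot see:
    disp_out=`vgdisplay vgcluster 2>&1`
    echo ${disp_out} | grep "doesn't exist" >/dev/null
    echo $?    # prints 1 -- no match, so the OFFLINE exit is never taken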
> For example: 'vgcluster' is configured in a service group and is
> imported on another node. The following is the output of the
> 'vgdisplay' command from the other node in the cluster:
>
>     vgdisplay vgcluster
>     Volume group "vgcluster" not found
>
> Trying with a non-existent VG, named 'vgnotexist':
>
>     vgdisplay vgnotexist
>     Volume group "vgnotexist" not found
>
> So the above check is supposed to return TRUE in both cases. However,
> because the monitor script greps the vgdisplay output for "doesn't
> exist", and that string is not in the vgdisplay output, the check will
> always be FALSE. The script therefore thinks the VG is online, which
> causes the Concurrency Violation to occur.
>
> Solution:
>
> Change the following two lines in the LVMVolumeGroup monitor script:
>
>     GREP=grep
>
> to
>
>     EGREP=egrep
>
> and
>
>     echo ${disp_out} | ${GREP} "doesn't exist" >/dev/null
>
> to
>
>     echo ${disp_out} | ${EGREP} "doesn't exist|not found" >/dev/null
>
> Then restart the LVMVolumeGroup agent:
>
>     haagent -stop LVMVolumeGroup [-force] -sys <system_name>
>
> Check that the agent is properly stopped:
>
>     haagent -display LVMVolumeGroup -attribute Running
>     #Agent          Attribute  Value
>     LVMVolumeGroup  Running    No
>
> Then start the agent:
>
>     haagent -start LVMVolumeGroup -sys <system_name>
>
> and check that it is properly running:
>
>     haagent -display LVMVolumeGroup -attribute Running
>     #Agent          Attribute  Value
>     LVMVolumeGroup  Running    Yes
>
> IMPORTANT NOTE:
> The above fix is only applicable to customers running VCS 5.0MP2. The
> official fix for this issue is included in VCS 5.0MP3RP2, which is now
> available.
>
> Related Documents:
>
> 318496: Veritas Storage Foundation (tm) and High Availability
> Solutions 5.0 Maintenance Pack 3 Rolling Patch 2 for Linux
> http://library.veritas.com/docs/318496
>
> 318498: Veritas Storage Foundation (tm) and High Availability
> Solutions 5.0 Maintenance Pack 3 Rolling Patch 2 for Linux - Read This
> First
> http://library.veritas.com/docs/318498
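Before editing the monitor script on a production node, it is worth
confirming that the new pattern really matches both message variants the
technote mentions. A quick sanity check ('vg01' is a placeholder name):

    for msg in 'Volume group "vg01" not found' \
               "Volume group \"vg01\" doesn't exist"; do
        echo "$msg" | egrep "doesn't exist|not found" >/dev/null \
            && echo "matched: $msg"
    done
    # both lines print "matched", so either LVM wording now takes the
    # OFFLINE branch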
> From: veritas-ha-boun...@mailman.eng.auburn.edu
> [mailto:veritas-ha-boun...@mailman.eng.auburn.edu] On Behalf Of Gak
> Sent: Wednesday, January 27, 2010 11:52
> To: Veritas-ha@mailman.eng.auburn.edu; Ganesh Kamath
> Subject: [Veritas-ha] VCS 5.0 MP3 on RHEL5.2
>
> Hi,
>
> I have a VCS 5.x config on RHEL 5.2 with LVM, not VxVM. But the group
> never stays online and keeps flapping between the nodes with a
> concurrency violation message about the VG. I think it is because it
> sees the LVM VG active on both nodes and gets confused. This is a
> failover group.
>
> 2010/01/27 20:50:04 VCS INFO V-16-6-15004 (x.x.x.x) hatrigger:Failed to send trigger for postoffline; script doesn't exist
> 2010/01/27 21:05:03 VCS INFO V-16-1-10299 Resource VG (Owner: unknown, Group: test) is online on x.x.x.x (Not initiated by VCS)
> 2010/01/27 21:05:03 VCS NOTICE V-16-1-10233 Clearing Restart attribute for group test on all nodes
> 2010/01/27 21:05:05 VCS INFO V-16-1-10299 Resource VG (Owner: unknown, Group: test) is online on x.x.x.x (Not initiated by VCS)
> 2010/01/27 21:05:05 VCS ERROR V-16-1-10214 Concurrency Violation:CurrentCount increased above 1 for failover group test
> 2010/01/27 21:05:05 VCS NOTICE V-16-1-10233 Clearing Restart attribute for group test on all nodes
> 2010/01/27 21:05:05 VCS WARNING V-16-6-15034 (x.x.x.x) violation:Offlining group test on system x.x.x.x
> 2010/01/27 21:05:05 VCS INFO V-16-1-50135 User root fired command: hagrp -offline test x.x.x.x from localhost
> 2010/01/27 21:05:05 VCS NOTICE V-16-1-10167 Initiating manual offline of group test on system x.x.x.x
> 2010/01/27 21:05:05 VCS NOTICE V-16-1-10300 Initiating Offline of Resource VG (Owner: unknown, Group: test) on System x.x.x.x
> 2010/01/27 21:05:05 VCS INFO V-16-6-15002 (x.x.x.x) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/violation x.x.x.x test successfully
> 2010/01/27 21:05:07 VCS INFO V-16-1-10305 Resource VG (Owner: unknown, Group: test) is offline on x.x.x.x (VCS initiated)
> 2010/01/27 21:05:07 VCS NOTICE V-16-1-10446 Group test is offline on system x.x.x.x
> 2010/01/27 21:05:07 VCS INFO V-16-10031-15005 (x.x.x.x) triggers:???:nfs_postoffline:(postoffline) Invoked with arguments x.x.x.x, test
> 2010/01/27 21:05:07 VCS INFO V-16-6-15002 (x.x.x.x) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/nfs_postoffline x.x.x.x test successfully
> 2010/01/27 21:05:07 VCS INFO V-16-6-15004 (x.x.x.x) hatrigger:Failed to send trigger for postoffline; script doesn't exist
> 2010/01/27 21:06:04 VCS ERROR V-16-2-13067 (x.x.x.x) Agent is calling clean for resource(VG) because the resource became OFFLINE unexpectedly, on its own.
> 2010/01/27 21:06:05 VCS INFO V-16-2-13068 (x.x.x.x) Resource(VG) - clean completed successfully.
> 2010/01/27 21:06:06 VCS INFO V-16-1-10307 Resource VG (Owner: unknown, Group: test) is offline on x.x.x.x (Not initiated by VCS)
> 2010/01/27 21:06:06 VCS INFO V-16-6-15004 (x.x.x.x) hatrigger:Failed to send trigger for resfault; script doesn't exist
> 2010/01/27 21:14:48 VCS NOTICE V-16-1-10022 Agent LVMLogicalVolume stopped
> 2010/01/27 21:14:57 VCS INFO V-16-1-10307 Resource VG (Owner: unknown, Group: test) is offline on x.x.x.x (Not initiated by VCS)
> 2010/01/27 21:14:57 VCS NOTICE V-16-1-10446 Group test is offline on system x.x.x.x
> 2010/01/27 21:14:57 VCS NOTICE V-16-1-10301 Initiating Online of Resource VG (Owner: unknown, Group: test) on System x.x.x.x
> 2010/01/27 21:14:57 VCS INFO V-16-10031-15005 (x.x.x.x) triggers:???:nfs_postoffline:(postoffline) Invoked with arguments x.x.x.x, test
>
> I have also attached the main.cf file.
>
> What am I missing? I also tried de-activating the VG on boot by
> appending a 'vgchange -a n orahome', but to no avail.
>
> Any inputs will be appreciated!
>
> thanks!
> Gak
>
> --
> I have the simplest of tastes; I am easily satisfied with the best.
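On the boot-time activation attempt above: instead of appending a
'vgchange -a n' to a boot script, LVM2 can be told not to auto-activate
the shared VG at all via the activation/volume_list setting in
/etc/lvm/lvm.conf. A sketch under that assumption ("rootvg" stands in
for whatever local VG the OS itself needs at boot; only the VGs listed
are activated automatically):

    # /etc/lvm/lvm.conf (fragment)
    activation {
        # Only VGs named here are activated automatically; leaving
        # "orahome" out means VCS alone brings it online and offline.
        volume_list = [ "rootvg" ]
    }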
_______________________________________________
Veritas-ha maillist  -  Veritas-ha@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-ha