Yes, this was the catch! Thanks to Manuel, Tom, William, and Philippe!

We had a /dev/mapper (multipath) device configured, and were also missing the
disk reservation agent. Once we got the DiskReservation agent configured and
switched to a single path, things started to work. It was the multipath setup
that hosed this entire activity.
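For the archives: the bug in Philippe's technote below boils down to the
monitor script grepping for a message that newer LVM versions no longer print.
A quick illustration (the vgdisplay output here is simulated as a string, not
run against a real VG):

```shell
# Simulated vgdisplay output on a node where the VG is reserved elsewhere
# (or does not exist) -- modern LVM prints "not found", not "doesn't exist".
disp_out='Volume group "vgcluster" not found'

# Original 5.0MP2 check: never matches this output, so the script falls
# through and wrongly reports the VG as online.
echo "${disp_out}" | grep "doesn't exist" >/dev/null
echo "original check exit code: $?"

# Patched check from the technote: matches either message variant.
echo "${disp_out}" | egrep "doesn't exist|not found" >/dev/null
echo "patched check exit code: $?"
```

Running it prints exit code 1 for the original check (no match) and 0 for the
patched one, which is exactly why only the egrep version reports the VG offline.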

You folks are awesome... thanks a ton!

Best regards,
Gak

On Thu, Jan 28, 2010 at 4:53 AM, Philippe Belliard <
philippe_belli...@symantec.com> wrote:

>  Hello,
>
>
>
> First, it is required to use the DiskReservation VCS agent with LVM on Linux
> (LVM and LVM2). But be warned: the DiskReservation VCS agent does not support
> multipathing on the disks used by LVM and LVM2. See the VCS Bundled Agents
> Reference Guide for DiskReservation and LVMVolumeGroup configuration.
>
> If you have multipathing, then it is required to use VxVM.
>
>
>
> Second, I remember that in 5.0MP2 there is a bug in the LVMVolumeGroup
> agent's monitor script, in the test that checks whether the VG is active via
> vgdisplay vgname (depending on the LVM version, it returns "not found" or
> "doesn't exist").
>
> Read the Symantec technote http://support.veritas.com/docs/318254 for the
> explanation.
>
>
>
> I hope this helps
>
> Best regards,
>
> Philippe
>
> ################################## technote 318254 ######################################
>
>
>
> Document ID: 318254
>
> The LVMVolumeGroup agent shipped with VCS 5.0MP2 triggers a fake
> Concurrency Violation
> ------------------------------
>
> Details:
>
> The LVMVolumeGroup agent shipped with VCS 5.0MP2 for Linux reports an LVM
> volume group resource as "online" even if it is imported on another node.
>
> This incorrect behaviour of the LVMVolumeGroup monitor script causes a
> Concurrency Violation to occur and results in the associated service group
> becoming unusable.
>
> This is because of the LVMVolumeGroup agent's reliance on the
> DiskReservation agent, which is itself responsible for reserving the disks
> that are used by the LVM VG.
>
> It is the DiskReservation agent that makes the VG accessible for
> import/activation on only one node; the other nodes cannot see the VG or
> its state.
>
> Description
>
> The LVMVolumeGroup agent's monitor script is supposed to return the same
> exit code whether the VG is offline or does not exist.
>
> Here is the part of the monitor script code responsible for this check:
>
> <snip>
> GREP=grep
> <snip>
> echo ${disp_out} | ${GREP} "doesn't exist" >/dev/null
> if [ $? -eq 0 ]; then
>     exit ${VCSAG_RES_OFFLINE}
> fi
>
> where "disp_out" is:
>
> disp_out=`${VGDISPLAY} ${VolumeGroup} 2>&1`
>
> The grep check is supposed to be TRUE if the LVM VG is imported on another
> node or if it does not exist. Since the VG is not visible, the vgdisplay
> command returns the same message as in the case of the VG not existing.
>
> For example: 'vgcluster' is configured in a service group and is imported
> on another node. The following is the output of the 'vgdisplay' command
> from the other node in the cluster:
>
> vgdisplay vgcluster
> [Volume group "vgcluster" not found]
>
> Trying with a nonexistent VG, named 'vgnotexist':
>
> vgdisplay vgnotexist
> [Volume group "vgnotexist" not found]
>
> So the above check is supposed to return TRUE in both cases.
>
> However, because the monitor script greps the vgdisplay output for
> "doesn't exist", and that string is not in the vgdisplay output, the check
> will always be FALSE. The script therefore concludes that the VG is
> online, which causes the Concurrency Violation to occur.
>
> Solution:
>
> Change the following two lines in the LVMVolumeGroup monitor script as
> follows:
>
> GREP=grep
>
> to
>
> EGREP=egrep
>
> and
>
> echo ${disp_out} | ${GREP} "doesn't exist" >/dev/null
>
> to
>
> echo ${disp_out} | ${EGREP} "doesn't exist|not found" >/dev/null
>
> Then restart the LVMVolumeGroup agent:
>
> haagent -stop LVMVolumeGroup [-force] -sys <system_name>
>
> Check that the agent is properly stopped:
>
> haagent -display LVMVolumeGroup -attribute Running
> #Agent          Attribute  Value
> LVMVolumeGroup  Running    No
>
> Then start the agent:
>
> haagent -start LVMVolumeGroup -sys <system_name>
>
> and check that it is properly running:
>
> haagent -display LVMVolumeGroup -attribute Running
> #Agent          Attribute  Value
> LVMVolumeGroup  Running    Yes
>
> IMPORTANT NOTE:
>
> The above fix is only applicable to customers running VCS 5.0MP2. The
> official fix for this issue is included in VCS 5.0MP3RP2, which is now
> available.
>
> Related Documents:
>
> 318496: Veritas Storage Foundation (tm) and High Availability Solutions
> 5.0 Maintenance Pack 3 Rolling Patch 2 for Linux
> http://library.veritas.com/docs/318496
>
> 318498: Veritas Storage Foundation (tm) and High Availability Solutions
> 5.0 Maintenance Pack 3 Rolling Patch 2 for Linux - Read This First
> http://library.veritas.com/docs/318498
>
> From: veritas-ha-boun...@mailman.eng.auburn.edu [mailto:
> veritas-ha-boun...@mailman.eng.auburn.edu] On Behalf Of Gak
> Sent: Wednesday, 27 January 2010 11:52
> To: Veritas-ha@mailman.eng.auburn.edu; Ganesh Kamath
> Subject: [Veritas-ha] VCS 5.0 MP3 on RHEL5.2
>
> Hi,
>
> I have a VCS 5.x config on RHEL 5.2 with LVM, not VxVM.
>
> But the group never stays online and keeps flapping between the nodes with
> a concurrency violation message about the VG. I think it is because it sees
> the LVM VG active on both nodes and gets confused. This is a failover
> group.
>
>
>
> 2010/01/27 20:50:04 VCS INFO V-16-6-15004 (x.x.x.x) hatrigger:Failed to
> send trigger for postoffline; script doesn't exist
> 2010/01/27 21:05:03 VCS INFO V-16-1-10299 Resource VG (Owner: unknown,
> Group: test) is online on x.x.x.x (Not initiated by VCS)
> 2010/01/27 21:05:03 VCS NOTICE V-16-1-10233 Clearing Restart attribute for
> group test on all nodes
> 2010/01/27 21:05:05 VCS INFO V-16-1-10299 Resource VG (Owner: unknown,
> Group: test) is online on x.x.x.x (Not initiated by VCS)
> 2010/01/27 21:05:05 VCS ERROR V-16-1-10214 Concurrency
> Violation:CurrentCount increased above 1 for failover group test
> 2010/01/27 21:05:05 VCS NOTICE V-16-1-10233 Clearing Restart attribute for
> group test on all nodes
> 2010/01/27 21:05:05 VCS WARNING V-16-6-15034 (x.x.x.x) violation:Offlining
> group test on system x.x.x.x
> 2010/01/27 21:05:05 VCS INFO V-16-1-50135 User root fired command: hagrp
> -offline test  x.x.x.x  from localhost
> 2010/01/27 21:05:05 VCS NOTICE V-16-1-10167 Initiating manual offline of
> group test on system x.x.x.x
> 2010/01/27 21:05:05 VCS NOTICE V-16-1-10300 Initiating Offline of Resource
> VG (Owner: unknown, Group: test) on System x.x.x.x
> 2010/01/27 21:05:05 VCS INFO V-16-6-15002 (x.x.x.x) hatrigger:hatrigger
> executed /opt/VRTSvcs/bin/triggers/violation x.x.x.x test   successfully
> 2010/01/27 21:05:07 VCS INFO V-16-1-10305 Resource VG (Owner: unknown,
> Group: test) is offline on x.x.x.x (VCS initiated)
> 2010/01/27 21:05:07 VCS NOTICE V-16-1-10446 Group test is offline on system
> x.x.x.x
> 2010/01/27 21:05:07 VCS INFO V-16-10031-15005 (x.x.x.x)
> triggers:???:nfs_postoffline:(postoffline) Invoked with arguments x.x.x.x,
> test
> 2010/01/27 21:05:07 VCS INFO V-16-6-15002 (x.x.x.x) hatrigger:hatrigger
> executed /opt/VRTSvcs/bin/triggers/nfs_postoffline x.x.x.x test
> successfully
> 2010/01/27 21:05:07 VCS INFO V-16-6-15004 (x.x.x.x) hatrigger:Failed to
> send trigger for postoffline; script doesn't exist
> 2010/01/27 21:06:04 VCS ERROR V-16-2-13067 (x.x.x.x) Agent is calling clean
> for resource(VG) because the resource became OFFLINE unexpectedly, on its
> own.
> 2010/01/27 21:06:05 VCS INFO V-16-2-13068 (x.x.x.x) Resource(VG) - clean
> completed successfully.
> 2010/01/27 21:06:06 VCS INFO V-16-1-10307 Resource VG (Owner: unknown,
> Group: test) is offline on x.x.x.x (Not initiated by VCS)
> 2010/01/27 21:06:06 VCS INFO V-16-6-15004 (x.x.x.x) hatrigger:Failed to
> send trigger for resfault; script doesn't exist
> 2010/01/27 21:14:48 VCS NOTICE V-16-1-10022 Agent LVMLogicalVolume stopped
> 2010/01/27 21:14:57 VCS INFO V-16-1-10307 Resource VG (Owner: unknown,
> Group: test) is offline on x.x.x.x (Not initiated by VCS)
> 2010/01/27 21:14:57 VCS NOTICE V-16-1-10446 Group test is offline on system
> x.x.x.x
> 2010/01/27 21:14:57 VCS NOTICE V-16-1-10301 Initiating Online of Resource
> VG (Owner: unknown, Group: test) on System x.x.x.x
> 2010/01/27 21:14:57 VCS INFO V-16-10031-15005 (x.x.x.x)
> triggers:???:nfs_postoffline:(postoffline) Invoked with arguments x.x.x.x,
> test
>
> I have also attached the main.cf file.
>
> What am I missing? I also tried deactivating the VG on boot by appending a
> 'vgchange -a n orahome', but to no avail.
>
> Any inputs will be appreciated!
>
> thanks!
> Gak
>
> --
> I have the simplest of tastes;
> I am easily satisfied with the best.
>



-- 
I have the simplest of tastes;
I am easily satisfied with the best.
_______________________________________________
Veritas-ha maillist  -  Veritas-ha@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-ha
