We hit the same issue in an OCFS2 cluster of XenServer 7.1.1 hosts.
--
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to ocfs2-tools in Ubuntu.
https://bugs.launchpad.net/bugs/613793
Title:
o2cb stopping Failed
Status in ocfs2-tools package in Ubuntu:
Confirmed
Bug description:
Binary package hint: ocfs2-tools
Ubuntu release:
Description: Ubuntu 10.04.1 LTS
Release: 10.04
Package version:
ocfs2-tools 1.4.3-1
The script /etc/init.d/o2cb exits with an error when stopped and the services
do not stop.
Here is the error message:
/etc/init.d/o2cb stop
Stopping O2CB cluster ocfs2: Failed
Unable to stop cluster as heartbeat region still active
I have identified a first error in the script. In the function
clean_heartbeat the following if:
    if [ ! -f "$(configfs_path)/cluster/${CLUSTER}/heartbeat/*" ]
    then
        return
    fi
is always true, so the function returns immediately: the glob is inside
double quotes and is not expanded, so the test looks for a regular file
literally named "*". If the intention was to check that the heartbeat
directory exists, the code must be:
    if [ ! -d "$(configfs_path)/cluster/${CLUSTER}/heartbeat/" ]
    then
        echo "OK"
        return
    fi
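The behaviour of the original test can be reproduced outside the init
script. The sketch below is only an illustration: a temporary directory
stands in for the configfs heartbeat directory, and the region name
REGION1 and the helper has_regions are made up for the example.

```shell
#!/bin/sh
# Stand-in for $(configfs_path)/cluster/${CLUSTER}/heartbeat (assumption).
HB_DIR="$(mktemp -d)"
mkdir "${HB_DIR}/REGION1"    # hypothetical heartbeat region

# Quoted glob: the shell does not expand "*", so -f looks for a file
# literally named "*" and the negated test succeeds every time.
if [ ! -f "${HB_DIR}/*" ]
then
    QUOTED_TEST="always true"
else
    QUOTED_TEST="false"
fi

# has_regions DIR: succeed if DIR contains at least one subdirectory.
has_regions()
{
    for entry in "$1"/*/
    do
        [ -d "${entry}" ] && return 0
    done
    return 1
}

if has_regions "${HB_DIR}"
then
    REGION_TEST="regions present"
else
    REGION_TEST="no regions"
fi

echo "quoted glob test: ${QUOTED_TEST}"
echo "directory scan:   ${REGION_TEST}"
rm -rf "${HB_DIR}"
```

Leaving the glob unquoted inside a for loop lets the shell expand it, and
the -d check also covers the case where nothing matches and the pattern
is passed through unchanged.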
An error persists even after these changes:
/etc/init.d/o2cb stop
Cleaning heartbeat on ocfs2: Failed
At least one heartbeat region still active
I added some debugging lines, changing the function as follows:
#
# clean_heartbeat()
# Removes the inactive heartbeat regions
#
clean_heartbeat()
{
    if [ "$#" -lt "1" -o -z "$1" ]
    then
        echo "clean_heartbeat(): Requires an argument" >&2
        return 1
    fi
    CLUSTER="$1"
    if [ ! -d "$(configfs_path)/cluster/${CLUSTER}/heartbeat/" ]
    then
        echo "OK"
        return
    fi
    echo -n "Cleaning heartbeat on ${CLUSTER}: "
    ls -1 "$(configfs_path)/cluster/${CLUSTER}/heartbeat/" | while read HBUUID
    do
        if [ ! -d "$(configfs_path)/cluster/${CLUSTER}/heartbeat/${HBUUID}" ]
        then
            continue
        fi
        echo
        echo "DEBUG ocfs2_hb_ctl -I -u ${HBUUID} 2>&1"
        OUTPUT="`ocfs2_hb_ctl -I -u ${HBUUID} 2>&1`"
        if [ $? != 0 ]
        then
            echo "Failed"
            echo "${OUTPUT}" >&2
            exit 1
        fi
        echo "DEBUG ${OUTPUT}"
        REF="`echo ${OUTPUT} | awk '/refs/ {print $2; exit;}' 2>&1`"
        echo "DEBUG REF=$REF"
        if [ $REF != 0 ]
        then
            echo "Failed"
            echo "At least one heartbeat region still active" >&2
            exit 1
        else
            OUTPUT="`ocfs2_hb_ctl -K -u ${HBUUID} 2>&1`"
        fi
    done
    if [ $? = 1 ]
    then
        exit 1
    fi
    echo "OK"
}
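One detail of the function above: the "exit 1" inside the while loop does
not leave the script directly. The loop is the last stage of a pipeline
and, in shells such as dash and bash, runs in a subshell, which is why the
status is checked again after "done". A minimal sketch of that behaviour,
with printf standing in for ls and made-up region names:

```shell
#!/bin/sh
# printf stands in for ls; the region names are invented for the example.
printf '%s\n' REGION_A REGION_B REGION_C | while read HBUUID
do
    if [ "${HBUUID}" = "REGION_B" ]
    then
        exit 1        # ends only the pipeline's subshell in dash and bash
    fi
done
STATUS=$?             # exit status of the loop, i.e. of the pipeline

echo "pipeline status: ${STATUS}"
if [ "${STATUS}" -ne 0 ]
then
    echo "a region check failed"
fi
```

Without the check after "done", a failure inside the loop would be
silently followed by the final "OK".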
The new output is:
/etc/init.d/o2cb stop
Cleaning heartbeat on ocfs2:
DEBUG ocfs2_hb_ctl -I -u FC046AD7B2584E7EB12A7293993C81B0 2>&1
DEBUG FC046AD7B2584E7EB12A7293993C81B0: 2 refs
DEBUG REF=2
Failed
At least one heartbeat region still active
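The reference count in the debug output above can be extracted and checked
a little more defensively. This is a sketch, not the code from the init
script: the function name hb_refs is hypothetical, and the sample line is
copied from the output shown above rather than produced by running
ocfs2_hb_ctl.

```shell
#!/bin/sh
# hb_refs LINE: print the reference count from a line such as
# "FC046AD7B2584E7EB12A7293993C81B0: 2 refs" (format taken from the
# debug output above). Prints nothing and returns 1 if the line does
# not have that shape.
hb_refs()
{
    echo "$1" | awk '
        $3 == "refs" && $2 ~ /^[0-9]+$/ { print $2; found = 1; exit }
        END { exit !found }'
}

SAMPLE="FC046AD7B2584E7EB12A7293993C81B0: 2 refs"
REF="$(hb_refs "${SAMPLE}")" || REF=""

# Quote ${REF} so that an empty value cannot break the test expression.
if [ -z "${REF}" ]
then
    echo "could not parse reference count" >&2
elif [ "${REF}" -ne 0 ]
then
    echo "region still has ${REF} reference(s)"
else
    echo "region is idle"
fi
```

With the unquoted form [ $REF != 0 ], an empty REF makes the test itself
fail with an error instead of reporting a parse problem.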
At this point I checked the source code of ocfs2_hb_ctl. The command
ocfs2_hb_ctl -I -u ${HBUUID} returns the number of references held in a
semaphore used by the programs that manage OCFS2 filesystems. In the source
file libo2cb/o2cb_api.c:
- the function o2cb_mutex_down increases the second semaphore;
- the function o2cb_mutex_up decreases the first semaphore;
- the function __o2cb_get_ref increases the first semaphore;
- the function __o2cb_drop_ref decreases the first semaphore.
I have not found the point where the second semaphore is decreased.
This could be the cause of the error.