Hi Ulrich,
I reviewed the crm configuration file; some comments are below:
1) The lvmlockd resource is used for shared VGs. If you do not plan to add
any shared VG to your cluster, I suggest dropping this resource and its clone.
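For illustration, assuming the primitive and clone are named prm_lvmlockd and cln_lvmlockd (hypothetical names; check "crm configure show" for the real ones), the removal with crmsh could look like:

```
# Sketch only; resource/clone names are assumptions.
crm resource stop cln_lvmlockd       # stop the clone on all nodes first
crm configure delete cln_lvmlockd    # then remove it from the CIB
```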
2) The lvmlockd service depends on the DLM service; it creates
"lvm_xxx" lockspaces when a shared VG is created/activated.
Other resources also depend on DLM to create lockspaces and to
avoid race conditions, e.g. clustered MD, OCFS2, etc. The file
system resource should therefore start after the lvm2 (lvmlockd)
related resources, which means this order is wrong:
order ord_lockspace_fs__lvmlockd Mandatory: cln_lockspace_ocfs2 cln_lvmlock
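If I read the constraint correctly, swapping the operands would express the intended ordering (a sketch, reusing the identifiers quoted above verbatim; the constraint name is made up):

```
# Sketch: start the lvmlockd clone before the OCFS2 filesystem clone.
order ord_lvmlockd__lockspace_fs Mandatory: cln_lvmlock cln_lockspace_ocfs2
```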
Thanks
Gang
On 2021/1/21 20:08, Ulrich Windl wrote:
Gang He <g...@suse.com> wrote on 21.01.2021 at 11:30 in message
<59b543ee-0824-6b91-d0af-48f66922b...@suse.com>:
Hi Ulrich,
Is the problem reproducible reliably? Could you share your pacemaker
crm configuration and your OS/lvm2/resource-agents version
information?
OK, the problem occurred on every node, so I guess it's reproducible.
OS is SLES15 SP2 with all current updates (lvm2-2.03.05-8.18.1.x86_64,
pacemaker-2.0.4+20200616.2deceaa3a-3.3.1.x86_64,
resource-agents-4.4.0+git57.70549516-3.12.1.x86_64).
The configuration (somewhat trimmed) is attached.
The only VG the cluster node sees is:
ph16:~ # vgs
VG #PV #LV #SN Attr VSize VFree
sys 1 3 0 wz--n- 222.50g 0
Regards,
Ulrich
I feel the problem was probably caused by the lvmlockd resource agent script,
which did not handle this corner case correctly.
Thanks
Gang
On 2021/1/21 17:53, Ulrich Windl wrote:
Hi!
I have a problem: For tests I had configured lvmlockd. Now that the tests
have ended, no LVM is used for cluster resources any more, but lvmlockd is
still configured.
Unfortunately I ran into this problem:
One OCFS2 mount was unmounted successfully; another one, holding the
lockspace for lvmlockd, is still active.
lvmlockd shuts down. At least it says so.
Unfortunately that stop never succeeds (runs into a timeout).
My suspicion is something like this:
Some non‑LVM lock exists for the now unmounted OCFS2 filesystem.
lvmlockd wants to access that filesystem for unknown reasons.
I don't understand what's going on.
The events at node shutdown were:
Some Xen PVM was live‑migrated successfully to another node, but during
that
there was a message like this:
Jan 21 10:20:13 h19 virtlockd[41990]: libvirt version: 6.0.0
Jan 21 10:20:13 h19 virtlockd[41990]: hostname: h19
Jan 21 10:20:13 h19 virtlockd[41990]: resource busy: Lockspace resource
'4c6bebd1f4bc581255b422a65d317f31deef91f777e51ba0daf04419dda7ade5' is not
locked
Jan 21 10:20:13 h19 libvirtd[41991]: libvirt version: 6.0.0
Jan 21 10:20:13 h19 libvirtd[41991]: hostname: h19
Jan 21 10:20:13 h19 libvirtd[41991]: resource busy: Lockspace resource
'4c6bebd1f4bc581255b422a65d317f31deef91f777e51ba0daf04419dda7ade5' is not
locked
Jan 21 10:20:13 h19 libvirtd[41991]: Unable to release lease on test‑jeos4
Jan 21 10:20:13 h19 VirtualDomain(prm_xen_test‑jeos4)[32786]: INFO:
test‑jeos4: live migration to h18 succeeded.
Unfortunately the log message makes it practically impossible to guess what
the locked object actually is (it seems to be an indirect lock using a
SHA256 hash).
Then the OCFS2 filesystem for the VM images unmounts successfully while the
stop of lvmlockd is still busy:
Jan 21 10:20:16 h19 lvmlockd(prm_lvmlockd)[32945]: INFO: stop the
lockspaces
of shared VG(s)...
...
Jan 21 10:21:56 h19 pacemaker‑controld[42493]: error: Result of stop
operation for prm_lvmlockd on h19: Timed Out
As said before: I don't have shared VGs any more. I don't understand.
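One could cross‑check which shared VGs and DLM lockspaces actually remain with something like the following (a sketch; lock_type is the lvm2 report field for the VG lock type, and dlm_tool comes from the dlm package, to be verified on the target system):

```
# List VGs with their lock type; a shared VG would report "dlm" here.
vgs -o vg_name,lock_type
# List the DLM lockspaces currently held on this node.
dlm_tool ls
```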
On a node without VMs running I see:
h19:~ # lvmlockctl ‑d
1611221190 lvmlockd started
1611221190 No lockspaces found to adopt
1611222560 new cl 1 pi 2 fd 8
1611222560 recv client[10817] cl 1 dump_info . "" mode iv flags 0
1611222560 send client[10817] cl 1 dump result 0 dump_len 149
1611222560 send_dump_buf delay 0 total 149
1611222560 close client[10817] cl 1 fd 8
1611222563 new cl 2 pi 2 fd 8
1611222563 recv client[10818] cl 2 dump_log . "" mode iv flags 0
On a node with VMs running I see:
h16:~ # lvmlockctl ‑d
1611216942 lvmlockd started
1611216942 No lockspaces found to adopt
1611221684 new cl 1 pi 2 fd 8
1611221684 recv pvs[17159] cl 1 lock gl "" mode sh flags 0
1611221684 lockspace "lvm_global" not found for dlm gl, adding...
1611221684 add_lockspace_thread dlm lvm_global version 0
1611221684 S lvm_global lm_add_lockspace dlm wait 0 adopt 0
1611221685 S lvm_global lm_add_lockspace done 0
1611221685 S lvm_global R GLLK action lock sh
1611221685 S lvm_global R GLLK res_lock cl 1 mode sh
1611221685 S lvm_global R GLLK lock_dlm
1611221685 S lvm_global R GLLK res_lock rv 0 read vb 0 0 0
1611221685 S lvm_global R GLLK res_lock all versions zero
1611221685 S lvm_global R GLLK res_lock invalidate global state
1611221685 send pvs[17159] cl 1 lock gl rv 0
1611221685 recv pvs[17159] cl 1 lock vg "sys" mode sh flags 0
1611221685 lockspace "lvm_sys" not found
1611221685 send pvs[17159] cl 1 lock vg rv ‑210 ENOLS
1611221685 close pvs[17159] cl 1 fd 8
1611221685 S lvm_global R GLLK res_unlock cl 1 from close
1611221685 S lvm_global R GLLK unlock_dlm
1611221685 S lvm_global R GLLK res_unlock lm done
1611222582 new cl 2 pi 2 fd 8
1611222582 recv client[19210] cl 2 dump_log . "" mode iv flags 0
Note: "lvm_sys" may refer to the VG sys used by the hypervisor.
Regards,
Ulrich
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users
ClusterLabs home: https://www.clusterlabs.org/