On 10/15/2016 12:27 PM, Dmitri Maziuk wrote:
> On 2016-10-15 01:56, Jay Scott wrote:
> 
>> So, what's wrong?  (I'm a newbie, of course.)
> 
> Here's what worked for me on centos 7:
> http://octopus.bmrb.wisc.edu/dokuwiki/doku.php?id=sysadmin:pacemaker
> YMMV and all that.

PS. I can't in all honesty recommend this setup for running NFS clusters
at this point.

About one time in three, when I run 'pcs standby <primary>', I get

> Oct 15 15:31:52 lionfish crmd[1137]:  notice: Initiating action 46: stop drbd_filesystem_stop_0 on lionfish (local)
> Oct 15 15:31:52 lionfish Filesystem(drbd_filesystem)[32120]: INFO: Running stop for /dev/drbd0 on /raid
> Oct 15 15:31:52 lionfish Filesystem(drbd_filesystem)[32120]: INFO: Trying to unmount /raid
> Oct 15 15:31:52 lionfish Filesystem(drbd_filesystem)[32120]: ERROR: Couldn't unmount /raid; trying cleanup with TERM
> Oct 15 15:31:52 lionfish Filesystem(drbd_filesystem)[32120]: INFO: No processes on /raid were signalled. force_unmount is set to 'yes'
> Oct 15 15:31:53 lionfish Filesystem(drbd_filesystem)[32120]: ERROR: Couldn't unmount /raid; trying cleanup with TERM
> Oct 15 15:31:53 lionfish Filesystem(drbd_filesystem)[32120]: INFO: No processes on /raid were signalled. force_unmount is set to 'yes'
> Oct 15 15:31:54 lionfish Filesystem(drbd_filesystem)[32120]: ERROR: Couldn't unmount /raid; trying cleanup with TERM
> Oct 15 15:31:54 lionfish Filesystem(drbd_filesystem)[32120]: INFO: No processes on /raid were signalled. force_unmount is set to 'yes'
> Oct 15 15:31:56 lionfish Filesystem(drbd_filesystem)[32120]: ERROR: Couldn't unmount /raid; trying cleanup with KILL
> Oct 15 15:31:56 lionfish Filesystem(drbd_filesystem)[32120]: INFO: No processes on /raid were signalled. force_unmount is set to 'yes'
> Oct 15 15:31:57 lionfish Filesystem(drbd_filesystem)[32120]: ERROR: Couldn't unmount /raid; trying cleanup with KILL
> Oct 15 15:31:57 lionfish Filesystem(drbd_filesystem)[32120]: INFO: No processes on /raid were signalled. force_unmount is set to 'yes'
> Oct 15 15:31:58 lionfish Filesystem(drbd_filesystem)[32120]: ERROR: Couldn't unmount /raid; trying cleanup with KILL
> Oct 15 15:31:58 lionfish Filesystem(drbd_filesystem)[32120]: INFO: No processes on /raid were signalled. force_unmount is set to 'yes'
> Oct 15 15:31:59 lionfish Filesystem(drbd_filesystem)[32120]: ERROR: Couldn't unmount /raid, giving up!
> Oct 15 15:32:00 lionfish lrmd[1134]:  notice: drbd_filesystem_stop_0:32120:stderr [ umount: /raid: target is busy. ]
> Oct 15 15:32:00 lionfish lrmd[1134]:  notice: drbd_filesystem_stop_0:32120:stderr [         (In some cases useful info about processes that use ]
> Oct 15 15:32:00 lionfish lrmd[1134]:  notice: drbd_filesystem_stop_0:32120:stderr [          the device is found by lsof(8) or fuser(1)) ]
> Oct 15 15:32:00 lionfish lrmd[1134]:  notice: drbd_filesystem_stop_0:32120:stderr [ ocf-exit-reason:Couldn't unmount /raid; trying cleanup with TERM ]
> Oct 15 15:32:00 lionfish lrmd[1134]:  notice: drbd_filesystem_stop_0:32120:stderr [ umount: /raid: target is busy. ]
> Oct 15 15:32:00 lionfish lrmd[1134]:  notice: drbd_filesystem_stop_0:32120:stderr [         (In some cases useful info about processes that use ]
> Oct 15 15:32:00 lionfish lrmd[1134]:  notice: drbd_filesystem_stop_0:32120:stderr [          the device is found by lsof(8) or fuser(1)) ]
> Oct 15 15:32:00 lionfish lrmd[1134]:  notice: drbd_filesystem_stop_0:32120:stderr [ ocf-exit-reason:Couldn't unmount /raid; trying cleanup with TERM ]
> Oct 15 15:32:00 lionfish lrmd[1134]:  notice: drbd_filesystem_stop_0:32120:stderr [ umount: /raid: target is busy. ]
> Oct 15 15:32:00 lionfish lrmd[1134]:  notice: drbd_filesystem_stop_0:32120:stderr [         (In some cases useful info about processes that use ]
> Oct 15 15:32:00 lionfish lrmd[1134]:  notice: drbd_filesystem_stop_0:32120:stderr [          the device is found by lsof(8) or fuser(1)) ]
> Oct 15 15:32:00 lionfish lrmd[1134]:  notice: drbd_filesystem_stop_0:32120:stderr [ ocf-exit-reason:Couldn't unmount /raid; trying cleanup with TERM ]
> Oct 15 15:32:00 lionfish lrmd[1134]:  notice: drbd_filesystem_stop_0:32120:stderr [ umount: /raid: target is busy. ]
> Oct 15 15:32:00 lionfish lrmd[1134]:  notice: drbd_filesystem_stop_0:32120:stderr [         (In some cases useful info about processes that use ]
> Oct 15 15:32:00 lionfish lrmd[1134]:  notice: drbd_filesystem_stop_0:32120:stderr [          the device is found by lsof(8) or fuser(1)) ]
> Oct 15 15:32:00 lionfish lrmd[1134]:  notice: drbd_filesystem_stop_0:32120:stderr [ ocf-exit-reason:Couldn't unmount /raid; trying cleanup with KILL ]
> Oct 15 15:32:00 lionfish lrmd[1134]:  notice: drbd_filesystem_stop_0:32120:stderr [ umount: /raid: target is busy. ]
> Oct 15 15:32:00 lionfish lrmd[1134]:  notice: drbd_filesystem_stop_0:32120:stderr [         (In some cases useful info about processes that use ]
> Oct 15 15:32:00 lionfish lrmd[1134]:  notice: drbd_filesystem_stop_0:32120:stderr [          the device is found by lsof(8) or fuser(1)) ]
> Oct 15 15:32:00 lionfish lrmd[1134]:  notice: drbd_filesystem_stop_0:32120:stderr [ ocf-exit-reason:Couldn't unmount /raid; trying cleanup with KILL ]
> Oct 15 15:32:00 lionfish lrmd[1134]:  notice: drbd_filesystem_stop_0:32120:stderr [ umount: /raid: target is busy. ]
> Oct 15 15:32:00 lionfish lrmd[1134]:  notice: drbd_filesystem_stop_0:32120:stderr [         (In some cases useful info about processes that use ]
> Oct 15 15:32:00 lionfish lrmd[1134]:  notice: drbd_filesystem_stop_0:32120:stderr [          the device is found by lsof(8) or fuser(1)) ]
> Oct 15 15:32:00 lionfish lrmd[1134]:  notice: drbd_filesystem_stop_0:32120:stderr [ ocf-exit-reason:Couldn't unmount /raid; trying cleanup with KILL ]
> Oct 15 15:32:00 lionfish lrmd[1134]:  notice: drbd_filesystem_stop_0:32120:stderr [ ocf-exit-reason:Couldn't unmount /raid, giving up! ]
> Oct 15 15:32:00 lionfish crmd[1137]:  notice: Operation drbd_filesystem_stop_0: unknown error (node=lionfish, call=91, rc=1, cib-update=107, confirmed=true)
> Oct 15 15:32:00 lionfish crmd[1137]:  notice: lionfish-drbd_filesystem_stop_0:91 [ umount: /raid: target is busy.\n        (In some cases useful info about processes that use\n         the device is found by lsof(8) or fuser(1))\nocf-exit-reason:Couldn't unmount /raid; trying cleanup with TERM\numount: /raid: target is busy.\n        (In some cases useful info about processes that use\n         the device is found by lsof(8) or fuser(1))\nocf-exit-reason:Couldn't unmount /raid; trying cleanup with TERM\numount: /raid: target is busy.\n
> Oct 15 15:32:00 lionfish crmd[1137]: warning: Action 46 (drbd_filesystem_stop_0) on lionfish failed (target: 0 vs. rc: 1): Error
> Oct 15 15:32:00 lionfish crmd[1137]:  notice: Transition aborted by drbd_filesystem_stop_0 'modify' on lionfish: Event failed (magic=0:1;46:4:0:700f71e0-d565-496f-a2c6-6b97f0cfd940, cib=0.128.10, source=match_graph_event:381, 0)

and I have to take a trip to the server room to power-cycle (aka stonith)
the nodes.

I haven't tried digging into it yet. For all I know, the problem may lie
between the CentOS kernel and the tainted ELRepo DRBD module; then again,
"no processes were signalled" while "target is busy" may of course be a
bug in the RA...

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu


_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
