Hi Lars,

On Mon, Oct 16, 2017 at 08:52:04PM +0200, Lars Ellenberg wrote:
> On Mon, Oct 16, 2017 at 08:09:21PM +0200, Dejan Muhamedagic wrote:
> > Hi,
> >
> > On Thu, Oct 12, 2017 at 03:30:30PM +0900, Christian Balzer wrote:
> > >
> > > Hello,
> > >
> > > 2nd post in 10 years, let's see if this one gets an answer unlike the first
> > > one...
>
> Do you want to make me check for the old one? ;-)
>
> > > One of the main use cases for pacemaker here are DRBD replicated
> > > active/active mailbox servers (dovecot/exim) on Debian machines.
> > > We've been doing this for a loong time, as evidenced by the oldest pair
> > > still running Wheezy with heartbeat and pacemaker 1.1.7.
> > >
> > > The majority of cluster pairs is on Jessie with corosync and backported
> > > pacemaker 1.1.16.
> > >
> > > Yesterday we had a hiccup, resulting in half the machines losing
> > > their upstream router for 50 seconds which in turn caused the pingd RA to
> > > trigger a fail-over of the DRBD RA and associated resource group
> > > (filesystem/IP) to the other node.
> > >
> > > The old cluster performed flawlessly, the newer clusters all wound up with
> > > DRBD and FS resource being BLOCKED as the processes holding open the
> > > filesystem didn't get killed fast enough.
> > >
> > > Comparing the 2 RAs (no versioning T_T) reveals a large change in the
> > > "signal_processes" routine.
> > >
> > > So with the old Filesystem RA using fuser we get something like this and
> > > thousands of processes killed per second:
> > > ---
> > > Oct 11 15:06:35 mbx07 lrmd: [4731]: info: RA output:
> > > (res_Filesystem_mb07:stop:stdout) 3478 3593 ...
> > > Oct 11 15:06:35 mbx07 lrmd: [4731]: info: RA output:
> > > (res_Filesystem_mb07:stop:stderr)
> > > cmccmccmccmcmcmcmcmccmccmcmcmcmcmcmcmcmcmcmcmcmccmcm
> > > Oct 11 15:06:35 mbx07 lrmd: [4731]: info: RA output:
> > > (res_Filesystem_mb07:stop:stdout) 4032 4058 ...
> > > ---
> > >
> > > Whereas the new RA (newer isn't better) that goes around killing processes
> > > individually with beautiful logging was a total fail at about 4 processes
> > > per second killed...
> > > ---
> > > Oct 11 15:06:46 mbx10 Filesystem(res_Filesystem_mb10)[288712]: INFO:
> > > sending signal TERM to: mail 4226 4909 0 09:43 ? S
> > > 0:00 dovecot/imap
> > > Oct 11 15:06:46 mbx10 Filesystem(res_Filesystem_mb10)[288712]: INFO:
> > > sending signal TERM to: mail 4229 4909 0 09:43 ? S
> > > 0:00 dovecot/imap [idling]
> > > Oct 11 15:06:46 mbx10 Filesystem(res_Filesystem_mb10)[288712]: INFO:
> > > sending signal TERM to: mail 4238 4909 0 09:43 ? S
> > > 0:00 dovecot/imap
> > > Oct 11 15:06:46 mbx10 Filesystem(res_Filesystem_mb10)[288712]: INFO:
> > > sending signal TERM to: mail 4239 4909 0 09:43 ? S
> > > 0:00 dovecot/imap
> > > ---
> > >
> > > So my questions are:
> > >
> > > 1. Am I the only one with more than a handful of processes per FS who
> > > can't afford to wait hours for the new routine to finish?
> >
> > The change was introduced about five years ago.
>
> Also, usually there should be no process anymore,
> because whatever is using the Filesystem should have its own RA,
> which should have appropriate constraints,
> which means that should have been called and "stop"ped first,
> before the Filesystem stop and umount, and only the "accidental,
> stray, abandoned, idle since three weeks, operator shell session,
> that happened to cd into that file system" is supposed to be around
> *unexpectedly* and in need of killing, and not "thousands of service
> processes, expectedly".

Indeed, but obviously one can never tell ;-)

> So arguably your setup is broken,

Or the other RA didn't/couldn't stop the resource ...

> relying on a fall-back workaround
> which used to "perform" better.
>
> The bug is not that this fall-back workaround now
> has pretty printing and is much slower (and eventually times out),
> the bug is that you don't properly kill the service first.
> [and that you don't have fencing].

... and didn't exit with an appropriate exit code (i.e. fail).

> > > 2. Can we have the old FUSER (kill) mode back?
> >
> > Yes. I'll make a pull request.
>
> Still, that's a sane thing to do,
> thanks, dejanm.

Right. We probably cannot fix all issues coming from various RAs
or configurations, but we should at least try a bit harder.

> Maybe we can even come up with a way
> to both "pretty print" and kill fast?

My best guess right now is no ;-) But we could log nicely for the
usual case of a small number of stray processes ... maybe
something like this:

    # killnlog and justkill are helpers that would still have to be written
    i=""
    # tr eats the final newline; echo puts one back, otherwise read
    # fails at EOF and the last (or only) batch is silently skipped
    { get_pids | tr '\n' ' '; echo; } | fold -s |
    while read procs; do
        if [ -z "$i" ]; then
            # first batch (about one line of PIDs): log and kill
            killnlog $procs
            i="nolog"
        else
            # everything after that: just kill, no logging
            justkill $procs
        fi
    done

Cheers,

Dejan

> --
> : Lars Ellenberg
> : LINBIT | Keeping the Digital World Running
> : DRBD -- Heartbeat -- Corosync -- Pacemaker
> : R&D, Integration, Ops, Consulting, Support
>
> DRBD® and LINBIT® are registered trademarks of LINBIT
>
> _______________________________________________
> Users mailing list: Users@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
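
P.S. For anybody who wants to play with the "log the first batch, then
just kill" idea outside of the RA, here is a minimal stand-alone sketch.
It is not the Filesystem RA code: get_pids is a stand-in that simply
echoes the PIDs found in $PIDS, and signal_in_batches, DRY_RUN, SIG and
WIDTH are names made up for this illustration. With DRY_RUN set, nothing
is signalled and the kill commands are only printed.

```shell
#!/bin/sh
# Sketch only: every function and variable name below is hypothetical,
# none of them is taken from the Filesystem RA.

# Stand-in for the RA's PID lookup: prints the PIDs listed in $PIDS,
# one per line ($PIDS is left unquoted on purpose to split on blanks).
get_pids() {
	printf '%s\n' $PIDS
}

# Signal a whole batch of PIDs with a single kill call.
# With DRY_RUN set, only print what would be done.
justkill() {
	if [ -n "$DRY_RUN" ]; then
		echo "kill -s ${SIG:-TERM} $*"
	else
		kill -s "${SIG:-TERM}" "$@" 2>/dev/null
	fi
}

# Log the batch once, then signal it.
killnlog() {
	echo "INFO: sending signal ${SIG:-TERM} to: $*"
	justkill "$@"
}

# Join all PIDs into one line, fold it into batches of at most
# $WIDTH characters, log only the first batch and kill the rest quietly.
signal_in_batches() {
	first=yes
	# tr removes the final newline; echo restores it so that read
	# also returns success for the last batch
	{ get_pids | tr '\n' ' '; echo; } | fold -s -w "${WIDTH:-80}" |
	while read -r procs; do
		[ -n "$procs" ] || continue	# nothing to signal
		if [ "$first" = yes ]; then
			killnlog $procs
			first=no
		else
			justkill $procs
		fi
	done
}
```

Sourcing this and running signal_in_batches with DRY_RUN=1 and a short
PIDS list shows one INFO line for the first batch, followed by one kill
line per batch. The batch size is only as precise as fold makes it, which
should be good enough for the purpose.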