Hi,

On Thu, Jan 28, 2016 at 04:42:55PM +0900, yuta takeshita wrote:
> Hi,
> Sorry for replying late.

No problem.

> 2016-01-15 21:19 GMT+09:00 Dejan Muhamedagic <deja...@fastmail.fm>:
>
> > Hi,
> >
> > On Fri, Jan 15, 2016 at 04:54:37PM +0900, yuta takeshita wrote:
> > > Hi,
> > >
> > > Thanks for responding and making a patch.
> > >
> > > 2016-01-14 19:16 GMT+09:00 Dejan Muhamedagic <deja...@fastmail.fm>:
> > >
> > > > On Thu, Jan 14, 2016 at 11:04:09AM +0100, Dejan Muhamedagic wrote:
> > > > > Hi,
> > > > >
> > > > > On Thu, Jan 14, 2016 at 04:20:19PM +0900, yuta takeshita wrote:
> > > > > > Hello.
> > > > > >
> > > > > > I have been having a problem with the nfsserver RA on RHEL 7.1 and
> > > > > > systemd. When the nfsd process is lost due to an unexpected failure,
> > > > > > nfsserver_monitor() does not detect it and does not execute a failover.
> > > > > >
> > > > > > I use the RA below (but this problem may also exist in the latest
> > > > > > nfsserver RA):
> > > > > >
> > > > > > https://github.com/ClusterLabs/resource-agents/blob/v3.9.6/heartbeat/nfsserver
> > > > > >
> > > > > > The cause is the following.
> > > > > >
> > > > > > 1. After executing "pkill -9 nfsd", "systemctl status nfs-server.service"
> > > > > > still returns 0.
> > > > >
> > > > > I think that it should be systemctl is-active. Already had a
> > > > > problem with systemctl status, well, not being what one would
> > > > > assume status would be. Can you please test that and then open
> > > > > either a pull request or issue at
> > > > > https://github.com/ClusterLabs/resource-agents
> > > >
> > > > I already made a pull request:
> > > >
> > > > https://github.com/ClusterLabs/resource-agents/pull/741
> > > >
> > > > Please test if you find time.
> > > >
> > > I tested the code, but problems still remain.
> > > systemctl is-active returns "active" and its exit code is 0, just like
> > > systemctl status.
> > > Perhaps it is inappropriate to use systemctl for monitoring the kernel
> > > process.
> >
> > OK. My patch was too naive and didn't take into account the
> > systemd/kernel intricacies.
> >
> > > Mr Kay Sievers, who is a developer of systemd, said that systemd doesn't
> > > monitor kernel processes, in the following post:
> > > http://comments.gmane.org/gmane.comp.sysutils.systemd.devel/34367
> >
> > Thanks for the reference. One interesting thing could also be
> > reading /proc/fs/nfsd/threads instead of checking the process
> > existence. Furthermore, we could do some RPC based monitor, but
> > that would be, I guess, better suited for another monitor depth.
> >
> OK. I surveyed and tested /proc/fs/nfsd/threads.
> It seems to work well on my cluster.
> I made a patch and opened a pull request:
> https://github.com/ClusterLabs/resource-agents/pull/746
>
> Please check if you have time.
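For anyone reading along: the idea is simply to ask knfsd itself how many
threads it is running, via the nfsd filesystem, rather than asking systemd.
A stripped-down illustration of that check (just a sketch, not the code from
the pull request):

    # Sketch only -- see the pull request for the actual agent code.
    # knfsd exports its thread count in /proc/fs/nfsd/threads; a count of 0
    # (or a missing file) means no nfsd threads are running, regardless of
    # what systemd reports for the nfs-server unit.
    if [ -f /proc/fs/nfsd/threads ]; then
        threads_num=$(cat /proc/fs/nfsd/threads)
        if [ "$threads_num" -gt 0 ]; then
            echo "nfsd is running ($threads_num threads)"
        else
            echo "nfsd is not running"
        fi
    else
        echo "nfsd filesystem not mounted at /proc/fs/nfsd"
    fi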
Some return codes of nfsserver_systemd_monitor() follow OCF and one
apparently LSB:

 301         nfs_exec is-active
 302         rc=$?
 ...
 311                 if [ $threads_num -gt 0 ]; then
 312                         return $OCF_SUCCESS
 313                 else
 314                         return 3
 315                 fi
 316         else
 317                 return $OCF_ERR_GENERIC
 ...
 321         return $rc

Given that nfs_exec() returns LSB codes, it should probably be
something like this:

 311                 if [ $threads_num -gt 0 ]; then
 312                         return 0
 313                 else
 314                         return 3
 315                 fi
 316         else
 317                 return 1
 ...
 321         return $rc

It won't make any actual difference, but the intent would be cleaner
(i.e. it's just by accident that the OCF codes are the same in this
case).
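To spell out the split I have in mind (this is only a sketch of the calling
side, with assumed structure, not a quote from the agent): nfs_exec() and
nfsserver_systemd_monitor() would speak pure LSB status codes, and the
OCF-facing monitor action would translate them exactly once, along these
lines:

    # Sketch of the caller, not the agent's actual code: map LSB status
    # codes (0 = running, 3 = not running) to OCF return codes here.
    nfsserver_systemd_monitor
    rc=$?
    case $rc in
        0) return $OCF_SUCCESS ;;      # happens to be 0 in OCF too
        3) return $OCF_NOT_RUNNING ;;  # LSB 3 vs. OCF 7 -- where they differ
        *) return $OCF_ERR_GENERIC ;;
    esac

That keeps "what knfsd/systemd said" and "what we report to the cluster" in
separate layers.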
Cheers,

Dejan

> Regards,
> Yuta
>
> > Cheers,
> >
> > Dejan
> >
> > > I replied to your pull request.
> > >
> > > Regards,
> > > Yuta Takeshita
> > >
> > > > > Thanks for reporting!
> > > >
> > > > Dejan
> > > >
> > > > > Thanks,
> > > > >
> > > > > Dejan
> > > > >
> > > > > > 2. nfsserver_monitor() judges by the return value of "systemctl status
> > > > > > nfs-server.service".
> > > > > >
> > > > > > ----------------------------------------------------------------------
> > > > > > # ps ax | grep nfsd
> > > > > > 25193 ?        S<     0:00 [nfsd4]
> > > > > > 25194 ?        S<     0:00 [nfsd4_callbacks]
> > > > > > 25197 ?        S      0:00 [nfsd]
> > > > > > 25198 ?        S      0:00 [nfsd]
> > > > > > 25199 ?        S      0:00 [nfsd]
> > > > > > 25200 ?        S      0:00 [nfsd]
> > > > > > 25201 ?        S      0:00 [nfsd]
> > > > > > 25202 ?        S      0:00 [nfsd]
> > > > > > 25203 ?        S      0:00 [nfsd]
> > > > > > 25204 ?        S      0:00 [nfsd]
> > > > > > 25238 pts/0    S+     0:00 grep --color=auto nfsd
> > > > > > #
> > > > > > # pkill -9 nfsd
> > > > > > #
> > > > > > # systemctl status nfs-server.service
> > > > > > ● nfs-server.service - NFS server and services
> > > > > >    Loaded: loaded (/etc/systemd/system/nfs-server.service; disabled; vendor preset: disabled)
> > > > > >    Active: active (exited) since 木 2016-01-14 11:35:39 JST; 1min 3s ago
> > > > > >   Process: 25184 ExecStart=/usr/sbin/rpc.nfsd $RPCNFSDARGS (code=exited, status=0/SUCCESS)
> > > > > >   Process: 25182 ExecStartPre=/usr/sbin/exportfs -r (code=exited, status=0/SUCCESS)
> > > > > >  Main PID: 25184 (code=exited, status=0/SUCCESS)
> > > > > >    CGroup: /system.slice/nfs-server.service
> > > > > > (snip)
> > > > > > #
> > > > > > # echo $?
> > > > > > 0
> > > > > > #
> > > > > > # ps ax | grep nfsd
> > > > > > 25256 pts/0    S+     0:00 grep --color=auto nfsd
> > > > > > ----------------------------------------------------------------------
> > > > > >
> > > > > > This is because nfsd is a kernel process, and systemd does not
> > > > > > monitor the state of running kernel processes.
> > > > > >
> > > > > > Is there some good way to handle this?
> > > > > > (When I use "pidof" instead of "systemctl status", the failover
> > > > > > succeeds.)
> > > > > > Regards,
> > > > > > Yuta Takeshita

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org