Is there a way to make Pacemaker retry stopping a resource if the stop failed?

Sincerely,
Ark.

e...@ethaniel.com

On Sun, May 5, 2019 at 11:05 PM Andrei Borzenkov <arvidj...@gmail.com> wrote:

> 05.05.2019 18:43, Arkadiy Kulev wrote:
> > Dear Andrei,
> >
> > I'm sorry for the screenshot, this is the only thing that I have left
> > after the crash.
>
> What crash do you mean? All nodes appear up and running, you are able to
> execute commands, I do not see anything crashed.
>
> > What would the best course of action be in this situation?
>
> Configure STONITH. It is mandatory so pacemaker can resolve such
> situations, among others.
>
> For now, assuming node problems are over, you should be able to clean
> resource state (crm_resource --cleanup). Restarting pacemaker on all
> nodes would also work.
>
> > We don't have a STONITH device. But the local network is still up (both
> > nodes see each other).
> >
> > Also, what does "(blocked)" mean?
>
> It means that pacemaker cannot perform any action on this resource due
> to failed prerequisites. In this case the failed prerequisite was a
> successful stop of the resource.
>
> > Sincerely,
> > Ark.
> >
> > e...@ethaniel.com
> >
> > On Sun, May 5, 2019 at 9:46 PM Andrei Borzenkov <arvidj...@gmail.com> wrote:
> >
> >> 05.05.2019 16:14, Arkadiy Kulev wrote:
> >>> Hello!
> >>>
> >>> I run pacemaker on 2 active/active hosts which balance the load of 2
> >>> public IP addresses.
> >>> A few days ago we ran a very CPU/network intensive process on one of
> >>> the 2 hosts and Pacemaker failed.
> >>>
> >>> I've attached a screenshot of the terminal to this email.
> >>>
> >>> The "Failed Actions" shows that the IPaddr2 "monitor_30000" failed with
> >>> "unknown error" and a status of "Timed Out" (queue=0ms exec=0ms). The
> >>> /etc/init.d LSB script (mycluster) failed as well (and set to blocked).
> >>>
> >>> This completely stalled Pacemaker and the second host didn't take over
> >>> the IP address and gateway settings.
> >>>
> >>> Any ideas would be appreciated.
> >>
> >> Stop operation failed, you have no stonith, so pacemaker cannot continue
> >> and is stuck.
> >>
> >>> [image: Screen Shot 2019-04-30 at 12.36.34.png]
> >>
> >> Images are hard to reply to, consume excessive space and cannot be
> >> viewed using text only clients. There is no reason to send image when
> >> you can just copy and paste several lines of text.
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/