06.09.2018 17:36, Patrick Whitney wrote:
> Good Morning Everyone,
>
> I'm hoping someone with more experience with corosync and pacemaker can see
> what I am doing wrong.
>
> I've got a test setup of 2 nodes, with dlm and clvm set up as clones, and
> using fence_scsi as my fencing agent.
>
> I've got it to the point where the cluster is up, and reports it is happy.
> I then began testing fencing. When issuing 'pcs stonith fence' it appears
> to work; that is, the scsi reservation is pulled and the output of 'pcs
> status' looks sane, and I'm able to access resources on the un-fenced node.
>
> Things go awry when I shut down (init 0) the fenced node... my unfenced node
> decides to fence itself (which looks like it was initiated by dlm due to an
> abandoned lockspace).
>
> I suspect this is due to misconfiguration, since I'm new to the toolset,
> but I'm not quite sure what I need to change.
>
> Any and all input is appreciated!
>
> Below is a chronology of events; my corosync config and cib.xml; command
> output; and annotated logs.
>
> Again, any hints, suggestions, wild guesses, or premonitions are welcomed
> -- I'm stuck! Please let me know if there is additional information which
> would be helpful.
>
> Many thanks,
> -Patrick W.
>
> Sep 6 08:54:14 -- Cluster is up and running; UI reports everything
>                   healthy.
>
> Sep 6 08:55:44 -- 'pcs stonith fence' called against node 1
>                   (coro-test-1);
>                   UI reports everything as expected -- that is,
>                   resources show only running on unfenced node and
>                   they're available.
>                   Oddly, although the UI says dlm is stopped on the
>                   fenced node, dlm_controld is still running.
>
> Sep 6 09:03:38 -- node 1 is shut down, and node 2 falls to pieces.
>                   - First, corosync sees lost member -- seems like
>                     this is appropriate, to me.
>                   - Next, dlm_controld calls to fence everything
>                   - stonith-ng tries to fence node 1 (but it's
>                     already fenced!)
>                   - dlm closes connection to "node 2" (does dlm
>                     "nodes" map to cluster nodes? I'm not sure they do)
>                   - clvmd dlm lockspace is now abandoned; cluster
>                     attempts to fence the remaining node
>                     (But can't because fence_scsi doesn't work like
>                     that).
>
> ***
> ****** -- Configuration --
> ***
> root@coro-test-2:~# pcs --version
> 0.9.149
> root@coro-test-2:~# pacemakerd --version
> Pacemaker 1.1.14
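
It might also help to capture the same state after each step of the chronology above (before fencing, after 'pcs stonith fence', and after the shutdown), so the point where dlm and the SCSI registrations diverge from 'pcs status' is visible. A rough sketch, not a tested procedure: the function name snapshot_state and the device path are placeholders I made up, and it assumes pgrep, dlm_tool and sg_persist (from sg3_utils) are installed on the node.

```shell
#!/bin/sh
# Sketch only: print cluster, DLM and SCSI registration state in one go.
# snapshot_state is a hypothetical helper name; $1 is the shared device
# that fence_scsi manages (placeholder, substitute your own LUN).
snapshot_state() {
    device="$1"

    echo "== pcs status =="
    pcs status

    echo "== dlm_controld process =="
    # pgrep exits non-zero when nothing matches, so report that explicitly.
    pgrep -l dlm_controld || echo "dlm_controld not running"

    echo "== dlm lockspaces =="
    dlm_tool ls

    echo "== SCSI registration keys on $device =="
    sg_persist --in --read-keys --device="$device"
}
```

Run on the surviving node as, for example, `snapshot_state /dev/sdX`, where /dev/sdX stands in for whatever shared device your fence_scsi resource is configured with.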
I wonder if https://github.com/ClusterLabs/pacemaker/pull/839 is relevant here.