On 09/05/2016 03:02 PM, Gabriele Bulfon wrote:
> I read the docs; it looks like sbd fencing is more about iSCSI/FC-exposed
> storage resources.
> Here I have real shared disks (seen from Solaris with the format utility
> as normal SAS disks, but on both nodes).
> They are all JBOD disks that ZFS organizes into raidz/mirror pools, so I
> have 5 disks in one pool on one node, and the other 5 disks in another
> pool on the other node.
> How can sbd work in this situation? Has it already been used/tested in a
> Solaris environment with ZFS?
You don't have to have disks at all with sbd. You can use it just to have
pacemaker monitored by a hardware watchdog.
But if you want to add disks, it shouldn't really matter how they are
accessed, as long as both nodes can concurrently read/write the block
devices. Configuration of caching in the controllers might be an issue as
well.
For example, I'm currently testing with a simple KVM setup, using the
following virsh config for the shared block device:

    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source file='SHARED_IMAGE_FILE'/>
      <target dev='vdb' bus='virtio'/>
      <shareable/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x15' function='0x0'/>
    </disk>

I don't know about the test coverage for sbd on Solaris. It should be
independent of which filesystem you are using, though, since for sbd you
would use a raw partition without a filesystem anyway.

> BTW, is there any other possibility other than sbd?

Probably - see Ken's suggestions. Excuse me for thinking a little
one-dimensionally at the moment, as I'm working on an sbd issue ;-)
And without a proper fencing device, a watchdog is the last resort for
getting something to work reliably. And pacemaker's way of doing watchdog
fencing is sbd...

> Last but not least, is there any way to let ssh-fencing be considered
> good?
> At the moment, with ssh-fencing, if I shut down the second node, I get
> all of the second node's resources in UNCLEAN state, not taken over by
> the first one.
> If I reboot the second, I only get the node online again, but resources
> remain stopped.

Strange... What do the logs say about whether the fencing action was
successful or not?

> I remember that my tests with heartbeat reacted differently (a halt would
> move everything to node1 and bring everything back on restart)
>
> Gabriele
>
> Sonicle S.r.l. : http://www.sonicle.com
> Music: http://www.gabrielebulfon.com
> Quantum Mechanics : http://www.cdbaby.com/cd/gabrielebulfon
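To make the sbd part above a bit more concrete: on Linux a minimal
shared-disk-plus-watchdog sbd setup would look roughly like the sketch
below. The partition path, the watchdog device and the sysconfig location
are placeholders for a typical Linux install - as said above, I can't
vouch for any of this on Solaris/illumos.

    # Initialize the sbd header on a small dedicated partition that both
    # nodes can access (this overwrites whatever is on that partition!):
    sbd -d /dev/<shared-partition> create
    sbd -d /dev/<shared-partition> dump    # verify header and timeouts

    # Tell the sbd daemon which device and watchdog to use
    # (on Linux typically in /etc/sysconfig/sbd):
    SBD_DEVICE="/dev/<shared-partition>"
    SBD_WATCHDOG_DEV="/dev/watchdog"

    # For the disk-less, watchdog-only mode mentioned above you would skip
    # SBD_DEVICE and instead set the cluster property
    # stonith-watchdog-timeout to a non-zero value.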
> ----------------------------------------------------------------------------------
>
> From: Klaus Wenninger <kwenn...@redhat.com>
> To: users@clusterlabs.org
> Date: 5 September 2016 12:21:25 CEST
> Subject: Re: [ClusterLabs] ip clustering strange behaviour
>
> On 09/05/2016 11:20 AM, Gabriele Bulfon wrote:
> > The dual machine is equipped with a Syncro controller LSI 3008 MPT SAS3.
> > Both nodes can see the same JBOD disks (10 at the moment, up to 24).
> > Systems are XStreamOS / illumos, with ZFS.
> > Each system has one ZFS pool of 5 disks, with different pool names
> > (data1, data2).
> > When in active/active, the two machines run different zones and
> > services on their pools, on their networks.
> > I have custom resource agents (tested on pacemaker/heartbeat, now being
> > ported to pacemaker/corosync) for ZFS pool and zone migration.
> > When I was testing pacemaker/heartbeat, once ssh-fencing discovered the
> > other node to be down (cleanly or after an abrupt halt), it
> > automatically used IPaddr and our ZFS agents to take control of
> > everything, mounting the other pool and running any configured zone in
> > it. I would like to do the same with pacemaker/corosync.
> > The two nodes of the dual machine have an internal LAN connecting them,
> > a 100Mb ethernet: maybe this is reliable enough to trust ssh-fencing?
> > Or is there anything I can do at the controller level to ensure that
> > the pool is not in use on the other node?
>
> It is not just the reliability of the networking connection that makes
> ssh-fencing suboptimal. Something in the IP-stack config (dynamic due to
> moving resources) might have gone wrong. And resources might be hanging
> in a way that prevents the node from being brought down gracefully. Thus
> my suggestion to add a watchdog (if available) via sbd.
>
> > Gabriele
> >
> > ----------------------------------------------------------------------------------
> >
> > From: Ken Gaillot <kgail...@redhat.com>
> > To: gbul...@sonicle.com, Cluster Labs - All topics related to
> > open-source clustering welcomed <users@clusterlabs.org>
> > Date: 1 September 2016 15:49:04 CEST
> > Subject: Re: [ClusterLabs] ip clustering strange behaviour
> >
> > On 08/31/2016 11:50 PM, Gabriele Bulfon wrote:
> > > Thanks, got it.
> > > So, is it better to use "two_node: 1" or, as suggested elsewhere,
> > > "no-quorum-policy=stop"?
> >
> > I'd prefer "two_node: 1" and letting pacemaker's options default. But
> > see the votequorum(5) man page for what two_node implies -- most
> > importantly, both nodes have to be available when the cluster starts
> > before it will start any resources. Node failure is handled fine once
> > the cluster has started, but at start time, both nodes must be up.
> >
> > > About fencing, the machine where I'm going to implement the 2-node
> > > cluster is a dual machine with a shared-disk backend.
> > > Each node has two 10Gb ethernets dedicated to the public IP and the
> > > admin console.
> > > Then there is a third 100Mb ethernet connecting the two machines
> > > internally.
> > > I was going to use this last one for fencing via ssh, but it looks
> > > like this way I'm not gonna have IP/pool/zone movements if one of the
> > > nodes freezes or halts without shutting down pacemaker cleanly.
> > > What should I use instead?
> >
> > I'm guessing that, as a dual machine, they share a power supply, so
> > that rules out a power switch. If the box has IPMI that can
> > individually power-cycle each host, you can use fence_ipmilan. If the
> > disks are shared via iSCSI, you could use fence_scsi. If the box has a
> > hardware watchdog device that can individually target the hosts, you
> > could use sbd. If none of those is an option, probably the best you
> > could do is run the cluster nodes as VMs on each host, and use
> > fence_xvm.
> >
> > > Thanks for your help,
> > > Gabriele
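Just to illustrate Ken's fence_ipmilan suggestion, in the same style as
your existing ssh stonith primitives - the IPMI addresses and credentials
are placeholders, and parameter names can differ between fence-agents
versions, so check the agent's metadata/man page before using this:

    primitive xstorage1-ipmi stonith:fence_ipmilan \
            params ipaddr="10.0.0.101" login="admin" passwd="secret" \
                   lanplus="1" pcmk_host_list="xstorage1" \
            op monitor interval="60"
    primitive xstorage2-ipmi stonith:fence_ipmilan \
            params ipaddr="10.0.0.102" login="admin" passwd="secret" \
                   lanplus="1" pcmk_host_list="xstorage2" \
            op monitor interval="60"

    location xstorage1-ipmi-pref xstorage1-ipmi -inf: xstorage1
    location xstorage2-ipmi-pref xstorage2-ipmi -inf: xstorage2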
> > > ----------------------------------------------------------------------------------
> > >
> > > From: Ken Gaillot <kgail...@redhat.com>
> > > To: users@clusterlabs.org
> > > Date: 31 August 2016 17:25:05 CEST
> > > Subject: Re: [ClusterLabs] ip clustering strange behaviour
> > >
> > > On 08/30/2016 01:52 AM, Gabriele Bulfon wrote:
> > > > Sorry for reiterating, but my main question was:
> > > >
> > > > why does node 1 remove its own IP if I shut down node 2 abruptly?
> > > > I understand that it does not take over the node 2 IP (because the
> > > > ssh-fencing has no clue about what happened on the 2nd node), but I
> > > > wouldn't expect it to shut down its own IP... this would kill any
> > > > service on both nodes... where am I wrong?
> > >
> > > Assuming you're using corosync 2, be sure you have "two_node: 1" in
> > > corosync.conf. That will tell corosync to pretend there is always
> > > quorum, so pacemaker doesn't need any special quorum settings. See
> > > the votequorum(5) man page for details. Of course, you need fencing
> > > in this setup, to handle the case where communication between the
> > > nodes is broken but both are still up.
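For reference, the corosync.conf part Ken refers to is just a small
quorum section, roughly like this (the rest of the file stays as generated
for your setup):

    quorum {
        provider: corosync_votequorum
        two_node: 1
        # two_node implies wait_for_all: both nodes must be seen once at
        # cluster start before resources are started
    }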
> > > > ------------------------------------------------------------------------
> > > >
> > > > From: Gabriele Bulfon <gbul...@sonicle.com>
> > > > To: kwenn...@redhat.com, Cluster Labs - All topics related to
> > > > open-source clustering welcomed <users@clusterlabs.org>
> > > > Date: 29 August 2016 17:37:36 CEST
> > > > Subject: Re: [ClusterLabs] ip clustering strange behaviour
> > > >
> > > > Ok, got it, I hadn't gracefully shut down pacemaker on node2.
> > > > Now I restarted, everything was up, stopped the pacemaker service
> > > > on host2, and I got host1 with both IPs configured. ;)
> > > >
> > > > But, though I understand that if I halt host2 without a graceful
> > > > shutdown of pacemaker it will not move IP2 to host1, I don't expect
> > > > host1 to lose its own IP! Why?
> > > >
> > > > Gabriele
> > > >
> > > > ----------------------------------------------------------------------------------
> > > >
> > > > From: Klaus Wenninger <kwenn...@redhat.com>
> > > > To: users@clusterlabs.org
> > > > Date: 29 August 2016 17:26:49 CEST
> > > > Subject: Re: [ClusterLabs] ip clustering strange behaviour
> > > >
> > > > On 08/29/2016 05:18 PM, Gabriele Bulfon wrote:
> > > > > Hi,
> > > > >
> > > > > now that I have IPaddr working, I have a strange behaviour on my
> > > > > test setup of 2 nodes; here is my configuration:
> > > > >
> > > > > ===STONITH/FENCING===
> > > > >
> > > > > primitive xstorage1-stonith stonith:external/ssh-sonicle op
> > > > > monitor interval="25" timeout="25" start-delay="25" params
> > > > > hostlist="xstorage1"
> > > > >
> > > > > primitive xstorage2-stonith stonith:external/ssh-sonicle op
> > > > > monitor interval="25" timeout="25" start-delay="25" params
> > > > > hostlist="xstorage2"
> > > > >
> > > > > location xstorage1-stonith-pref xstorage1-stonith -inf: xstorage1
> > > > > location xstorage2-stonith-pref xstorage2-stonith -inf: xstorage2
> > > > >
> > > > > property stonith-action=poweroff
> > > > >
> > > > > ===IP RESOURCES===
> > > > >
> > > > > primitive xstorage1_wan1_IP ocf:heartbeat:IPaddr params
> > > > > ip="1.2.3.4" cidr_netmask="255.255.255.0" nic="e1000g1"
> > > > > primitive xstorage2_wan2_IP ocf:heartbeat:IPaddr params
> > > > > ip="1.2.3.5" cidr_netmask="255.255.255.0" nic="e1000g1"
> > > > >
> > > > > location xstorage1_wan1_IP_pref xstorage1_wan1_IP 100: xstorage1
> > > > > location xstorage2_wan2_IP_pref xstorage2_wan2_IP 100: xstorage2
> > > > >
> > > > > ===================
> > > > >
> > > > > So I plumbed e1000g1 with an unconfigured IP on both machines and
> > > > > started corosync/pacemaker, and after some time I got all nodes
> > > > > online and started, with the IPs configured as virtual interfaces
> > > > > (e1000g1:1 and e1000g1:2), one on host1 and one on host2.
> > > > >
> > > > > Then I halted host2, and I expected to have host1 started with
> > > > > both IPs configured on host1.
> > > > > Instead, I got host1 started with its IP stopped and removed
> > > > > (only e1000g1 unconfigured), and host2 stopped but reported with
> > > > > its IP started (!?).
> > > > > Not exactly what I expected...
> > > > > What's wrong?
> > > >
> > > > How did you stop host2? A graceful shutdown of pacemaker? If not...
> > > > Anyway, ssh-fencing only works if the machine is still running...
> > > > So it will stay unclean, and thus pacemaker thinks that the IP
> > > > might still be running on it. So this is actually the expected
> > > > behavior.
> > > > You might add a watchdog via sbd if you don't have other fencing
> > > > hardware at hand ...
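And once your custom ZFS/zone agents are ported, the pool + IP + zone
failover you describe would typically be modeled as a group, so the pieces
start and stop together in order. A purely hypothetical sketch - the agent
and resource names below are invented, since I don't know your actual
agents or their parameters:

    # hypothetical resources - replace with your actual ZFS/zone agents
    primitive data1_pool ocf:sonicle:zpool params pool="data1"
    primitive data1_zone ocf:sonicle:zone params zone="zone1"

    # a group starts its members in order: pool, then IP, then zone
    group data1_group data1_pool xstorage1_wan1_IP data1_zone
    location data1_group_pref data1_group 100: xstorage1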
> > > > > Here is the crm status after I stopped host2:
> > > > >
> > > > > 2 nodes and 4 resources configured
> > > > >
> > > > > Node xstorage2: UNCLEAN (offline)
> > > > > Online: [ xstorage1 ]
> > > > >
> > > > > Full list of resources:
> > > > >
> > > > > xstorage1-stonith (stonith:external/ssh-sonicle): Started xstorage2 (UNCLEAN)
> > > > > xstorage2-stonith (stonith:external/ssh-sonicle): Stopped
> > > > > xstorage1_wan1_IP (ocf::heartbeat:IPaddr): Stopped
> > > > > xstorage2_wan2_IP (ocf::heartbeat:IPaddr): Started xstorage2 (UNCLEAN)
> > > > >
> > > > > Gabriele
_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org