Happy to help you understand; just keep asking questions. :)

The point can be explained this way:

* If two nodes can work without coordination, you don't need a cluster; just run your services everywhere. If that is not the case, then you require coordination. Fencing ensures that a node that has entered an unknown state can be forced into a known state (off). In this way, no action will be taken by a node unless its peer can be informed, or the peer is gone.

The method by which a node is forced into a known state depends on the hardware (or infrastructure) in your particular setup. So perhaps explain what your nodes are built on, and we can assist with more specific details.

digimer

On 2019-04-17 5:46 p.m., JCA wrote:
Thanks. This implies that I officially do not understand what it is that fencing can do for me, in my simple cluster. Back to the drawing board.

On Wed, Apr 17, 2019 at 3:33 PM digimer <li...@alteeve.ca> wrote:

    Fencing requires some mechanism, outside the nodes themselves,
    that can terminate the nodes. Typically, IPMI (iLO, iRMC, RSA,
    DRAC, etc.) is used for this. Alternatively, switched PDUs are
    common. If you don't have these but do have a watchdog timer on
    your nodes, SBD (storage-based death) can work.

    You can use 'fence_<device> <options> -o status' at the command
    line to figure out what will work with your hardware. Once you
    have called 'fence_foo ... -o status' and gotten the status of
    each node, translating that into a pacemaker configuration is
    pretty simple. That's when you enable stonith.
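
    For example, with an IPMI-based BMC the status check might look
    like this (a sketch only; the address, username, and password are
    placeholders for your own BMC's details):

        # fence_ipmilan --ip=10.0.0.11 --username=admin \
            --password=secret --action=status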

    Once stonith is set up and working in pacemaker (i.e., you can
    crash a node and the peer reboots it), then you will go to DRBD
    and set 'fencing resource-and-stonith;' (tells DRBD to block on
    communication failure with the peer and request a fence), and then
    set up the 'fence-peer /path/to/crm-fence-peer.sh' and matching
    unfence handler (I am going from memory, so check the man page to
    verify the exact option names and syntax).
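
    For reference, the relevant pieces of a DRBD 8.4 resource config
    look roughly like this (a sketch only; 'r0' is a placeholder name
    and the script paths assume the stock /usr/lib/drbd location, so
    verify option names and paths against the drbd.conf man page for
    your version - DRBD 9 renames some of these):

        resource r0 {
          disk {
            # block IO and call the fence-peer handler on peer loss
            fencing resource-and-stonith;
          }
          handlers {
            # invoked when DRBD loses contact with the peer
            fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
            # invoked after the peer resyncs, to lift the constraint
            after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
          }
        }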

    With all this done, if either pacemaker/corosync or DRBD loses
    contact with the peer, it will block and fence. Only after the
    peer has been confirmed terminated will IO resume. This way,
    split-brain situations become effectively impossible.

    digimer

    On 2019-04-17 5:17 p.m., JCA wrote:
    Here is what I did:

    # pcs stonith create disk_fencing fence_scsi \
          pcmk_host_list="one two" pcmk_monitor_action="metadata" \
          pcmk_reboot_action="off" \
          devices="/dev/disk/by-id/ata-VBOX_HARDDISK_VBaaa429e4-514e8ecb" \
          meta provides="unfencing"

    where ata-VBOX_HARDDISK_... corresponds to the device where I have the
    partition that is shared between both nodes in my cluster. The
    command completes without any errors (that I can see) and after
    that I have

    # pcs status
    Cluster name: ClusterOne
    Stack: corosync
    Current DC: one (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum
    Last updated: Wed Apr 17 14:35:25 2019
    Last change: Wed Apr 17 14:11:14 2019 by root via cibadmin on one

    2 nodes configured
    5 resources configured

    Online: [ one two ]

    Full list of resources:

     MyCluster (ocf::myapp:myapp-script): Stopped
     Master/Slave Set: DrbdDataClone [DrbdData]
         Stopped: [ one two ]
     DrbdFS (ocf::heartbeat:Filesystem): Stopped
     disk_fencing (stonith:fence_scsi): Stopped

    Daemon Status:
      corosync: active/enabled
      pacemaker: active/enabled
      pcsd: active/enabled

    Things stay that way indefinitely, until I set stonith-enabled to
    false - at which point all the resources above get started
    immediately.

    Obviously, I am missing something big here. But what is it?
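
    A first sanity check here, in line with the 'fence_<device> -o
    status' suggestion earlier in the thread, would be to run the
    agent by hand. A sketch, using the node name and device path from
    the command above (confirm the option names against the fence_scsi
    man page):

        # fence_scsi -n one \
            -d /dev/disk/by-id/ata-VBOX_HARDDISK_VBaaa429e4-514e8ecb \
            -o status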


    On Wed, Apr 17, 2019 at 2:59 PM Adam Budziński
    <budzinski.a...@gmail.com> wrote:

        You did not configure any fencing device.

        On Wed, 17.04.2019 at 22:51, JCA <1.41...@gmail.com> wrote:

            I am trying to get fencing working, as described in the
            "Clusters from Scratch" guide, and I am stymied at the
            get-go :-(

            The document mentions a property named stonith-enabled.
            When I was trying to get my first cluster going, I
            noticed that my resources would start only when this
            property was set to false, by means of

                # pcs property set stonith-enabled=false

            Otherwise, all the resources remain stopped.

            I created a fencing resource for the partition that I am
            sharing across the two nodes, by means of DRBD. This
            works fine - but I still have the same problem as above -
            i.e. when stonith-enabled is set to true, all the
            resources get stopped, and remain in that state.

            I am very confused here. Can anybody point me in the
            right direction out of this conundrum?




_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
