Re: [ClusterLabs] Start resource only if another resource is stopped

Miro Igov Wed, 17 Aug 2022 06:59:02 -0700

As you guessed, I am using "crm res stop nfs_export_1".
I tried the solution with the attribute and it does not work correctly.

When I stop nfs_export_1 it stops data_1 and data_1_active, then it starts
data_2_failover - so far so good.

When I start nfs_export_1 it starts data_1, starts data_1_active and then
stops data_2_failover as a result of order data_1_active_after_data_1 and
location data_2_failover_if_data_1_inactive.

But data_1 and data_2_failover use the same mount point. The start of data_1
finds it already mounted (by data_2_failover), and the later stop of
data_2_failover unmounts it, so the end result is that no NFS export is
mounted:

Aug 17 15:24:52 intranet-test1 pacemaker-fenced[1038]:  notice: Watchdog
will be used via SBD if fencing is required and stonith-watchdog-timeout is
nonzero
Aug 17 15:24:52 intranet-test1 Filesystem(data_1)[16382]: INFO: Running
start for nas-sync-test1:/home/pharmya/NAS on
/data/synology/pharmya_office/NAS_Sync/NAS
Aug 17 15:24:52 intranet-test1 Filesystem(data_1)[16382]: INFO: Filesystem
/data/synology/pharmya_office/NAS_Sync/NAS is already mounted.
Aug 17 15:24:52 intranet-test1 pacemaker-controld[1042]:  notice: Result of
start operation for data_1 on intranet-test1: 0 (ok)
Aug 17 15:24:52 intranet-test1 pacemaker-controld[1042]:  notice: Result of
start operation for data_1_active on intranet-test1: 0 (ok)
Aug 17 15:24:52 intranet-test1 pacemaker-attrd[1040]:  notice: Setting
opa-data_1_active[intranet-test1]: 0 -> 1
Aug 17 15:24:52 intranet-test1 pacemaker-controld[1042]:  notice: Result of
monitor operation for data_1_active on intranet-test1: 0 (ok)
Aug 17 15:24:52 intranet-test1 Filesystem(data_2_failover)[16456]: INFO:
Running stop for nas-sync-test2:/home/pharmya/NAS on
/data/synology/pharmya_office/NAS_Sync/NAS
Aug 17 15:24:52 intranet-test1 pacemaker-attrd[1040]:  notice: Setting
opa-data_2_active[intranet-test2]: 1 -> 0
Aug 17 15:24:52 intranet-test1 Filesystem(data_2_failover)[16456]: INFO:
Trying to unmount /data/synology/pharmya_office/NAS_Sync/NAS
Aug 17 15:24:52 intranet-test1 systemd[1]:
data-synology-pharmya_office-NAS_Sync-NAS.mount: Succeeded.
Aug 17 15:24:52 intranet-test1 systemd[11103]:
data-synology-pharmya_office-NAS_Sync-NAS.mount: Succeeded.
Aug 17 15:24:52 intranet-test1 Filesystem(data_2_failover)[16456]: INFO:
unmounted /data/synology/pharmya_office/NAS_Sync/NAS successfully
Aug 17 15:24:52 intranet-test1 pacemaker-controld[1042]:  notice: Result of
stop operation for data_2_failover on intranet-test1: 0 (ok)
Aug 17 15:24:52 intranet-test1 pacemaker-attrd[1040]:  notice: Setting
opa-data_2_active[intranet-test2]: 0 -> 1
Aug 17 15:25:42 intranet-test1 pacemaker-fenced[1038]:  notice: Watchdog
will be used via SBD if fencing is required and stonith-watchdog-timeout is
nonzero

On 11.08.2022 17:34, Miro Igov wrote:
> Hello,
> 
> I am trying to create a failover resource that would start if another 
> resource is stopped and stop when that resource is started again.
> 
> It is a 4-node cluster (with qdevice) where the nodes are virtual machines: 
> two of them are hosted in one datacenter and the other 2 VMs in 
> another datacenter.
> 
> Names of the nodes are:
> 
> nas-sync-test1
> 
> intranet-test1
> 
> nas-sync-test2
> 
> intranet-test2
> 
> The nodes ending with 1 are hosted in the same datacenter and those ending 
> in 2 are in the other datacenter.
> 
>  
> 
> nas-sync-test* nodes are running NFS servers and exports:
> 
> nfs_server_1, nfs_export_1 (running on nas-sync-test1)
> 
> nfs_server_2, nfs_export_2 (running on nas-sync-test2)
> 
>  
> 
> intranet-test1 is running NFS mount data_1 (mounting the 
> nfs_export_1),
> intranet-test2 is running data_2 (mounting nfs_export_2).
> 
> I created data_1_failover, which also mounts nfs_export_1 and which I 
> would like to run on intranet-test2 ONLY if data_2 is down. So 
> the idea is that it mounts nfs_export_1 on intranet-test2 only when the 
> local mount data_2 is stopped (note that nfs_server_1 runs in one 
> datacenter and intranet-test2 in the other DC).
> 
> Also created data_2_failover with the same purpose as data_1_failover.
> 
>  
> 
> I would like to ask how to make the failover mounts start automatically 
> when the ordinary mounts stop?
> 
>  
> 
> Current configuration of the constraints:
> 
>  
> 
> tag all_mounts data_1 data_2 data_1_failover data_2_failover
> 
> tag sync_1 nfs_server_1 nfs_export_1
> 
> tag sync_2 nfs_server_2 nfs_export_2
> 
> location deny_data_1 data_1 -inf: intranet-test2
> 
> location deny_data_2 data_2 -inf: intranet-test1
> 
> location deny_failover_1 data_1_failover -inf: intranet-test1
> 
> location deny_failover_2 data_2_failover -inf: intranet-test2
> 
> location deny_sync_1 sync_1 \
> 
>         rule -inf: #uname ne nas-sync-test1
> 
> location deny_sync_2 sync_2 \
> 
>         rule -inf: #uname ne nas-sync-test2
> 
> location mount_on_intranet all_mounts \
> 
>         rule -inf: #uname eq nas-sync-test1 or #uname eq nas-sync-test2
> 
>  
> 
> colocation nfs_1 inf: nfs_export_1 nfs_server_1
> 
> colocation nfs_2 inf: nfs_export_2 nfs_server_2
> 
>  
> 
> order nfs_server_export_1 Mandatory: nfs_server_1 nfs_export_1
> 
> order nfs_server_export_2 Mandatory: nfs_server_2 nfs_export_2
> 
> order mount_1 Mandatory: nfs_export_1 data_1
> 
> order mount_1_failover Mandatory: nfs_export_1 data_1_failover
> 
> order mount_2 Mandatory: nfs_export_2 data_2
> 
> order mount_2_failover Mandatory: nfs_export_2 data_2_failover
> 
>  
> 
>  
> 
> I tried adding following colocation:
> 
>    colocation failover_1 -inf: data_2_failover data_1
> 

This colocation does not say "start data_2_failover when data_1 is stopped".
This colocation says "do not allocate data_2_failover to the same node where
data_1 is already allocated". There is a difference between "resource A can
run on node N" and "resource A is active on node N".

> and it is stopping data_2_failover when data_1 is started, also it 
> starts data_2_failover when data_1 is stopped - exactly as needed!
> 
> Full List of Resources:
> 
>   * admin-ip    (ocf::heartbeat:IPaddr2):        Started intranet-test2
> 
>   * stonith-sbd (stonith:external/sbd):  Started intranet-test1
> 
>   * nfs_export_1        (ocf::heartbeat:exportfs):       Started
> nas-sync-test1
> 
>   * nfs_server_1        (systemd:nfs-server):    Started nas-sync-test1
> 
>   * nfs_export_2        (ocf::heartbeat:exportfs):       Started
> nas-sync-test2
> 
>   * nfs_server_2        (systemd:nfs-server):    Started nas-sync-test2
> 
>   * data_1_failover     (ocf::heartbeat:Filesystem):     Started
> intranet-test2
> 
>   * data_2_failover     (ocf::heartbeat:Filesystem):     Stopped
> 
>   * data_2      (ocf::heartbeat:Filesystem):     Started intranet-test2
> 
>   * data_1      (ocf::heartbeat:Filesystem):     Started intranet-test1
> 
>  
> 

For the future - it is much better to simply copy and paste actual commands
you used with their output. While we may guess that you used "crm resource
stop" or equivalent command, it is just a guess. Any conclusion based on
this guess will be wrong if we guessed wrong.

>  
> 
> Full List of Resources:
> 
>   * admin-ip    (ocf::heartbeat:IPaddr2):        Started intranet-test2
> 
>   * stonith-sbd (stonith:external/sbd):  Started intranet-test1
> 
>   * nfs_export_1        (ocf::heartbeat:exportfs):       Started
> nas-sync-test1
> 
>   * nfs_server_1        (systemd:nfs-server):    Started nas-sync-test1
> 
>   * nfs_export_2        (ocf::heartbeat:exportfs):       Started
> nas-sync-test2
> 
>   * nfs_server_2        (systemd:nfs-server):    Started nas-sync-test2
> 
>   * data_1_failover     (ocf::heartbeat:Filesystem):     Started
> intranet-test2
> 
>   * data_2_failover     (ocf::heartbeat:Filesystem):     Started
> intranet-test1
> 
>   * data_2      (ocf::heartbeat:Filesystem):     Started intranet-test2
> 
>   * data_1      (ocf::heartbeat:Filesystem):     Stopped (disabled)
> 

Assuming you used "crm resource stop data_1": resource data_1 cannot run
anywhere now, which allows Pacemaker to allocate resource data_2_failover to
node intranet-test1.

>  
> 
>  
> 
> But it does not start data_2_failover when nfs_export_1 is stopped 
> which stops data_1:
> 
> Full List of Resources:
> 
>   * admin-ip    (ocf::heartbeat:IPaddr2):        Started intranet-test2
> 
>   * stonith-sbd (stonith:external/sbd):  Started intranet-test1
> 
>   * nfs_export_1        (ocf::heartbeat:exportfs):       Stopped
> (disabled)
> 
>   * nfs_server_1        (systemd:nfs-server):    Started nas-sync-test1
> 
>   * nfs_export_2        (ocf::heartbeat:exportfs):       Started
> nas-sync-test2
> 
>   * nfs_server_2        (systemd:nfs-server):    Started nas-sync-test2
> 
>   * data_1_failover     (ocf::heartbeat:Filesystem):     Stopped
> 
>   * data_2_failover     (ocf::heartbeat:Filesystem):     Stopped
> 
>   * data_2      (ocf::heartbeat:Filesystem):     Started intranet-test2
> 
>   * data_1      (ocf::heartbeat:Filesystem):     Stopped
> 

And here there is no restriction on the *placement* of data_1, which means
Pacemaker allocated data_1 to node intranet-test1. This resource is not
*active* due to ordering requirements (it cannot be started before another
resource is started); still, it is assigned to the cluster node, and the
colocation prohibits assignment of data_2_failover to the same node.
Pacemaker will wait indefinitely for the possibility to start data_1 on the
allocated node.

One way to do what you want is a node attribute. Either the resource agent
can set a unique node attribute when the resource becomes active, or you can
use ocf:pacemaker:attribute. As a proof of concept:

primitive data_1_active ocf:pacemaker:attribute \
        params active_value=1 inactive_value=0 \
        op monitor interval=10s timeout=20s
colocation attribute_1 inf: data_1_active data_1
order data_1_active_after_data_1 Mandatory: data_1 data_1_active
location data_2_failover_if_data_1_inactive data_2_failover \
        rule -inf: defined opa-data_1_active and opa-data_1_active eq 1
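
For the other direction (data_1_failover on intranet-test2 following data_2),
the same proof of concept could presumably be mirrored as in the sketch
below. The constraint IDs attribute_2, data_2_active_after_data_2 and
data_1_failover_if_data_2_inactive are made up here by analogy and are not
taken from the thread. The attribute name opa-data_2_active relies on the
agent's default naming (opa-<resource id>), which matches the attribute seen
in the logs at the top of this thread. This is untested and inherits the
caveat reported there: if the ordinary mount and its failover share one mount
point, stopping the failover can unmount it.

```
# Sketch only: mirror of the data_1 proof of concept, IDs are assumptions.
# data_2_active sets node attribute opa-data_2_active to 1 while it runs
# and to 0 when it is stopped.
primitive data_2_active ocf:pacemaker:attribute \
        params active_value=1 inactive_value=0 \
        op monitor interval=10s timeout=20s
# Keep the attribute resource on the node where data_2 runs,
# and start it only after data_2 has started.
colocation attribute_2 inf: data_2_active data_2
order data_2_active_after_data_2 Mandatory: data_2 data_2_active
# Ban data_1_failover from any node where data_2 is reported active.
location data_1_failover_if_data_2_inactive data_1_failover \
        rule -inf: defined opa-data_2_active and opa-data_2_active eq 1
```

Because the rule is evaluated per node, it only bans data_1_failover on the
node where the attribute is 1. The existing deny_failover_1 constraint
already keeps it off intranet-test1.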
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
