On Tue, May 18, 2021 at 8:20 AM Eric Robinson <eric.robin...@psmnv.com> wrote:
>
> Okay, here is a test, starting with the initial cluster status...
>
>
> [root@ha09a ~]# pcs status
> Cluster name: ha09ab
> Cluster Summary:
>   * Stack: corosync
>   * Current DC: ha09a (version 2.0.4-6.el8_3.2-2deceaa3ae) - partition with 
> quorum
>   * Last updated: Mon May 17 22:14:11 2021
>   * Last change:  Mon May 17 21:58:18 2021 by hacluster via crmd on ha09b
>   * 2 nodes configured
>   * 8 resource instances configured
>
> Node List:
>   * Online: [ ha09a ha09b ]
>
> Full List of Resources:
>   * Clone Set: p_drbd0-clone [p_drbd0] (promotable):
>     * Masters: [ ha09a ]
>     * Slaves: [ ha09b ]
>   * Clone Set: p_drbd1-clone [p_drbd1] (promotable):
>     * Masters: [ ha09a ]
>     * Slaves: [ ha09b ]
>   * p_vdo0      (lsb:vdo0):      Started ha09a
>   * p_vdo1      (lsb:vdo1):      Started ha09a
>   * p_fs_clust08        (ocf::heartbeat:Filesystem):     Started ha09a
>   * p_fs_clust09        (ocf::heartbeat:Filesystem):     Started ha09a
>
> Failed Resource Actions:
>   * p_vdo0_monitor_15000 on ha09a 'not running' (7): call=35, 
> status='complete', exitreason='', last-rc-change='2021-05-17 21:01:28 
> -07:00', queued=0ms, exec=157ms
>   * p_vdo1_monitor_15000 on ha09a 'not running' (7): call=91, 
> status='complete', exitreason='', last-rc-change='2021-05-17 21:56:57 
> -07:00', queued=0ms, exec=164ms
>
> Daemon Status:
>   corosync: active/disabled
>   pacemaker: active/disabled
>   pcsd: active/enabled
>
>
> Here are the constraints...
>
> [root@ha09a ~]# pcs constraint --full
> Location Constraints:
> Ordering Constraints:
>   promote p_drbd0-clone then start p_vdo0 (kind:Mandatory) 
> (id:order-p_drbd0-clone-p_vdo0-mandatory)
>   promote p_drbd1-clone then start p_vdo1 (kind:Mandatory) 
> (id:order-p_drbd1-clone-p_vdo1-mandatory)
>   start p_vdo0 then start p_fs_clust08 (kind:Mandatory) 
> (id:order-p_vdo0-p_fs_clust08-mandatory)
>   start p_vdo1 then start p_fs_clust09 (kind:Mandatory) 
> (id:order-p_vdo1-p_fs_clust09-mandatory)
> Colocation Constraints:
>   p_vdo0 with p_drbd0-clone (score:INFINITY) 
> (id:colocation-p_vdo0-p_drbd0-clone-INFINITY)
>   p_vdo1 with p_drbd1-clone (score:INFINITY) 
> (id:colocation-p_vdo1-p_drbd1-clone-INFINITY)

This is wrong. It only ties vdo to a node where some clone instance is
active, and with DRBD that is every node (both Primary and Secondary run
an instance). You need to colocate vdo with the master (promoted) role of
the clone.
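
Something along these lines should express that intent (a sketch only,
reusing the constraint ids from your output above; pcs 0.10.x on EL8
accepts the role keyword "master" here, newer releases call it
"Promoted"):

  pcs constraint remove colocation-p_vdo0-p_drbd0-clone-INFINITY
  pcs constraint remove colocation-p_vdo1-p_drbd1-clone-INFINITY
  pcs constraint colocation add p_vdo0 with master p_drbd0-clone INFINITY
  pcs constraint colocation add p_vdo1 with master p_drbd1-clone INFINITY

The existing "promote ... then start ..." ordering constraints can stay
as they are.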

>   p_fs_clust08 with p_vdo0 (score:INFINITY) 
> (id:colocation-p_fs_clust08-p_vdo0-INFINITY)
>   p_fs_clust09 with p_vdo1 (score:INFINITY) 
> (id:colocation-p_fs_clust09-p_vdo1-INFINITY)
> Ticket Constraints:
>
> I will now try to move resource p_fs_clust08...
>
> [root@ha09a ~]# pcs resource move p_fs_clust08
> Warning: Creating location constraint 'cli-ban-p_fs_clust08-on-ha09a' with a 
> score of -INFINITY for resource p_fs_clust08 on ha09a.
>         This will prevent p_fs_clust08 from running on ha09a until the 
> constraint is removed
>         This will be the case even if ha09a is the last node in the cluster
> [root@ha09a ~]#
> [root@ha09a ~]#
>
> The resource fails to move and is now in a stopped state...
>
> [root@ha09a ~]# pcs status
> Cluster name: ha09ab
> Cluster Summary:
>   * Stack: corosync
>   * Current DC: ha09a (version 2.0.4-6.el8_3.2-2deceaa3ae) - partition with 
> quorum
>   * Last updated: Mon May 17 22:17:16 2021
>   * Last change:  Mon May 17 22:16:51 2021 by root via crm_resource on ha09a
>   * 2 nodes configured
>   * 8 resource instances configured
>
> Node List:
>   * Online: [ ha09a ha09b ]
>
> Full List of Resources:
>   * Clone Set: p_drbd0-clone [p_drbd0] (promotable):
>     * Masters: [ ha09b ]
>     * Slaves: [ ha09a ]
>   * Clone Set: p_drbd1-clone [p_drbd1] (promotable):
>     * Masters: [ ha09a ]
>     * Slaves: [ ha09b ]
>   * p_vdo0      (lsb:vdo0):      Started ha09b
>   * p_vdo1      (lsb:vdo1):      Started ha09a
>   * p_fs_clust08        (ocf::heartbeat:Filesystem):     Stopped
>   * p_fs_clust09        (ocf::heartbeat:Filesystem):     Started ha09a
>
> Failed Resource Actions:
>   * p_vdo0_monitor_15000 on ha09a 'not running' (7): call=35, 
> status='complete', exitreason='', last-rc-change='2021-05-17 21:01:28 
> -07:00', queued=0ms, exec=157ms
>   * p_vdo1_monitor_15000 on ha09a 'not running' (7): call=91, 
> status='complete', exitreason='', last-rc-change='2021-05-17 21:56:57 
> -07:00', queued=0ms, exec=164ms
>   * p_vdo0_monitor_15000 on ha09b 'not running' (7): call=35, 
> status='complete', exitreason='', last-rc-change='2021-05-17 22:16:53 
> -07:00', queued=0ms, exec=170ms
>   * p_fs_clust08_start_0 on ha09b 'not installed' (5): call=36, 
> status='complete', exitreason='Couldn't find device [/dev/mapper/vdo0]. 
> Expected /dev/??? to exist', last-rc-change='2021-05-17 22:16:53 -07:00', 
> queued=0ms, exec=330ms
>
> Daemon Status:
>   corosync: active/disabled
>   pacemaker: active/disabled
>   pcsd: active/enabled
>
> Here are the logs from ha09a...
>
> May 17 22:16:51 ha09a pacemaker-controld[2657]: notice: State transition 
> S_IDLE -> S_POLICY_ENGINE
> May 17 22:16:51 ha09a pacemaker-schedulerd[2656]: notice: On loss of quorum: 
> Ignore
> May 17 22:16:51 ha09a pacemaker-schedulerd[2656]: warning: Unexpected result 
> (not running) was recorded for monitor of p_vdo0 on ha09a at May 17 21:01:28 
> 2021
> May 17 22:16:51 ha09a pacemaker-schedulerd[2656]: warning: Unexpected result 
> (not running) was recorded for monitor of p_vdo1 on ha09a at May 17 21:56:57 
> 2021
> May 17 22:16:51 ha09a pacemaker-schedulerd[2656]: notice:  * Move       
> p_vdo0           (        ha09a -> ha09b )
> May 17 22:16:51 ha09a pacemaker-schedulerd[2656]: notice:  * Move       
> p_fs_clust08     (        ha09a -> ha09b )

As you can see, Pacemaker tries to move the resources to the node with
the Secondary DRBD instance instead of promoting DRBD there first, which
is exactly what the current colocation allows.
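
After the colocation points at the master role, you can preview what the
scheduler would do without touching the cluster, for example (crm_simulate
ships with pacemaker; the saved pe-input files from your log can also be
replayed with -x <file> to compare before and after):

  crm_simulate -S -L   # simulate the next transition against the live CIB
  crm_simulate -s -L   # show the allocation scores

With the corrected constraint, p_vdo0 and p_fs_clust08 should follow the
promoted DRBD instance instead of being placed on a Secondary.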

> May 17 22:16:51 ha09a pacemaker-schedulerd[2656]: notice: Calculated 
> transition 24, saving inputs in /var/lib/pacemaker/pengine/pe-input-459.bz2
> May 17 22:16:51 ha09a pacemaker-controld[2657]: notice: Initiating stop 
> operation p_fs_clust08_stop_0 locally on ha09a
> May 17 22:16:51 ha09a Filesystem(p_fs_clust08)[50520]: INFO: Running stop for 
> /dev/mapper/vdo0 on /ha01_mysql
> May 17 22:16:51 ha09a Filesystem(p_fs_clust08)[50520]: INFO: Trying to 
> unmount /ha01_mysql
> May 17 22:16:51 ha09a systemd[1611]: ha01_mysql.mount: Succeeded.
> May 17 22:16:51 ha09a systemd[2582]: ha01_mysql.mount: Succeeded.
> May 17 22:16:51 ha09a systemd[1]: ha01_mysql.mount: Succeeded.
> May 17 22:16:51 ha09a kernel: XFS (dm-5): Unmounting Filesystem
> May 17 22:16:51 ha09a Filesystem(p_fs_clust08)[50520]: INFO: unmounted 
> /ha01_mysql successfully
> May 17 22:16:51 ha09a pacemaker-controld[2657]: notice: Result of stop 
> operation for p_fs_clust08 on ha09a: ok
> May 17 22:16:51 ha09a pacemaker-controld[2657]: notice: Initiating stop 
> operation p_vdo0_stop_0 locally on ha09a
> May 17 22:16:52 ha09a lvm[4241]: No longer monitoring VDO pool vdo0.
> May 17 22:16:52 ha09a UDS/vdodmeventd[50696]: INFO   (vdodmeventd/50696) VDO 
> device vdo0 is now unregistered from dmeventd
> May 17 22:16:52 ha09a kernel: kvdo3:dmsetup: suspending device 'vdo0'
> May 17 22:16:52 ha09a kernel: kvdo3:packerQ: compression is disabled
> May 17 22:16:52 ha09a kernel: kvdo3:packerQ: compression is enabled
> May 17 22:16:52 ha09a kernel: uds: dmsetup: beginning save (vcn 85)
> May 17 22:16:52 ha09a kernel: uds: dmsetup: finished save (vcn 85)
> May 17 22:16:52 ha09a kernel: kvdo3:dmsetup: device 'vdo0' suspended
> May 17 22:16:52 ha09a kernel: kvdo3:dmsetup: stopping device 'vdo0'
> May 17 22:16:52 ha09a kernel: kvdo3:dmsetup: device 'vdo0' stopped
> May 17 22:16:53 ha09a pacemaker-controld[2657]: notice: Result of stop 
> operation for p_vdo0 on ha09a: ok
> May 17 22:16:53 ha09a pacemaker-controld[2657]: notice: Initiating start 
> operation p_vdo0_start_0 on ha09b
> May 17 22:16:53 ha09a pacemaker-controld[2657]: notice: Initiating monitor 
> operation p_vdo0_monitor_15000 on ha09b
> May 17 22:16:53 ha09a pacemaker-controld[2657]: notice: Initiating start 
> operation p_fs_clust08_start_0 on ha09b
> May 17 22:16:53 ha09a pacemaker-controld[2657]: notice: Transition 24 aborted 
> by operation p_vdo0_monitor_15000 'create' on ha09b: Event failed
> May 17 22:16:53 ha09a pacemaker-controld[2657]: notice: Transition 24 action 
> 69 (p_vdo0_monitor_15000 on ha09b): expected 'ok' but got 'not running'
> May 17 22:16:53 ha09a pacemaker-attrd[2655]: notice: Setting 
> fail-count-p_vdo0#monitor_15000[ha09b]: (unset) -> 1
> May 17 22:16:53 ha09a pacemaker-attrd[2655]: notice: Setting 
> last-failure-p_vdo0#monitor_15000[ha09b]: (unset) -> 1621315013
> May 17 22:16:53 ha09a pacemaker-controld[2657]: notice: Transition 24 aborted 
> by status-2-fail-count-p_vdo0.monitor_15000 doing create 
> fail-count-p_vdo0#monitor_15000=1: Transient attribute change
> May 17 22:16:53 ha09a pacemaker-controld[2657]: notice: Transition 24 action 
> 73 (p_fs_clust08_start_0 on ha09b): expected 'ok' but got 'not installed'
> May 17 22:16:53 ha09a pacemaker-controld[2657]: notice: Transition 24 
> (Complete=5, Pending=0, Fired=0, Skipped=0, Incomplete=1, 
> Source=/var/lib/pacemaker/pengine/pe-input-459.bz2): Complete
> May 17 22:16:53 ha09a pacemaker-attrd[2655]: notice: Setting 
> fail-count-p_fs_clust08#start_0[ha09b]: (unset) -> INFINITY
> May 17 22:16:53 ha09a pacemaker-attrd[2655]: notice: Setting 
> last-failure-p_fs_clust08#start_0[ha09b]: (unset) -> 1621315013
> May 17 22:16:53 ha09a pacemaker-schedulerd[2656]: notice: On loss of quorum: 
> Ignore
> May 17 22:16:53 ha09a pacemaker-schedulerd[2656]: warning: Unexpected result 
> (not running) was recorded for monitor of p_vdo0 on ha09a at May 17 21:01:28 
> 2021
> May 17 22:16:53 ha09a pacemaker-schedulerd[2656]: warning: Unexpected result 
> (not running) was recorded for monitor of p_vdo1 on ha09a at May 17 21:56:57 
> 2021
> May 17 22:16:53 ha09a pacemaker-schedulerd[2656]: warning: Unexpected result 
> (not running) was recorded for monitor of p_vdo0 on ha09b at May 17 22:16:53 
> 2021
> May 17 22:16:53 ha09a pacemaker-schedulerd[2656]: warning: Unexpected result 
> (not installed: Couldn't find device [/dev/mapper/vdo0]. Expected /dev/??? to 
> exist) was recorded for start of p_fs_clust08 on ha09b at May 17 22:16:53 2021
> May 17 22:16:53 ha09a pacemaker-schedulerd[2656]: notice: Preventing 
> p_fs_clust08 from restarting on ha09b because of hard failure (not installed: 
> Couldn't find device [/dev/mapper/vdo0]. Expected /dev/??? to exist)
> May 17 22:16:53 ha09a pacemaker-schedulerd[2656]: warning: Unexpected result 
> (not installed: Couldn't find device [/dev/mapper/vdo0]. Expected /dev/??? to 
> exist) was recorded for start of p_fs_clust08 on ha09b at May 17 22:16:53 2021
> May 17 22:16:53 ha09a pacemaker-schedulerd[2656]: notice: Preventing 
> p_fs_clust08 from restarting on ha09b because of hard failure (not installed: 
> Couldn't find device [/dev/mapper/vdo0]. Expected /dev/??? to exist)
> May 17 22:16:53 ha09a pacemaker-schedulerd[2656]: notice:  * Demote     
> p_drbd0:0        ( Master -> Slave ha09a )
> May 17 22:16:53 ha09a pacemaker-schedulerd[2656]: notice:  * Promote    
> p_drbd0:1        ( Slave -> Master ha09b )
> May 17 22:16:53 ha09a pacemaker-schedulerd[2656]: notice:  * Recover    
> p_vdo0           (                 ha09b )
> May 17 22:16:53 ha09a pacemaker-schedulerd[2656]: notice:  * Stop       
> p_fs_clust08     (                 ha09b )   due to node availability
> May 17 22:16:53 ha09a pacemaker-schedulerd[2656]: notice: Calculated 
> transition 25, saving inputs in /var/lib/pacemaker/pengine/pe-input-460.bz2
> May 17 22:16:53 ha09a pacemaker-schedulerd[2656]: notice: On loss of quorum: 
> Ignore
> May 17 22:16:53 ha09a pacemaker-schedulerd[2656]: warning: Unexpected result 
> (not running) was recorded for monitor of p_vdo0 on ha09a at May 17 21:01:28 
> 2021
> May 17 22:16:53 ha09a pacemaker-schedulerd[2656]: warning: Unexpected result 
> (not running) was recorded for monitor of p_vdo1 on ha09a at May 17 21:56:57 
> 2021
> May 17 22:16:53 ha09a pacemaker-schedulerd[2656]: warning: Unexpected result 
> (not running) was recorded for monitor of p_vdo0 on ha09b at May 17 22:16:53 
> 2021
> May 17 22:16:53 ha09a pacemaker-schedulerd[2656]: warning: Unexpected result 
> (not installed: Couldn't find device [/dev/mapper/vdo0]. Expected /dev/??? to 
> exist) was recorded for start of p_fs_clust08 on ha09b at May 17 22:16:53 2021
> May 17 22:16:53 ha09a pacemaker-schedulerd[2656]: notice: Preventing 
> p_fs_clust08 from restarting on ha09b because of hard failure (not installed: 
> Couldn't find device [/dev/mapper/vdo0]. Expected /dev/??? to exist)
> May 17 22:16:53 ha09a pacemaker-schedulerd[2656]: warning: Unexpected result 
> (not installed: Couldn't find device [/dev/mapper/vdo0]. Expected /dev/??? to 
> exist) was recorded for start of p_fs_clust08 on ha09b at May 17 22:16:53 2021
> May 17 22:16:53 ha09a pacemaker-schedulerd[2656]: notice: Preventing 
> p_fs_clust08 from restarting on ha09b because of hard failure (not installed: 
> Couldn't find device [/dev/mapper/vdo0]. Expected /dev/??? to exist)
> May 17 22:16:53 ha09a pacemaker-schedulerd[2656]: warning: Forcing 
> p_fs_clust08 away from ha09b after 1000000 failures (max=1000000)
> May 17 22:16:53 ha09a pacemaker-schedulerd[2656]: notice:  * Demote     
> p_drbd0:0        ( Master -> Slave ha09a )
> May 17 22:16:53 ha09a pacemaker-schedulerd[2656]: notice:  * Promote    
> p_drbd0:1        ( Slave -> Master ha09b )
> May 17 22:16:53 ha09a pacemaker-schedulerd[2656]: notice:  * Recover    
> p_vdo0           (                 ha09b )
> May 17 22:16:53 ha09a pacemaker-schedulerd[2656]: notice:  * Stop       
> p_fs_clust08     (                 ha09b )   due to node availability
> May 17 22:16:53 ha09a pacemaker-schedulerd[2656]: notice: Calculated 
> transition 26, saving inputs in /var/lib/pacemaker/pengine/pe-input-461.bz2
> May 17 22:16:53 ha09a pacemaker-controld[2657]: notice: Initiating cancel 
> operation p_drbd0_monitor_60000 on ha09b
> May 17 22:16:53 ha09a pacemaker-controld[2657]: notice: Initiating stop 
> operation p_fs_clust08_stop_0 on ha09b
> May 17 22:16:53 ha09a pacemaker-controld[2657]: notice: Initiating notify 
> operation p_drbd0_pre_notify_demote_0 locally on ha09a
> May 17 22:16:53 ha09a pacemaker-controld[2657]: notice: Initiating notify 
> operation p_drbd0_pre_notify_demote_0 on ha09b
> May 17 22:16:53 ha09a pacemaker-controld[2657]: notice: Result of notify 
> operation for p_drbd0 on ha09a: ok
> May 17 22:16:53 ha09a pacemaker-controld[2657]: notice: Initiating stop 
> operation p_vdo0_stop_0 on ha09b
> May 17 22:16:54 ha09a pacemaker-controld[2657]: notice: Initiating demote 
> operation p_drbd0_demote_0 locally on ha09a
> May 17 22:16:54 ha09a kernel: drbd ha01_mysql: role( Primary -> Secondary )
> May 17 22:16:54 ha09a pacemaker-controld[2657]: notice: Result of demote 
> operation for p_drbd0 on ha09a: ok
> May 17 22:16:54 ha09a pacemaker-controld[2657]: notice: Initiating notify 
> operation p_drbd0_post_notify_demote_0 locally on ha09a
> May 17 22:16:54 ha09a pacemaker-controld[2657]: notice: Initiating notify 
> operation p_drbd0_post_notify_demote_0 on ha09b
> May 17 22:16:54 ha09a pacemaker-controld[2657]: notice: Result of notify 
> operation for p_drbd0 on ha09a: ok
> May 17 22:16:54 ha09a pacemaker-controld[2657]: notice: Initiating notify 
> operation p_drbd0_pre_notify_promote_0 locally on ha09a
> May 17 22:16:54 ha09a pacemaker-controld[2657]: notice: Initiating notify 
> operation p_drbd0_pre_notify_promote_0 on ha09b
> May 17 22:16:54 ha09a pacemaker-controld[2657]: notice: Result of notify 
> operation for p_drbd0 on ha09a: ok
> May 17 22:16:54 ha09a pacemaker-controld[2657]: notice: Initiating promote 
> operation p_drbd0_promote_0 on ha09b
> May 17 22:16:54 ha09a kernel: drbd ha01_mysql ha09b: Preparing remote state 
> change 610633182
> May 17 22:16:54 ha09a kernel: drbd ha01_mysql ha09b: Committing remote state 
> change 610633182 (primary_nodes=1)
> May 17 22:16:54 ha09a kernel: drbd ha01_mysql ha09b: peer( Secondary -> 
> Primary )
> May 17 22:16:54 ha09a pacemaker-controld[2657]: notice: Initiating notify 
> operation p_drbd0_post_notify_promote_0 locally on ha09a
> May 17 22:16:54 ha09a pacemaker-controld[2657]: notice: Initiating notify 
> operation p_drbd0_post_notify_promote_0 on ha09b
> May 17 22:16:54 ha09a pacemaker-controld[2657]: notice: Result of notify 
> operation for p_drbd0 on ha09a: ok
> May 17 22:16:54 ha09a pacemaker-controld[2657]: notice: Initiating start 
> operation p_vdo0_start_0 on ha09b
> May 17 22:16:54 ha09a pacemaker-controld[2657]: notice: Initiating monitor 
> operation p_drbd0_monitor_60000 locally on ha09a
> May 17 22:16:54 ha09a pacemaker-controld[2657]: notice: Result of monitor 
> operation for p_drbd0 on ha09a: ok
> May 17 22:16:56 ha09a pacemaker-controld[2657]: notice: Initiating monitor 
> operation p_vdo0_monitor_15000 on ha09b
> May 17 22:16:57 ha09a pacemaker-controld[2657]: notice: Transition 26 
> (Complete=28, Pending=0, Fired=0, Skipped=0, Incomplete=0, 
> Source=/var/lib/pacemaker/pengine/pe-input-461.bz2): Complete
> May 17 22:16:57 ha09a pacemaker-controld[2657]: notice: State transition 
> S_TRANSITION_ENGINE -> S_IDLE
>
> Here are the logs from ha09b...
>
> May 17 22:16:53 ha09b UDS/vdodumpconfig[3494]: ERROR  (vdodumpconfig/3494) 
> openFile(): failed opening /dev/drbd0 with file access: 4: Wrong medium type 
> (124)
> May 17 22:16:53 ha09b vdo[3486]: ERROR - vdodumpconfig: Failed to make 
> FileLayer from '/dev/drbd0' with Wrong medium type
> May 17 22:16:53 ha09b pacemaker-controld[2710]: notice: Result of start 
> operation for p_vdo0 on ha09b: ok
> May 17 22:16:53 ha09b Filesystem(p_fs_clust08)[3496]: INFO: Running start for 
> /dev/mapper/vdo0 on /ha01_mysql
> May 17 22:16:53 ha09b UDS/vdodumpconfig[3577]: ERROR  (vdodumpconfig/3577) 
> openFile(): failed opening /dev/drbd0 with file access: 4: Wrong medium type 
> (124)
> May 17 22:16:53 ha09b vdo[3503]: ERROR - vdodumpconfig: Failed to make 
> FileLayer from '/dev/drbd0' with Wrong medium type
> May 17 22:16:53 ha09b pacemaker-controld[2710]: notice: Result of monitor 
> operation for p_vdo0 on ha09b: not running
> May 17 22:16:53 ha09b pacemaker-controld[2710]: notice: 
> ha09b-p_vdo0_monitor_15000:35 [ error occurred checking vdo0 status on 
> ha09b\n ]
> May 17 22:16:53 ha09b pacemaker-attrd[2708]: notice: Setting 
> fail-count-p_vdo0#monitor_15000[ha09b]: (unset) -> 1
> May 17 22:16:53 ha09b pacemaker-attrd[2708]: notice: Setting 
> last-failure-p_vdo0#monitor_15000[ha09b]: (unset) -> 1621315013
> May 17 22:16:53 ha09b Filesystem(p_fs_clust08)[3496]: ERROR: Couldn't find 
> device [/dev/mapper/vdo0]. Expected /dev/??? to exist
> May 17 22:16:53 ha09b pacemaker-execd[2707]: notice: 
> p_fs_clust08_start_0[3496] error output [ ocf-exit-reason:Couldn't find 
> device [/dev/mapper/vdo0]. Expected /dev/??? to exist ]
> May 17 22:16:53 ha09b pacemaker-controld[2710]: notice: Result of start 
> operation for p_fs_clust08 on ha09b: not installed
> May 17 22:16:53 ha09b pacemaker-controld[2710]: notice: 
> ha09b-p_fs_clust08_start_0:36 [ ocf-exit-reason:Couldn't find device 
> [/dev/mapper/vdo0]. Expected /dev/??? to exist\n ]
> May 17 22:16:53 ha09b pacemaker-attrd[2708]: notice: Setting 
> fail-count-p_fs_clust08#start_0[ha09b]: (unset) -> INFINITY
> May 17 22:16:53 ha09b pacemaker-attrd[2708]: notice: Setting 
> last-failure-p_fs_clust08#start_0[ha09b]: (unset) -> 1621315013
> May 17 22:16:53 ha09b Filesystem(p_fs_clust08)[3609]: WARNING: Couldn't find 
> device [/dev/mapper/vdo0]. Expected /dev/??? to exist
> May 17 22:16:53 ha09b pacemaker-controld[2710]: notice: Result of notify 
> operation for p_drbd0 on ha09b: ok
> May 17 22:16:53 ha09b Filesystem(p_fs_clust08)[3609]: INFO: Running stop for 
> /dev/mapper/vdo0 on /ha01_mysql
> May 17 22:16:53 ha09b pacemaker-execd[2707]: notice: 
> p_fs_clust08_stop_0[3609] error output [ blockdev: cannot open 
> /dev/mapper/vdo0: No such file or directory ]
> May 17 22:16:53 ha09b pacemaker-controld[2710]: notice: Result of stop 
> operation for p_fs_clust08 on ha09b: ok
> May 17 22:16:53 ha09b pacemaker-controld[2710]: notice: 
> ha09b-p_vdo0_monitor_15000:35 [ error occurred checking vdo0 status on 
> ha09b\n ]
> May 17 22:16:54 ha09b UDS/vdodumpconfig[3705]: ERROR  (vdodumpconfig/3705) 
> openFile(): failed opening /dev/drbd0 with file access: 4: Wrong medium type 
> (124)
> May 17 22:16:54 ha09b vdo[3697]: ERROR - vdodumpconfig: Failed to make 
> FileLayer from '/dev/drbd0' with Wrong medium type
> May 17 22:16:54 ha09b pacemaker-controld[2710]: notice: Result of stop 
> operation for p_vdo0 on ha09b: ok
> May 17 22:16:54 ha09b kernel: drbd ha01_mysql ha09a: peer( Primary -> 
> Secondary )
> May 17 22:16:54 ha09b pacemaker-controld[2710]: notice: Result of notify 
> operation for p_drbd0 on ha09b: ok
> May 17 22:16:54 ha09b pacemaker-controld[2710]: notice: Result of notify 
> operation for p_drbd0 on ha09b: ok
> May 17 22:16:54 ha09b kernel: drbd ha01_mysql: Preparing cluster-wide state 
> change 610633182 (0->-1 3/1)
> May 17 22:16:54 ha09b kernel: drbd ha01_mysql: State change 610633182: 
> primary_nodes=1, weak_nodes=FFFFFFFFFFFFFFFC
> May 17 22:16:54 ha09b kernel: drbd ha01_mysql: Committing cluster-wide state 
> change 610633182 (1ms)
> May 17 22:16:54 ha09b kernel: drbd ha01_mysql: role( Secondary -> Primary )
> May 17 22:16:54 ha09b pacemaker-controld[2710]: notice: Result of promote 
> operation for p_drbd0 on ha09b: ok
> May 17 22:16:54 ha09b pacemaker-controld[2710]: notice: Result of notify 
> operation for p_drbd0 on ha09b: ok
> May 17 22:16:55 ha09b kernel: uds: modprobe: loaded version 8.0.1.6
> May 17 22:16:55 ha09b kernel: kvdo: modprobe: loaded version 6.2.3.114
> May 17 22:16:55 ha09b kernel: kvdo0:dmsetup: underlying device, REQ_FLUSH: 
> supported, REQ_FUA: supported
> May 17 22:16:55 ha09b kernel: kvdo0:dmsetup: Using write policy async 
> automatically.
> May 17 22:16:55 ha09b kernel: kvdo0:dmsetup: loading device 'vdo0'
> May 17 22:16:55 ha09b kernel: kvdo0:dmsetup: zones: 1 logical, 1 physical, 1 
> hash; base threads: 5
> May 17 22:16:55 ha09b kernel: kvdo0:dmsetup: starting device 'vdo0'
> May 17 22:16:55 ha09b kernel: kvdo0:journalQ: VDO commencing normal operation
> May 17 22:16:55 ha09b kernel: kvdo0:dmsetup: Setting UDS index target state 
> to online
> May 17 22:16:55 ha09b kernel: kvdo0:dmsetup: device 'vdo0' started
> May 17 22:16:55 ha09b kernel: kvdo0:dmsetup: resuming device 'vdo0'
> May 17 22:16:55 ha09b kernel: kvdo0:dmsetup: device 'vdo0' resumed
> May 17 22:16:55 ha09b kernel: uds: kvdo0:dedupeQ: loading or rebuilding 
> index: dev=/dev/drbd0 offset=4096 size=2781704192
> May 17 22:16:55 ha09b kernel: uds: kvdo0:dedupeQ: Using 6 indexing zones for 
> concurrency.
> May 17 22:16:55 ha09b kernel: kvdo0:packerQ: compression is enabled
> May 17 22:16:55 ha09b systemd[1]: Started Device-mapper event daemon.
> May 17 22:16:55 ha09b dmeventd[3931]: dmeventd ready for processing.
> May 17 22:16:55 ha09b UDS/vdodmeventd[3930]: INFO   (vdodmeventd/3930) VDO 
> device vdo0 is now registered with dmeventd for monitoring
> May 17 22:16:55 ha09b lvm[3931]: Monitoring VDO pool vdo0.
> May 17 22:16:56 ha09b kernel: uds: kvdo0:dedupeQ: loaded index from chapter 0 
> through chapter 85
> May 17 22:16:56 ha09b pacemaker-controld[2710]: notice: Result of start 
> operation for p_vdo0 on ha09b: ok
> May 17 22:16:57 ha09b pacemaker-controld[2710]: notice: Result of monitor 
> operation for p_vdo0 on ha09b: ok
>
>
>
> > -----Original Message-----
> > From: Users <users-boun...@clusterlabs.org> On Behalf Of Eric Robinson
> > Sent: Monday, May 17, 2021 9:49 PM
> > To: Cluster Labs - All topics related to open-source clustering welcomed
> > <users@clusterlabs.org>
> > Subject: Re: [ClusterLabs] DRBD + VDO HowTo?
> >
> > Notice that 'pcs status' shows errors for resource p_vdo0 on node ha09b,
> > even after doing 'pcs resource cleanup p_vdo0'.
> >
> > [root@ha09a ~]# pcs status
> > Cluster name: ha09ab
> > Cluster Summary:
> >   * Stack: corosync
> >   * Current DC: ha09a (version 2.0.4-6.el8_3.2-2deceaa3ae) - partition with
> > quorum
> >   * Last updated: Mon May 17 19:45:41 2021
> >   * Last change:  Mon May 17 19:45:37 2021 by hacluster via crmd on ha09b
> >   * 2 nodes configured
> >   * 6 resource instances configured
> >
> > Node List:
> >   * Online: [ ha09a ha09b ]
> >
> > Full List of Resources:
> >   * Clone Set: p_drbd0-clone [p_drbd0] (promotable):
> >     * Masters: [ ha09a ]
> >     * Slaves: [ ha09b ]
> >   * Clone Set: p_drbd1-clone [p_drbd1] (promotable):
> >     * Masters: [ ha09b ]
> >     * Slaves: [ ha09a ]
> >   * p_vdo0      (lsb:vdo0):      Starting ha09a
> >   * p_vdo1      (lsb:vdo1):      Started ha09b
> >
> > Failed Resource Actions:
> >   * p_vdo0_monitor_0 on ha09b 'error' (1): call=83, status='complete',
> > exitreason='', last-rc-change='2021-05-17 19:45:38 -07:00', queued=0ms,
> > exec=175ms
> >
> > Daemon Status:
> >   corosync: active/disabled
> >   pacemaker: active/disabled
> >   pcsd: active/enabled
> >
> >
> > If I debug the monitor action on ha09b, it reports 'not installed,' which 
> > makes
> > sense because the drbd disk is in standby.
> >
> > [root@ha09b drbd.d]# pcs resource debug-monitor p_vdo0
> > Operation monitor for p_vdo0 (lsb::vdo0) returned: 'not installed' (5)
> >  >  stdout: error occurred checking vdo0 status on ha09b
> >
> > Should it report something else?
> >
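
Commenting on this part as well, since it goes hand in hand with the
colocation: pacemaker probes the resource on every node, and for an LSB
script it only treats status exit code 0 as "running" and 3 as cleanly
"not running"; anything else (such as the 5 above) counts as a failure.
So the status branch has to swallow the vdo error while DRBD is Secondary
on that node. A minimal, untested sketch of that branch, keeping the
variable names and the "vdo status -n" parsing from your script further
down:

        status)
                #--sketch only: treat any problem running "vdo status"
                #--(e.g. because the backing DRBD device is Secondary on
                #--this node) as a clean "not running" instead of an error
                R=$(/usr/bin/vdo status -n $VOL 2>/dev/null|grep "Index status"|awk '{$1=$1};1'|cut -d" " -f3)
                if [ "$R" == "online" ]; then
                        echo "$VOL started on $MY_HOSTNAME"
                        exit 0 #--lsb: success
                else
                        echo "$VOL not started on $MY_HOSTNAME"
                        exit 3 #--lsb: not running
                fi
                ;;

With that, the probe on the standby node simply reports "stopped" and
pacemaker leaves it alone.
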
> > > -----Original Message-----
> > > From: Users <users-boun...@clusterlabs.org> On Behalf Of Eric Robinson
> > > Sent: Monday, May 17, 2021 1:37 PM
> > > To: Cluster Labs - All topics related to open-source clustering
> > > welcomed <users@clusterlabs.org>
> > > Subject: Re: [ClusterLabs] DRBD + VDO HowTo?
> > >
> > > Andrei --
> > >
> > > To follow up, here is the Pacemaker config. Let's not talk about
> > > fencing or quorum right now. I want to focus on the vdo issue at hand.
> > >
> > > [root@ha09a ~]# pcs config
> > > Cluster Name: ha09ab
> > > Corosync Nodes:
> > >  ha09a ha09b
> > > Pacemaker Nodes:
> > >  ha09a ha09b
> > >
> > > Resources:
> > >  Clone: p_drbd0-clone
> > >   Meta Attrs: clone-max=2 clone-node-max=1 notify=true promotable=true
> > > promoted-max=1 promoted-node-max=1
> > >   Resource: p_drbd0 (class=ocf provider=linbit type=drbd)
> > >    Attributes: drbd_resource=ha01_mysql
> > >    Operations: demote interval=0s timeout=90 (p_drbd0-demote-interval-0s)
> > >                monitor interval=60s (p_drbd0-monitor-interval-60s)
> > >                notify interval=0s timeout=90 (p_drbd0-notify-interval-0s)
> > >                promote interval=0s timeout=90 
> > > (p_drbd0-promote-interval-0s)
> > >                reload interval=0s timeout=30 (p_drbd0-reload-interval-0s)
> > >                start interval=0s timeout=240 (p_drbd0-start-interval-0s)
> > >                stop interval=0s timeout=100 (p_drbd0-stop-interval-0s)
> > >  Clone: p_drbd1-clone
> > >   Meta Attrs: clone-max=2 clone-node-max=1 notify=true promotable=true
> > > promoted-max=1 promoted-node-max=1
> > >   Resource: p_drbd1 (class=ocf provider=linbit type=drbd)
> > >    Attributes: drbd_resource=ha02_mysql
> > >    Operations: demote interval=0s timeout=90 (p_drbd1-demote-interval-0s)
> > >                monitor interval=60s (p_drbd1-monitor-interval-60s)
> > >                notify interval=0s timeout=90 (p_drbd1-notify-interval-0s)
> > >                promote interval=0s timeout=90 
> > > (p_drbd1-promote-interval-0s)
> > >                reload interval=0s timeout=30 (p_drbd1-reload-interval-0s)
> > >                start interval=0s timeout=240 (p_drbd1-start-interval-0s)
> > >                stop interval=0s timeout=100 (p_drbd1-stop-interval-0s)
> > >  Resource: p_vdo0 (class=lsb type=vdo0)
> > >   Operations: force-reload interval=0s timeout=15
> > > (p_vdo0-force-reload-
> > > interval-0s)
> > >               monitor interval=15 timeout=15 (p_vdo0-monitor-interval-15)
> > >               restart interval=0s timeout=15 (p_vdo0-restart-interval-0s)
> > >               start interval=0s timeout=15 (p_vdo0-start-interval-0s)
> > >               stop interval=0s timeout=15 (p_vdo0-stop-interval-0s)
> > >  Resource: p_vdo1 (class=lsb type=vdo1)
> > >   Operations: force-reload interval=0s timeout=15
> > > (p_vdo1-force-reload-
> > > interval-0s)
> > >               monitor interval=15 timeout=15 (p_vdo1-monitor-interval-15)
> > >               restart interval=0s timeout=15 (p_vdo1-restart-interval-0s)
> > >               start interval=0s timeout=15 (p_vdo1-start-interval-0s)
> > >               stop interval=0s timeout=15 (p_vdo1-stop-interval-0s)
> > >
> > > Stonith Devices:
> > > Fencing Levels:
> > >
> > > Location Constraints:
> > > Ordering Constraints:
> > >   promote p_drbd0-clone then start p_vdo0 (kind:Mandatory) (id:order-
> > > p_drbd0-clone-p_vdo0-mandatory)
> > >   promote p_drbd1-clone then start p_vdo1 (kind:Mandatory) (id:order-
> > > p_drbd1-clone-p_vdo1-mandatory)
> > > Colocation Constraints:
> > >   p_vdo0 with p_drbd0-clone (score:INFINITY) (id:colocation-p_vdo0-
> > > p_drbd0-clone-INFINITY)
> > >   p_vdo1 with p_drbd1-clone (score:INFINITY) (id:colocation-p_vdo1-
> > > p_drbd1-clone-INFINITY)
> > > Ticket Constraints:
> > >
> > > Alerts:
> > >  No alerts defined
> > >
> > > Resources Defaults:
> > >   Meta Attrs: rsc_defaults-meta_attributes
> > >     resource-stickiness=100
> > > Operations Defaults:
> > >   Meta Attrs: op_defaults-meta_attributes
> > >     timeout=30s
> > >
> > > Cluster Properties:
> > >  cluster-infrastructure: corosync
> > >  cluster-name: ha09ab
> > >  dc-version: 2.0.4-6.el8_3.2-2deceaa3ae
> > >  have-watchdog: false
> > >  last-lrm-refresh: 1621198059
> > >  maintenance-mode: false
> > >  no-quorum-policy: ignore
> > >  stonith-enabled: false
> > >
> > > Tags:
> > >  No tags defined
> > >
> > > Quorum:
> > >   Options:
> > >
> > > Here is the cluster status. Right now, node ha09a is primary for both
> > > drbd disks.
> > >
> > > [root@ha09a ~]# pcs status
> > > Cluster name: ha09ab
> > > Cluster Summary:
> > >   * Stack: corosync
> > >   * Current DC: ha09a (version 2.0.4-6.el8_3.2-2deceaa3ae) - partition
> > > with quorum
> > >   * Last updated: Mon May 17 11:35:34 2021
> > >   * Last change:  Mon May 17 11:34:24 2021 by hacluster via crmd on ha09a
> > >   * 2 nodes configured
> > >   * 6 resource instances configured (2 BLOCKED from further action due
> > > to
> > > failure)
> > >
> > > Node List:
> > >   * Online: [ ha09a ha09b ]
> > >
> > > Full List of Resources:
> > >   * Clone Set: p_drbd0-clone [p_drbd0] (promotable):
> > >     * Masters: [ ha09a ]
> > >     * Slaves: [ ha09b ]
> > >   * Clone Set: p_drbd1-clone [p_drbd1] (promotable):
> > >     * Masters: [ ha09a ]
> > >     * Slaves: [ ha09b ]
> > >   * p_vdo0      (lsb:vdo0):      FAILED ha09a (blocked)
> > >   * p_vdo1      (lsb:vdo1):      FAILED ha09a (blocked)
> > >
> > > Failed Resource Actions:
> > >   * p_vdo1_stop_0 on ha09a 'error' (1): call=21, status='Timed Out',
> > > exitreason='', last-rc-change='2021-05-17 11:29:09 -07:00',
> > > queued=0ms, exec=15001ms
> > >   * p_vdo0_stop_0 on ha09a 'error' (1): call=27, status='Timed Out',
> > > exitreason='', last-rc-change='2021-05-17 11:34:26 -07:00',
> > > queued=0ms, exec=15001ms
> > >   * p_vdo1_monitor_0 on ha09b 'error' (1): call=21, status='complete',
> > > exitreason='', last-rc-change='2021-05-17 11:29:08 -07:00',
> > > queued=0ms, exec=217ms
> > >   * p_vdo0_monitor_0 on ha09b 'error' (1): call=28, status='complete',
> > > exitreason='', last-rc-change='2021-05-17 11:34:25 -07:00',
> > > queued=0ms, exec=182ms
> > >
> > > Daemon Status:
> > >   corosync: active/disabled
> > >   pacemaker: active/disabled
> > >   pcsd: active/enabled
> > >
> > > The vdo devices are available...
> > >
> > > [root@ha09a ~]# vdo list
> > > vdo0
> > > vdo1
> > >
> > >
> > > > -----Original Message-----
> > > > From: Users <users-boun...@clusterlabs.org> On Behalf Of Eric
> > > > Robinson
> > > > Sent: Monday, May 17, 2021 1:28 PM
> > > > To: Cluster Labs - All topics related to open-source clustering
> > > > welcomed <users@clusterlabs.org>
> > > > Subject: Re: [ClusterLabs] DRBD + VDO HowTo?
> > > >
> > > > Andrei --
> > > >
> > > > Sorry for the novels. Sometimes it is hard to tell whether people
> > > > want all the configs, logs, and scripts first, or if they want a
> > > > description of the problem and what one is trying to accomplish first.
> > > > I'll send whatever you want. I am very eager to get to the bottom of 
> > > > this.
> > > >
> > > > I'll start with my custom LSB RA. I can send the Pacemaker config a bit
> > later.
> > > >
> > > > [root@ha09a init.d]# ll|grep vdo
> > > > lrwxrwxrwx. 1 root root     9 May 16 10:28 vdo0 -> vdo_multi
> > > > lrwxrwxrwx. 1 root root     9 May 16 10:28 vdo1 -> vdo_multi
> > > > -rwx------. 1 root root  3623 May 16 13:21 vdo_multi
> > > >
> > > > [root@ha09a init.d]#  cat vdo_multi
> > > > #!/bin/bash
> > > >
> > > > #--custom script for managing vdo volumes
> > > >
> > > > #--functions
> > > > function isActivated() {
> > > >         R=$(/usr/bin/vdo status -n $VOL 2>&1)
> > > >         if [ $? -ne 0 ]; then
> > > >                 #--error occurred checking vdo status
> > > >                 echo "$VOL: an error occurred checking activation
> > > > status on $MY_HOSTNAME"
> > > >                 return 1
> > > >         fi
> > > >         R=$(/usr/bin/vdo status -n $VOL|grep Activate|awk '{$1=$1};1'|cut -d" " -f2)
> > > >         echo "$R"
> > > >         return 0
> > > > }
> > > >
> > > > function isOnline() {
> > > >         R=$(/usr/bin/vdo status -n $VOL 2>&1)
> > > >         if [ $? -ne 0 ]; then
> > > >                 #--error occurred checking vdo status
> > > >                 echo "$VOL: an error occurred checking activation
> > > > status on $MY_HOSTNAME"
> > > >                 return 1
> > > >         fi
> > > >         R=$(/usr/bin/vdo status -n $VOL|grep "Index status"|awk
> > > > '{$1=$1};1'|cut -d" " -f3)
> > > >         echo "$R"
> > > >         return 0
> > > > }
> > > >
> > > > #--vars
> > > > MY_HOSTNAME=$(hostname -s)
> > > >
> > > > #--get the volume name
> > > > VOL=$(basename $0)
> > > >
> > > > #--get the action
> > > > ACTION=$1
> > > >
> > > > #--take the requested action
> > > > case $ACTION in
> > > >
> > > >         start)
> > > >
> > > >                 #--check current status
> > > >                 R=$(isOnline "$VOL")
> > > >                 if [ $? -ne 0 ]; then
> > > >                         echo "error occurred checking $VOL status on $MY_HOSTNAME"
> > > >                         exit 0
> > > >                 fi
> > > >                 if [ "$R"  == "online" ]; then
> > > >                         echo "running on $MY_HOSTNAME"
> > > >                         exit 0 #--lsb: success
> > > >                 fi
> > > >
> > > >                 #--enter activation loop
> > > >                 ACTIVATED=no
> > > >                 TIMER=15
> > > >                 while [ $TIMER -ge 0 ]; do
> > > >                         R=$(isActivated "$VOL")
> > > >                         if [ "$R" == "enabled" ]; then
> > > >                                 ACTIVATED=yes
> > > >                                 break
> > > >                         fi
> > > >                         sleep 1
> > > >                         TIMER=$(( TIMER-1 ))
> > > >                 done
> > > >                 if [ "$ACTIVATED" == "no" ]; then
> > > >                         echo "$VOL: not activated on $MY_HOSTNAME"
> > > >                         exit 5 #--lsb: not running
> > > >                 fi
> > > >
> > > >                 #--enter start loop
> > > >                 /usr/bin/vdo start -n $VOL
> > > >                 ONLINE=no
> > > >                 TIMER=15
> > > >                 while [ $TIMER -ge 0 ]; do
> > > >                         R=$(isOnline "$VOL")
> > > >                         if [ "$R" == "online" ]; then
> > > >                                 ONLINE=yes
> > > >                                 break
> > > >                         fi
> > > >                         sleep 1
> > > >                         TIMER=$(( TIMER-1 ))
> > > >                 done
> > > >                 if [ "$ONLINE" == "yes" ]; then
> > > >                         echo "$VOL: started on $MY_HOSTNAME"
> > > >                         exit 0 #--lsb: success
> > > >                 else
> > > >                         echo "$VOL: not started on $MY_HOSTNAME
> > > > (unknown problem)"
> > > >                         exit 0 #--lsb: unknown problem
> > > >                 fi
> > > >                 ;;
> > > >         stop)
> > > >
> > > >                 #--check current status
> > > >                 R=$(isOnline "$VOL")
> > > >                 if [ $? -ne 0 ]; then
> > > >                         echo "error occurred checking $VOL status on $MY_HOSTNAME"
> > > >                         exit 0
> > > >                 fi
> > > >
> > > >                 if [ "$R" == "not" ]; then
> > > >                         echo "not started on $MY_HOSTNAME"
> > > >                         exit 0 #--lsb: success
> > > >                 fi
> > > >
> > > >                 #--enter stop loop
> > > >                 /usr/bin/vdo stop -n $VOL
> > > >                 ONLINE=yes
> > > >                 TIMER=15
> > > >                 while [ $TIMER -ge 0 ]; do
> > > >                         R=$(isOnline "$VOL")
> > > >                         if [ "$R" == "not" ]; then
> > > >                                 ONLINE=no
> > > >                                 break
> > > >                         fi
> > > >                         sleep 1
> > > >                         TIMER=$(( TIMER-1 ))
> > > >                 done
> > > >                 if [ "$ONLINE" == "no" ]; then
> > > >                         echo "$VOL: stopped on $MY_HOSTNAME"
> > > >                         exit 0 #--lsb:success
> > > >                 else
> > > >                         echo "$VOL: failed to stop on $MY_HOSTNAME
> > > > (unknown problem)"
> > > >                         exit 0
> > > >                 fi
> > > >                 ;;
> > > >         status)
> > > >                 R=$(isOnline "$VOL")
> > > >                 if [ $? -ne 0 ]; then
> > > >                         echo "error occurred checking $VOL status on $MY_HOSTNAME"
> > > >                         exit 5
> > > >                 fi
> > > >                 if [ "$R"  == "online" ]; then
> > > >                         echo "$VOL started on $MY_HOSTNAME"
> > > >                         exit 0 #--lsb: success
> > > >                 else
> > > >                         echo "$VOL not started on $MY_HOSTNAME"
> > > >                         exit 3 #--lsb: not running
> > > >                 fi
> > > >                 ;;
> > > >
> > > > esac
> > > >
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Users <users-boun...@clusterlabs.org> On Behalf Of Andrei
> > > > > Borzenkov
> > > > > Sent: Monday, May 17, 2021 12:49 PM
> > > > > To: users@clusterlabs.org
> > > > > Subject: Re: [ClusterLabs] DRBD + VDO HowTo?
> > > > >
> > > > > On 17.05.2021 18:18, Eric Robinson wrote:
> > > > > > To Strahil and Klaus –
> > > > > >
> > > > > > I created the vdo devices using default parameters, so ‘auto’
> > > > > > mode was
> > > > > selected by default. vdostatus shows that the current mode is async.
> > > > > The underlying drbd devices are running protocol C, so I assume
> > > > > that vdo should be changed to sync mode?
> > > > > >
> > > > > > The VDO service is disabled and is solely under the control of
> > > > > > Pacemaker,
> > > > > but I have been unable to get a resource agent to work reliably. I
> > > > > have two nodes. Under normal operation, Node A is primary for disk
> > > > > drbd0, and device
> > > > > vdo0 rides on top of that. Node B is primary for disk drbd1 and
> > > > > device
> > > > > vdo1 rides on top of that. In the event of a node failure, the vdo
> > > > > device and the underlying drbd disk should migrate to the other
> > > > > node, and then that node will be primary for both drbd disks and
> > > > > both vdo
> > > > devices.
> > > > > >
> > > > > > The default systemd vdo service does not work because it uses
> > > > > > the --all flag
> > > > > and starts/stops all vdo devices. I noticed that there is also a
> > > > > vdo-start-by-dev.service, but there is no documentation on how to
> > > > > use it. I wrote my own vdo-by-dev system service, but that did not
> > > > > work reliably either. Then I noticed that there is already an OCF
> > > > > resource agent named vdo-vol, but that did not work either. I
> > > > > finally tried writing my own OCF-compliant RA, and then I tried
> > > > > writing an LSB-compliant script, but none of those worked very well.
> > > > > >
> > > > >
> > > > > You continue to write novels instead of simply showing your
> > > > > resource agent, your configuration and logs.
> > > > >
> > > > > > My big problem is that I don’t understand how Pacemaker uses the
> > > > > monitor action. Pacemaker would often fail vdo resources because
> > > > > the monitor action received an error when it ran on the standby node.
> > > > > For example, when Node A is primary for disk drbd1 and device
> > > > > vdo1, Pacemaker would fail device vdo1 because when it ran the
> > > > > monitor action on Node B, the RA reported an error. But OF COURSE
> > > > > it would report an error, because disk drbd1 is secondary on that
> > > > > node, and is therefore inaccessible to the vdo driver. I DON’T UNDERSTAND.
> > > > > >
> > > > >
> > > > > May be your definition of "error" does not match pacemaker
> > > > > definition of "error". It is hard to comment without seeing code.
> > > > >
> > > > > > -Eric
> > > > > >
> > > > > >
> > > > > >
> > > > > > From: Strahil Nikolov <hunter86...@yahoo.com>
> > > > > > Sent: Monday, May 17, 2021 5:09 AM
> > > > > > To: kwenn...@redhat.com; Klaus Wenninger
> > > <kwenn...@redhat.com>;
> > > > > > Cluster Labs - All topics related to open-source clustering
> > > > > > welcomed <users@clusterlabs.org>; Eric Robinson
> > > > > > <eric.robin...@psmnv.com>
> > > > > > Subject: Re: [ClusterLabs] DRBD + VDO HowTo?
> > > > > >
> > > > > > Have you tried to set VDO in async mode ?
> > > > > >
> > > > > > Best Regards,
> > > > > > Strahil Nikolov
> > > > > > On Mon, May 17, 2021 at 8:57, Klaus Wenninger
> > > > > > <kwenn...@redhat.com<mailto:kwenn...@redhat.com>> wrote:
> > > > > > Did you try VDO in sync-mode for the case the flush-fua stuff
> > > > > > isn't working through the layers?
> > > > > > Did you check that VDO-service is disabled and solely under
> > > > > > pacemaker-control and that the dependencies are set correctly?
> > > > > >
> > > > > > Klaus
> > > > > >
> > > > > > On 5/17/21 6:17 AM, Eric Robinson wrote:
> > > > > >
> > > > > > Yes, DRBD is working fine.
> > > > > >
> > > > > >
> > > > > >
> > > > > > From: Strahil Nikolov
> > > > > > <hunter86...@yahoo.com><mailto:hunter86...@yahoo.com>
> > > > > > Sent: Sunday, May 16, 2021 6:06 PM
> > > > > > To: Eric Robinson
> > > > > > <eric.robin...@psmnv.com><mailto:eric.robin...@psmnv.com>;
> > > > Cluster
> > > > > > Labs - All topics related to open-source clustering welcomed
> > > > > > <users@clusterlabs.org><mailto:users@clusterlabs.org>
> > > > > > Subject: RE: [ClusterLabs] DRBD + VDO HowTo?
> > > > > >
> > > > > >
> > > > > >
> > > > > > Are you sure that the DRBD is working properly ?
> > > > > >
> > > > > >
> > > > > >
> > > > > > Best Regards,
> > > > > >
> > > > > > Strahil Nikolov
> > > > > >
> > > > > > On Mon, May 17, 2021 at 0:32, Eric Robinson
> > > > > >
> > > > > > <eric.robin...@psmnv.com<mailto:eric.robin...@psmnv.com>>
> > > wrote:
> > > > > >
> > > > > > Okay, it turns out I was wrong. I thought I had it working, but
> > > > > > I keep running
> > > > > into problems. Sometimes when I demote a DRBD resource on Node A
> > > and
> > > > > promote it on Node B, and I try to mount the filesystem, the
> > > > > system complains that it cannot read the superblock. But when I
> > > > > move the DRBD primary back to Node A, the file system is mountable
> > again.
> > > > > Also, I have problems with filesystems not mounting because the
> > > > > vdo devices are not present. All kinds of issues.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > From: Users
> > > > > > <users-boun...@clusterlabs.org<mailto:users-
> > > > boun...@clusterlabs.org>
> > > > > > >
> > > > > > On Behalf Of Eric Robinson
> > > > > > Sent: Friday, May 14, 2021 3:55 PM
> > > > > > To: Strahil Nikolov
> > > > > > <hunter86...@yahoo.com<mailto:hunter86...@yahoo.com>>;
> > > Cluster
> > > > > Labs -
> > > > > > All topics related to open-source clustering welcomed
> > > > > > <users@clusterlabs.org<mailto:users@clusterlabs.org>>
> > > > > > Subject: Re: [ClusterLabs] DRBD + VDO HowTo?
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > Okay, I have it working now. The default systemd service
> > > > > > definitions did
> > > > > not work, so I created my own.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > From: Strahil Nikolov
> > > > > > <hunter86...@yahoo.com<mailto:hunter86...@yahoo.com>>
> > > > > > Sent: Friday, May 14, 2021 3:41 AM
> > > > > > To: Eric Robinson
> > > > > > <eric.robin...@psmnv.com<mailto:eric.robin...@psmnv.com>>;
> > > > Cluster
> > > > > > Labs - All topics related to open-source clustering welcomed
> > > > > > <users@clusterlabs.org<mailto:users@clusterlabs.org>>
> > > > > > Subject: RE: [ClusterLabs] DRBD + VDO HowTo?
> > > > > >
> > > > > >
> > > > > >
> > > > > > There is no VDO RA according to my knowledge, but you can use
> > > > > > systemd
> > > > > service as a resource.
> > > > > >
> > > > > >
> > > > > >
> > > > > > Yet, the VDO service that comes with the OS is a generic one and
> > > > > > controls
> > > > > all VDOs - so you need to create your own vdo service.
> > > > > >
> > > > > >
> > > > > >
> > > > > > Best Regards,
> > > > > >
> > > > > > Strahil Nikolov
> > > > > >
> > > > > > On Fri, May 14, 2021 at 6:55, Eric Robinson
> > > > > >
> > > > > > <eric.robin...@psmnv.com<mailto:eric.robin...@psmnv.com>>
> > > wrote:
> > > > > >
> > > > > > I created the VDO volumes fine on the drbd devices, formatted
> > > > > > them as xfs
> > > > > filesystems, created cluster filesystem resources, and the cluster
> > > > > is using them. But the cluster won’t fail over. Is there a VDO
> > > > > cluster RA out there somewhere already?
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > From: Strahil Nikolov
> > > > > > <hunter86...@yahoo.com<mailto:hunter86...@yahoo.com>>
> > > > > > Sent: Thursday, May 13, 2021 10:07 PM
> > > > > > To: Cluster Labs - All topics related to open-source clustering
> > > > > > welcomed <users@clusterlabs.org<mailto:users@clusterlabs.org>>;
> > > > > > Eric Robinson
> > > > > <eric.robin...@psmnv.com<mailto:eric.robin...@psmnv.com>>
> > > > > > Subject: Re: [ClusterLabs] DRBD + VDO HowTo?
> > > > > >
> > > > > >
> > > > > >
> > > > > > For DRBD there is enough info, so let's focus on VDO.
> > > > > >
> > > > > > There is a systemd service that starts all VDOs on the system.
> > > > > > You can
> > > > > create the VDO once drbd is open for writes and then you can
> > > > > create your own systemd '.service' file which can be used as a cluster
> > resource.
> > > > > >
> > > > > >
> > > > > > Best Regards,
> > > > > >
> > > > > > Strahil Nikolov
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Fri, May 14, 2021 at 2:33, Eric Robinson
> > > > > >
> > > > > > <eric.robin...@psmnv.com<mailto:eric.robin...@psmnv.com>>
> > > wrote:
> > > > > >
> > > > > > Can anyone point to a document on how to use VDO de-duplication
> > > > > > with
> > > > > DRBD? Linbit has a blog page about it, but it was last updated 6
> > > > > years ago and the embedded links are dead.
> > > > > >
> > > > > >
> > > > > >
> > > > > > https://linbit.com/blog/albireo-virtual-data-optimizer-vdo-on-drbd/
> > > > > >
> > > > > >
> > > > > >
> > > > > > -Eric
> > > > > >
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
