On Tue, Oct 22, 2024 at 3:18 PM Testuser SST via Users
<users@clusterlabs.org> wrote:
>
> Hi,
> I'm running a 2-node web cluster on AlmaLinux 9 with Pacemaker 2.1.7, DRBD 9
> and Corosync 3.1.
> I'm having trouble with promoting and mounting the DRBD device. After
> activating the cluster, the DRBD device does not get mounted and an error
> message shows up almost immediately:
>
> pacemaker-schedulerd[4879]: warning: Unexpected result (error: Couldn't mount 
> device [/dev/drbd1] as /mnt/clusterfs) was recorded for start of 
> Webcontent_FS on ...
> pacemaker-schedulerd[4879]: warning: Webcontent_FS cannot run on kathie3 due 
> to reaching migration threshold (clean up resource to allow again)
>

Do you have any ordering constraints between Webcontent_DRBD and Webcontent_FS?
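
I ask because, in the log below, Webcontent_FS_start_0 and
Webcontent_DRBD_start_0 are initiated in the same second, and the mount fails
with "Auto-promote failed" before DRBD has finished attaching. If the
constraints are missing, something along these lines might help. This is only
a sketch under assumptions: the clone id Webcontent_DRBD-clone is a guess (the
log shows only the instance name Webcontent_DRBD:0), and the "Promoted" role
keyword is the pcs 0.11 spelling (older pcs used "master"). The run() helper
and the APPLY variable are made up for illustration, so that the script only
prints the commands unless you explicitly ask it to run them.

```shell
#!/bin/sh
# Sketch only: order and colocate the filesystem with the promoted DRBD instance.
# ASSUMPTION: the promotable clone is called Webcontent_DRBD-clone; check
# "pcs status" for the real id before using any of this.
DRBD_CLONE="${DRBD_CLONE:-Webcontent_DRBD-clone}"
FS_RES="${FS_RES:-Webcontent_FS}"

# Hypothetical helper: print each command by default, run it only when APPLY=1.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

drbd_fs_constraints() {
    # Mount only after DRBD has been promoted...
    run pcs constraint order promote "$DRBD_CLONE" then start "$FS_RES"
    # ...and only on the node where DRBD is in the Promoted role.
    run pcs constraint colocation add "$FS_RES" with Promoted "$DRBD_CLONE"
    # Clear the INFINITY fail-count left behind by the failed start.
    run pcs resource cleanup "$FS_RES"
}

drbd_fs_constraints
```

Run as-is, it prints the three pcs commands for review; with APPLY=1 it
executes them. The colocation score is left out on purpose, since it defaults
to INFINITY.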

> It looks as if the cluster tries to mount the device before it is ready.
> The device is drbd1 and I'm trying to mount it on /mnt/clusterfs. After
> the error occurs, running "pcs resource cleanup" lets the cluster mount it.
> The DRBD resource is named Webcontent_DRBD.
> The mounted filesystem is named Webcontent_FS.
> All other resources, such as httpd and the HA-IPs, work like a charm.
>
> This is the log from the start of the cluster:
>
> Oct 22 11:48:12 kathie3 pacemaker-controld[4880]: notice: State transition 
> S_ELECTION -> S_INTEGRATION
> Oct 22 11:48:13 kathie3 pacemaker-schedulerd[4879]: notice: Actions: Start    
>   HA-IP_1               (                        kathie3 )
> Oct 22 11:48:13 kathie3 pacemaker-schedulerd[4879]: notice: Actions: Start    
>   HA-IP_2               (                        kathie3 )
> Oct 22 11:48:13 kathie3 pacemaker-schedulerd[4879]: notice: Actions: Start    
>   HA-IP_3               (                        kathie3 )
> Oct 22 11:48:13 kathie3 pacemaker-schedulerd[4879]: notice: Actions: Start    
>   Webcontent_DRBD:0     (                        kathie3 )
> Oct 22 11:48:13 kathie3 pacemaker-schedulerd[4879]: notice: Actions: Start    
>   Webcontent_FS         (                        kathie3 )
> Oct 22 11:48:13 kathie3 pacemaker-schedulerd[4879]: notice: Actions: Start    
>   ping_fw:0             (                        kathie3 )
> Oct 22 11:48:13 kathie3 pacemaker-schedulerd[4879]: notice: Calculated 
> transition 1106, saving inputs in /var/lib/pacemaker/pengine/pe-input-336.bz2
> Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Initiating start 
> operation HA-IP_1_start_0 locally on kathie3
> Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Initiating start 
> operation Webcontent_FS_start_0 locally on kathie3
> Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Initiating start 
> operation ping_fw_start_0 locally on kathie3
> Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Initiating start 
> operation Webcontent_DRBD_start_0 locally on kathie3
> Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Requesting local 
> execution of start operation for HA-IP_1 on kathie3
> Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Requesting local 
> execution of start operation for ping_fw on kathie3
> Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Requesting local 
> execution of start operation for Webcontent_DRBD on kathie3
> Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Requesting local 
> execution of start operation for Webcontent_FS on kathie3
> Oct 22 11:48:13 kathie3 IPaddr2(HA-IP_1)[1682892]: INFO: Adding inet address 
> 192.168.16.75/24 with broadcast address 192.168.16.255 to device ens3
> Oct 22 11:48:13 kathie3 IPaddr2(HA-IP_1)[1682912]: INFO: Bringing device ens3 
> up
> Oct 22 11:48:13 kathie3 Filesystem(Webcontent_FS)[1682923]: INFO: Running 
> start for /dev/drbd1 on /mnt/clusterfs
> Oct 22 11:48:13 kathie3 IPaddr2(HA-IP_1)[1682929]: INFO: 
> /usr/libexec/heartbeat/send_arp  -i 200 -r 5 -p 
> /run/resource-agents/send_arp-192.168.16.75 ens3 192.168.16.75 auto not_used 
> not_used
> Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data: Starting worker thread 
> (node-id 0)
> Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Result of start 
> operation for HA-IP_1 on kathie3: ok
> Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Initiating monitor 
> operation HA-IP_1_monitor_30000 locally on kathie3
> Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Requesting local 
> execution of monitor operation for HA-IP_1 on kathie3
> Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Initiating start 
> operation HA-IP_2_start_0 locally on kathie3
> Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Requesting local 
> execution of start operation for HA-IP_2 on kathie3
> Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data: Auto-promote failed: 
> Need access to UpToDate data (-2)
> Oct 22 11:48:13 kathie3 kernel: /dev/drbd1: Can't open blockdev
> Oct 22 11:48:13 kathie3 kernel: /dev/drbd1: Can't open blockdev
> Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: meta-data IO 
> uses: blk-bio
> Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: disk( Diskless 
> -> Attaching ) [attach]
> Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: Maximum number 
> of peer devices = 1
> Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data: Method to ensure write 
> ordering: flush
> Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: drbd_bm_resize 
> called with capacity == 104854328
> Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: resync bitmap: 
> bits=13106791 words=204794 pages=400
> Oct 22 11:48:13 kathie3 kernel: drbd1: detected capacity change from 0 to 
> 104854328
> Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: size = 50 GB 
> (52427164 KB)
> Oct 22 11:48:13 kathie3 Filesystem(Webcontent_FS)[1683017]: ERROR: Couldn't 
> mount device [/dev/drbd1] as /mnt/clusterfs
> Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Result of start 
> operation for Webcontent_FS on kathie3: error (Couldn't mount device 
> [/dev/drbd1] as /mnt/clusterfs)
> Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: 
> Webcontent_FS_start_0@kathie3 output [ blockdev: cannot open /dev/drbd1: No 
> data available\nmount: /mnt/clusterfs: mount(2) system call failed: No data 
> available.\nocf-exit-reason:Couldn't mount device [/dev/drbd1] as 
> /mnt/clusterfs\n ]
> Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Transition 1106 
> aborted by operation Webcontent_FS_start_0 'modify' on kathie3: Event failed
> Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Transition 1106 
> action 37 (Webcontent_FS_start_0 on kathie3): expected 'ok' but got 'error'
> Oct 22 11:48:13 kathie3 pacemaker-attrd[4878]: notice: Setting 
> last-failure-Webcontent_FS#start_0[kathie3] in instance_attributes: (unset) 
> -> 1729590493
> Oct 22 11:48:13 kathie3 pacemaker-attrd[4878]: notice: Setting 
> fail-count-Webcontent_FS#start_0[kathie3] in instance_attributes: (unset) 
> -> INFINITY
> Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Transition 1106 
> aborted by status-1-last-failure-Webcontent_FS.start_0 doing create 
> last-failure-Webcontent_FS#start_0=1729590493: Transient attribute change
> Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: bitmap READ of 
> 400 pages took 34 ms
> Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: disk( Attaching 
> -> UpToDate ) [attach]
> Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: attached to 
> current UUID: 826E8850CF10C812
> Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: Setting exposed 
> data uuid: 826E8850CF10C812
> Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Result of monitor 
> operation for HA-IP_1 on kathie3: ok
> Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data stacy3: Starting sender 
> thread (peer-node-id 1)
> Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data stacy3: conn( StandAlone 
> -> Unconnected ) [connect]
> Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data stacy3: Starting 
> receiver thread (peer-node-id 1)
> Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data stacy3: conn( 
> Unconnected -> Connecting ) [connecting]
> Oct 22 11:48:13 kathie3 IPaddr2(HA-IP_2)[1683100]: INFO: Adding inet address 
> 192.168.16.76/24 with broadcast address 192.168.16.255 to device ens3
> Oct 22 11:48:13 kathie3 IPaddr2(HA-IP_2)[1683106]: INFO: Bringing device ens3 
> up
> Oct 22 11:48:13 kathie3 IPaddr2(HA-IP_2)[1683112]: INFO: 
> /usr/libexec/heartbeat/send_arp  -i 200 -r 5 -p 
> /run/resource-agents/send_arp-192.168.16.76 ens3 192.168.16.76 auto not_used 
> not_used
> Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Result of start 
> operation for HA-IP_2 on kathie3: ok
> Oct 22 11:48:15 kathie3 pacemaker-attrd[4878]: notice: Setting pingd[kathie3] 
> in instance_attributes: (unset) -> 1000
> Oct 22 11:48:15 kathie3 pacemaker-controld[4880]: notice: Result of start 
> operation for ping_fw on kathie3: ok
> Oct 22 11:48:17 kathie3 IPaddr2(HA-IP_1)[1683126]: INFO: ARPING 192.168.16.75 
> from 192.168.16.75 ens3#012Sent 5 probes (5 broadcast(s))#012Received 0 
> response(s)
> Oct 22 11:48:17 kathie3 IPaddr2(HA-IP_2)[1683130]: INFO: ARPING 192.168.16.76 
> from 192.168.16.76 ens3#012Sent 5 probes (5 broadcast(s))#012Received 0 
> response(s)
> Oct 22 11:48:18 kathie3 drbd(Webcontent_DRBD)[1683138]: INFO: 
> webcontent_data: Called drbdsetup wait-connect-resource webcontent_data 
> --wfc-timeout=5 --degr-wfc-timeout=5 --outdated-wfc-timeout=5
> Oct 22 11:48:18 kathie3 drbd(Webcontent_DRBD)[1683142]: INFO: 
> webcontent_data: Exit code 5
> Oct 22 11:48:18 kathie3 drbd(Webcontent_DRBD)[1683146]: INFO: 
> webcontent_data: Command output:
> Oct 22 11:48:18 kathie3 drbd(Webcontent_DRBD)[1683150]: INFO: 
> webcontent_data: Command stderr:
> Oct 22 11:48:19 kathie3 pacemaker-attrd[4878]: notice: Setting 
> master-Webcontent_DRBD[kathie3] in instance_attributes: (unset) -> 1000
> Oct 22 11:48:19 kathie3 pacemaker-controld[4880]: notice: Result of start 
> operation for Webcontent_DRBD on kathie3: ok
> Oct 22 11:48:19 kathie3 pacemaker-controld[4880]: notice: Initiating notify 
> operation Webcontent_DRBD_post_notify_start_0 locally on kathie3
> ...
>
> Is some kind of timeout set wrong, or what am I missing?
>
> Any suggestions are welcome
>
> Kind regards
>
> fatcharly
>
>
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/