On Tue, Oct 22, 2024 at 3:18 PM Testuser SST via Users <users@clusterlabs.org> wrote:
>
> Hi,
> I'm running a 2-node web cluster on AlmaLinux 9 with Pacemaker 2.1.7, DRBD 9
> and Corosync 3.1.
> I'm having trouble promoting and mounting the DRBD device. After activating
> the cluster, the DRBD device does not get mounted, and an error message
> appears quite quickly:
>
> pacemaker-schedulerd[4879]: warning: Unexpected result (error: Couldn't mount
> device [/dev/drbd1] as /mnt/clusterfs) was recorded for start of
> Webcontent_FS on ...
> pacemaker-schedulerd[4879]: warning: Webcontent_FS cannot run on kathie3 due
> to reaching migration threshold (clean up resource to allow again)
>

Do you have any ordering constraints between Webcontent_DRBD and Webcontent_FS?

> It looks as if it tries to mount the device before the device is ready.
> The device is /dev/drbd1 and I'm trying to mount it on /mnt/clusterfs. After
> the error has occurred, if I run "pcs resource cleanup", the cluster is able
> to mount it.
> The DRBD resource is named Webcontent_DRBD.
> The mounted filesystem is named Webcontent_FS.
> All other resources, like httpd and the HA IPs, work like a charm.
>
> This is the log from the start of the cluster:
>
> Oct 22 11:48:12 kathie3 pacemaker-controld[4880]: notice: State transition
> S_ELECTION -> S_INTEGRATION
> Oct 22 11:48:13 kathie3 pacemaker-schedulerd[4879]: notice: Actions: Start
> HA-IP_1 ( kathie3 )
> Oct 22 11:48:13 kathie3 pacemaker-schedulerd[4879]: notice: Actions: Start
> HA-IP_2 ( kathie3 )
> Oct 22 11:48:13 kathie3 pacemaker-schedulerd[4879]: notice: Actions: Start
> HA-IP_3 ( kathie3 )
> Oct 22 11:48:13 kathie3 pacemaker-schedulerd[4879]: notice: Actions: Start
> Webcontent_DRBD:0 ( kathie3 )
> Oct 22 11:48:13 kathie3 pacemaker-schedulerd[4879]: notice: Actions: Start
> Webcontent_FS ( kathie3 )
> Oct 22 11:48:13 kathie3 pacemaker-schedulerd[4879]: notice: Actions: Start
> ping_fw:0 ( kathie3 )
> Oct 22 11:48:13 kathie3 pacemaker-schedulerd[4879]: notice: Calculated
> transition 1106, saving inputs in /var/lib/pacemaker/pengine/pe-input-336.bz2
> Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Initiating start
> operation HA-IP_1_start_0 locally on kathie3
> Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Initiating start
> operation Webcontent_FS_start_0 locally on kathie3
> Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Initiating start
> operation ping_fw_start_0 locally on kathie3
> Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Initiating start
> operation Webcontent_DRBD_start_0 locally on kathie3
> Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Requesting local
> execution of start operation for HA-IP_1 on kathie3
> Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Requesting local
> execution of start operation for ping_fw on kathie3
> Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Requesting local
> execution of start operation for Webcontent_DRBD on kathie3
> Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Requesting local
> execution of start operation for Webcontent_FS on kathie3
> Oct 22 11:48:13 kathie3 IPaddr2(HA-IP_1)[1682892]: INFO: Adding inet address
> 192.168.16.75/24 with broadcast address 192.168.16.255 to device ens3
> Oct 22 11:48:13 kathie3 IPaddr2(HA-IP_1)[1682912]: INFO: Bringing device ens3
> up
> Oct 22 11:48:13 kathie3 Filesystem(Webcontent_FS)[1682923]: INFO: Running
> start for /dev/drbd1 on /mnt/clusterfs
> Oct 22 11:48:13 kathie3 IPaddr2(HA-IP_1)[1682929]: INFO:
> /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p
> /run/resource-agents/send_arp-192.168.16.75 ens3 192.168.16.75 auto not_used
> not_used
> Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data: Starting worker thread
> (node-id 0)
> Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Result of start
> operation for HA-IP_1 on kathie3: ok
> Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Initiating monitor
> operation HA-IP_1_monitor_30000 locally on kathie3
> Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Requesting local
> execution of monitor operation for HA-IP_1 on kathie3
> Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Initiating start
> operation HA-IP_2_start_0 locally on kathie3
> Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Requesting local
> execution of start operation for HA-IP_2 on kathie3
> Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data: Auto-promote failed:
> Need access to UpToDate data (-2)
> Oct 22 11:48:13 kathie3 kernel: /dev/drbd1: Can't open blockdev
> Oct 22 11:48:13 kathie3 kernel: /dev/drbd1: Can't open blockdev
> Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: meta-data IO
> uses: blk-bio
> Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: disk( Diskless
> -> Attaching ) [attach]
> Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: Maximum number
> of peer devices = 1
> Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data: Method to ensure write
> ordering: flush
> Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: drbd_bm_resize
> called with capacity == 104854328
> Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: resync bitmap:
> bits=13106791 words=204794 pages=400
> Oct 22 11:48:13 kathie3 kernel: drbd1: detected capacity change from 0 to
> 104854328
> Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: size = 50 GB
> (52427164 KB)
> Oct 22 11:48:13 kathie3 Filesystem(Webcontent_FS)[1683017]: ERROR: Couldn't
> mount device [/dev/drbd1] as /mnt/clusterfs
> Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Result of start
> operation for Webcontent_FS on kathie3: error (Couldn't mount device
> [/dev/drbd1] as /mnt/clusterfs)
> Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice:
> Webcontent_FS_start_0@kathie3 output [ blockdev: cannot open /dev/drbd1: No
> data available\nmount: /mnt/clusterfs: mount(2) system call failed: No data
> available.\nocf-exit-reason:Couldn't mount device [/dev/drbd1] as
> /mnt/clusterfs\n ]
> Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Transition 1106
> aborted by operation Webcontent_FS_start_0 'modify' on kathie3: Event failed
> Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Transition 1106
> action 37 (Webcontent_FS_start_0 on kathie3): expected 'ok' but got 'error'
> Oct 22 11:48:13 kathie3 pacemaker-attrd[4878]: notice: Setting
> last-failure-Webcontent_FS#start_0[kathie3] in instance_attributes: (unset)
> -> 1729590493
> Oct 22 11:48:13 kathie3 pacemaker-attrd[4878]: notice: Setting
> fail-count-Webcontent_FS#start_0[kathie3] in instance_attributes: (unset)
> -> INFINITY
> Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Transition 1106
> aborted by status-1-last-failure-Webcontent_FS.start_0 doing create
> last-failure-Webcontent_FS#start_0=1729590493: Transient attribute change
> Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: bitmap READ of
> 400 pages took 34 ms
> Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: disk( Attaching
> -> UpToDate ) [attach]
> Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: attached to
> current UUID: 826E8850CF10C812
> Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: Setting exposed
> data uuid: 826E8850CF10C812
> Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Result of monitor
> operation for HA-IP_1 on kathie3: ok
> Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data stacy3: Starting sender
> thread (peer-node-id 1)
> Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data stacy3: conn( StandAlone
> -> Unconnected ) [connect]
> Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data stacy3: Starting
> receiver thread (peer-node-id 1)
> Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data stacy3: conn(
> Unconnected -> Connecting ) [connecting]
> Oct 22 11:48:13 kathie3 IPaddr2(HA-IP_2)[1683100]: INFO: Adding inet address
> 192.168.16.76/24 with broadcast address 192.168.16.255 to device ens3
> Oct 22 11:48:13 kathie3 IPaddr2(HA-IP_2)[1683106]: INFO: Bringing device ens3
> up
> Oct 22 11:48:13 kathie3 IPaddr2(HA-IP_2)[1683112]: INFO:
> /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p
> /run/resource-agents/send_arp-192.168.16.76 ens3 192.168.16.76 auto not_used
> not_used
> Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Result of start
> operation for HA-IP_2 on kathie3: ok
> Oct 22 11:48:15 kathie3 pacemaker-attrd[4878]: notice: Setting pingd[kathie3]
> in instance_attributes: (unset) -> 1000
> Oct 22 11:48:15 kathie3 pacemaker-controld[4880]: notice: Result of start
> operation for ping_fw on kathie3: ok
> Oct 22 11:48:17 kathie3 IPaddr2(HA-IP_1)[1683126]: INFO: ARPING 192.168.16.75
> from 192.168.16.75 ens3#012Sent 5 probes (5 broadcast(s))#012Received 0
> response(s)
> Oct 22 11:48:17 kathie3 IPaddr2(HA-IP_2)[1683130]: INFO: ARPING 192.168.16.76
> from 192.168.16.76 ens3#012Sent 5 probes (5 broadcast(s))#012Received 0
> response(s)
> Oct 22 11:48:18 kathie3 drbd(Webcontent_DRBD)[1683138]: INFO:
> webcontent_data: Called drbdsetup wait-connect-resource webcontent_data
> --wfc-timeout=5 --degr-wfc-timeout=5 --outdated-wfc-timeout=5
> Oct 22 11:48:18 kathie3 drbd(Webcontent_DRBD)[1683142]: INFO:
> webcontent_data: Exit code 5
> Oct 22 11:48:18 kathie3 drbd(Webcontent_DRBD)[1683146]: INFO:
> webcontent_data: Command output:
> Oct 22 11:48:18 kathie3 drbd(Webcontent_DRBD)[1683150]: INFO:
> webcontent_data: Command stderr:
> Oct 22 11:48:19 kathie3 pacemaker-attrd[4878]: notice: Setting
> master-Webcontent_DRBD[kathie3] in instance_attributes: (unset) -> 1000
> Oct 22 11:48:19 kathie3 pacemaker-controld[4880]: notice: Result of start
> operation for Webcontent_DRBD on kathie3: ok
> Oct 22 11:48:19 kathie3 pacemaker-controld[4880]: notice: Initiating notify
> operation Webcontent_DRBD_post_notify_start_0 locally on kathie3
> ...
>
> Is some timeout set incorrectly, or what am I missing?
>
> Any suggestions are welcome.
>
> Kind regards
>
> fatcharly
>
>
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
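If there are none, that would explain the log. Transition 1106 initiates Webcontent_FS_start_0 at the same time as Webcontent_DRBD_start_0. As a result, the Filesystem agent tries to mount /dev/drbd1 while DRBD is still attaching its backing disk ("Auto-promote failed: Need access to UpToDate data (-2)"). The failed start sets fail-count-Webcontent_FS#start_0 to INFINITY, so Webcontent_FS stays banned from kathie3 until a cleanup. The usual setup for DRBD plus a filesystem needs two constraints:

- an order constraint, so that DRBD is promoted before the filesystem starts;
- a colocation constraint, so that the filesystem runs with the Promoted instance.

Below is a minimal sketch with pcs. It assumes the promotable clone is called Webcontent_DRBD-clone. That name is only pcs's default "<primitive>-clone" naming and is an assumption here, so check the real ID with `pcs resource status` first:

```shell
# Sketch only: "Webcontent_DRBD-clone" is an ASSUMED clone ID (pcs default
# naming for a clone of Webcontent_DRBD); replace it with your real clone ID.

# Start the filesystem only after DRBD has been promoted.
pcs constraint order promote Webcontent_DRBD-clone then start Webcontent_FS

# Keep the filesystem on the node where the DRBD clone is Promoted (primary).
pcs constraint colocation add Webcontent_FS with Promoted Webcontent_DRBD-clone INFINITY

# Clear the recorded start failure so Webcontent_FS is allowed to run again.
pcs resource cleanup Webcontent_FS

# List the configured constraints to check the result.
pcs constraint
```

If Webcontent_FS is part of a group together with httpd and the IPs, you would probably point both constraints at that group instead of at Webcontent_FS alone.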