Re: [ClusterLabs] Problem with a new cluster with drbd on AlmaLinux 9

Testuser SST via Users Tue, 22 Oct 2024 09:24:25 -0700

Hi again,

It looks like the duplicate Apache order constraint was the problem.
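
For anyone who finds this thread later, here is a rough sketch of how a duplicated order constraint could be spotted in a constraint listing like the one quoted below. It is only an illustration in Python. It assumes each constraint sits on one unwrapped line in the format shown in this thread. The function name `find_duplicate_orders` is made up, and the third sample line (including its id) is a hypothetical duplicate, since the thread does not show the constraint that was actually removed.

```python
import re
from collections import defaultdict

# Matches one order-constraint line in the listing format quoted in this thread.
ORDER_RE = re.compile(
    r"(\w+) resource '([^']+)' then (\w+) resource '([^']+)' \(id: ([^)]+)\)"
)


def find_duplicate_orders(listing: str) -> dict:
    """Group constraint ids by (first action, first resource, then action,
    then resource) and return only the groups with more than one id."""
    groups = defaultdict(list)
    for match in ORDER_RE.finditer(listing):
        first_action, first_res, then_action, then_res, cid = match.groups()
        groups[(first_action, first_res, then_action, then_res)].append(cid)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# The first two constraint lines come from the thread. The third line,
# including its id, is a HYPOTHETICAL duplicate added for illustration.
SAMPLE = """\
Order Constraints:
  start resource 'HA-IPs' then start resource 'Apache' (id: order-HA-IPs-Apache-mandatory)
  start resource 'Webcontent_FS' then start resource 'Apache' (id: order-Webcontent_FS-Apache-mandatory)
  start resource 'HA-IPs' then start resource 'Apache' (id: order-HA-IPs-Apache-mandatory-1)
"""

if __name__ == "__main__":
    for key, ids in find_duplicate_orders(SAMPLE).items():
        print(key, "->", ids)
```

Any id reported this way is only a candidate for removal; it is worth checking the full listing by hand before deleting anything.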


Thanks for the hint!

Kind regards

fatcharly
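
The log quoted below also hints at why the mount failed: the start of Webcontent_FS was reported as failed at 11:48:13, while the start of Webcontent_DRBD was only reported as ok at 11:48:19. Here is a small Python sketch that pulls this out of the log. It assumes unwrapped log lines; the four sample lines are taken from the quoted log, and the helper names `start_results` and `finished_before` are made up for illustration.

```python
import re

# Matches pacemaker-controld "Result of start operation" lines and captures
# the time of day, the resource name and the reported result.
RESULT_RE = re.compile(
    r"^\w{3} +\d+ (\d\d:\d\d:\d\d) \S+ pacemaker-controld\[\d+\]: notice: "
    r"Result of start operation for (\S+) on \S+: (\w+)",
    re.MULTILINE,
)


def start_results(log: str) -> dict:
    """Map resource name -> (HH:MM:SS, result) for each reported start result."""
    return {m.group(2): (m.group(1), m.group(3)) for m in RESULT_RE.finditer(log)}


def finished_before(log: str, first: str, second: str) -> bool:
    """True if the start result of `first` was logged earlier than that of
    `second`. Compares HH:MM:SS strings, so it assumes a single day."""
    results = start_results(log)
    return results[first][0] < results[second][0]


# Four lines from the log in this thread, joined back onto single lines.
LOG = """\
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Initiating start operation Webcontent_FS_start_0 locally on kathie3
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Initiating start operation Webcontent_DRBD_start_0 locally on kathie3
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Result of start operation for Webcontent_FS on kathie3: error (Couldn't mount device [/dev/drbd1] as /mnt/clusterfs)
Oct 22 11:48:19 kathie3 pacemaker-controld[4880]: notice: Result of start operation for Webcontent_DRBD on kathie3: ok
"""

if __name__ == "__main__":
    for resource, (time_of_day, result) in start_results(LOG).items():
        print(time_of_day, resource, result)
    print(finished_before(LOG, "Webcontent_FS", "Webcontent_DRBD"))
```

Both starts were initiated in the same second, so the filesystem start did not wait for the DRBD start to finish.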



&gt; Sent: Tuesday, 22 October 2024 at 15:44
&gt; From: "Testuser SST via Users" <users@clusterlabs.org>
&gt; To: arvidj...@gmail.com, users@clusterlabs.org
&gt; CC: "Testuser SST" <fatcha...@gmx.de>
&gt; Subject: Re: [ClusterLabs] Problem with a new cluster with drbd on AlmaLinux 9
&gt;
&gt; Hi Andrei,
&gt; 
&gt; no, these are the only ones:
&gt; 
&gt; 
&gt; Location Constraints:
&gt;   resource 'Apache' (id: location-Apache)
&gt;     Rules:
&gt;       Rule: boolean-op=or score=-INFINITY (id: location-Apache-rule)
&gt;         Expression: pingd lt 1 (id: location-Apache-rule-expr)
&gt;         Expression: not_defined pingd (id: location-Apache-rule-expr-1)
&gt; Colocation Constraints:
&gt;   resource 'Apache' with resource 'HA-IPs' (id: colocation-Apache-HA-IPs-INFINITY)
&gt;     score=INFINITY
&gt;   resource 'Apache' with resource 'Webcontent_FS' (id: colocation-Apache-Webcontent_FS-INFINITY)
&gt;     score=INFINITY
&gt; Order Constraints:
&gt;   start resource 'HA-IPs' then start resource 'Apache' (id: order-HA-IPs-Apache-mandatory)
&gt;   start resource 'Webcontent_FS' then start resource 'Apache' (id: order-Webcontent_FS-Apache-mandatory)
&gt; 
&gt; 
&gt; 
&gt; 
&gt; 
&gt; &gt; Sent: Tuesday, 22 October 2024 at 15:41
&gt; &gt; From: "Andrei Borzenkov" <arvidj...@gmail.com>
&gt; &gt; To: "Cluster Labs - All topics related to open-source clustering welcomed" <users@clusterlabs.org>
&gt; &gt; CC: "Testuser SST" <fatcha...@gmx.de>
&gt; &gt; Subject: Re: [ClusterLabs] Problem with a new cluster with drbd on AlmaLinux 9
&gt; &gt;
&gt; &gt; On Tue, Oct 22, 2024 at 3:18 PM Testuser SST via Users
&gt; &gt; <users@clusterlabs.org> wrote:
&gt; &gt; &gt;
&gt; &gt; &gt; Hi,
&gt; &gt; &gt; I'm running a 2-node web cluster on AlmaLinux 9 with pacemaker 2.1.7, drbd9 and corosync 3.1.
&gt; &gt; &gt; I have trouble with promoting and mounting the drbd device. After the cluster is activated,
&gt; &gt; &gt; the drbd device does not get mounted, and an error message shows up almost immediately:
&gt; &gt; &gt;
&gt; &gt; &gt; pacemaker-schedulerd[4879]: warning: Unexpected result (error: Couldn't mount device [/dev/drbd1] as /mnt/clusterfs) was recorded for start of Webcontent_FS on ...
&gt; &gt; &gt; pacemaker-schedulerd[4879]: warning: Webcontent_FS cannot run on kathie3 due to reaching migration threshold (clean up resource to allow again)
&gt; &gt; &gt;
&gt; &gt; 
&gt; &gt; Do you have any ordering constraints between Webcontent_DRBD and Webcontent_FS?
&gt; &gt; 
&gt; &gt; &gt; It looks as if the cluster tries to mount the device before the device is ready.
&gt; &gt; &gt; The device is drbd1, and I'm trying to mount it on /mnt/clusterfs. After the error has occurred and I run "pcs resource cleanup", the cluster is able to mount it.
&gt; &gt; &gt; The drbd resource is named Webcontent_DRBD.
&gt; &gt; &gt; The mounted filesystem is named Webcontent_FS.
&gt; &gt; &gt; All other resources, like httpd and the HA-IPs, are working like a charm.
&gt; &gt; &gt;
&gt; &gt; &gt; This is the log from the start of the cluster:
&gt; &gt; &gt;
&gt; &gt; &gt; Oct 22 11:48:12 kathie3 pacemaker-controld[4880]: notice: State 
transition S_ELECTION -&gt; S_INTEGRATION
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 pacemaker-schedulerd[4879]: notice: 
Actions: Start      HA-IP_1               (                        kathie3 )
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 pacemaker-schedulerd[4879]: notice: 
Actions: Start      HA-IP_2               (                        kathie3 )
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 pacemaker-schedulerd[4879]: notice: 
Actions: Start      HA-IP_3               (                        kathie3 )
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 pacemaker-schedulerd[4879]: notice: 
Actions: Start      Webcontent_DRBD:0     (                        kathie3 )
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 pacemaker-schedulerd[4879]: notice: 
Actions: Start      Webcontent_FS         (                        kathie3 )
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 pacemaker-schedulerd[4879]: notice: 
Actions: Start      ping_fw:0             (                        kathie3 )
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 pacemaker-schedulerd[4879]: notice: 
Calculated transition 1106, saving inputs in 
/var/lib/pacemaker/pengine/pe-input-336.bz2
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: 
Initiating start operation HA-IP_1_start_0 locally on kathie3
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: 
Initiating start operation Webcontent_FS_start_0 locally on kathie3
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: 
Initiating start operation ping_fw_start_0 locally on kathie3
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: 
Initiating start operation Webcontent_DRBD_start_0 locally on kathie3
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: 
Requesting local execution of start operation for HA-IP_1 on kathie3
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: 
Requesting local execution of start operation for ping_fw on kathie3
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: 
Requesting local execution of start operation for Webcontent_DRBD on kathie3
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: 
Requesting local execution of start operation for Webcontent_FS on kathie3
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 IPaddr2(HA-IP_1)[1682892]: INFO: Adding 
inet address 192.168.16.75/24 with broadcast address 192.168.16.255 to device 
ens3
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 IPaddr2(HA-IP_1)[1682912]: INFO: 
Bringing device ens3 up
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 Filesystem(Webcontent_FS)[1682923]: 
INFO: Running start for /dev/drbd1 on /mnt/clusterfs
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 IPaddr2(HA-IP_1)[1682929]: INFO: 
/usr/libexec/heartbeat/send_arp  -i 200 -r 5 -p 
/run/resource-agents/send_arp-192.168.16.75 ens3 192.168.16.75 auto not_used 
not_used
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data: Starting 
worker thread (node-id 0)
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Result 
of start operation for HA-IP_1 on kathie3: ok
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: 
Initiating monitor operation HA-IP_1_monitor_30000 locally on kathie3
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: 
Requesting local execution of monitor operation for HA-IP_1 on kathie3
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: 
Initiating start operation HA-IP_2_start_0 locally on kathie3
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: 
Requesting local execution of start operation for HA-IP_2 on kathie3
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data: 
Auto-promote failed: Need access to UpToDate data (-2)
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 kernel: /dev/drbd1: Can't open blockdev
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 kernel: /dev/drbd1: Can't open blockdev
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: 
meta-data IO uses: blk-bio
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: 
disk( Diskless -&gt; Attaching ) [attach]
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: 
Maximum number of peer devices = 1
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data: Method to 
ensure write ordering: flush
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: 
drbd_bm_resize called with capacity == 104854328
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: 
resync bitmap: bits=13106791 words=204794 pages=400
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 kernel: drbd1: detected capacity change 
from 0 to 104854328
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: 
size = 50 GB (52427164 KB)
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 Filesystem(Webcontent_FS)[1683017]: 
ERROR: Couldn't mount device [/dev/drbd1] as /mnt/clusterfs
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Result 
of start operation for Webcontent_FS on kathie3: error (Couldn't mount device 
[/dev/drbd1] as /mnt/clusterfs)
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: 
Webcontent_FS_start_0@kathie3 output [ blockdev: cannot open /dev/drbd1: No 
data available\nmount: /mnt/clusterfs: mount(2) system call failed: No data 
available.\nocf-exit-reason:Couldn't mount device [/dev/drbd1] as 
/mnt/clusterfs\n ]
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: 
Transition 1106 aborted by operation Webcontent_FS_start_0 'modify' on kathie3: 
Event failed
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: 
Transition 1106 action 37 (Webcontent_FS_start_0 on kathie3): expected 'ok' but 
got 'error'
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 pacemaker-attrd[4878]: notice: Setting 
last-failure-Webcontent_FS#start_0[kathie3] in instance_attributes: (unset) 
-&gt; 1729590493
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 pacemaker-attrd[4878]: notice: Setting 
fail-count-Webcontent_FS#start_0[kathie3] in instance_attributes: (unset) -&gt; 
INFINITY
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: 
Transition 1106 aborted by status-1-last-failure-Webcontent_FS.start_0 doing 
create last-failure-Webcontent_FS#start_0=1729590493: Transient attribute change
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: 
bitmap READ of 400 pages took 34 ms
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: 
disk( Attaching -&gt; UpToDate ) [attach]
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: 
attached to current UUID: 826E8850CF10C812
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: 
Setting exposed data uuid: 826E8850CF10C812
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Result 
of monitor operation for HA-IP_1 on kathie3: ok
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data stacy3: 
Starting sender thread (peer-node-id 1)
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data stacy3: 
conn( StandAlone -&gt; Unconnected ) [connect]
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data stacy3: 
Starting receiver thread (peer-node-id 1)
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data stacy3: 
conn( Unconnected -&gt; Connecting ) [connecting]
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 IPaddr2(HA-IP_2)[1683100]: INFO: Adding 
inet address 192.168.16.76/24 with broadcast address 192.168.16.255 to device 
ens3
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 IPaddr2(HA-IP_2)[1683106]: INFO: 
Bringing device ens3 up
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 IPaddr2(HA-IP_2)[1683112]: INFO: 
/usr/libexec/heartbeat/send_arp  -i 200 -r 5 -p 
/run/resource-agents/send_arp-192.168.16.76 ens3 192.168.16.76 auto not_used 
not_used
&gt; &gt; &gt; Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Result 
of start operation for HA-IP_2 on kathie3: ok
&gt; &gt; &gt; Oct 22 11:48:15 kathie3 pacemaker-attrd[4878]: notice: Setting 
pingd[kathie3] in instance_attributes: (unset) -&gt; 1000
&gt; &gt; &gt; Oct 22 11:48:15 kathie3 pacemaker-controld[4880]: notice: Result 
of start operation for ping_fw on kathie3: ok
&gt; &gt; &gt; Oct 22 11:48:17 kathie3 IPaddr2(HA-IP_1)[1683126]: INFO: ARPING 
192.168.16.75 from 192.168.16.75 ens3#012Sent 5 probes (5 
broadcast(s))#012Received 0 response(s)
&gt; &gt; &gt; Oct 22 11:48:17 kathie3 IPaddr2(HA-IP_2)[1683130]: INFO: ARPING 
192.168.16.76 from 192.168.16.76 ens3#012Sent 5 probes (5 
broadcast(s))#012Received 0 response(s)
&gt; &gt; &gt; Oct 22 11:48:18 kathie3 drbd(Webcontent_DRBD)[1683138]: INFO: 
webcontent_data: Called drbdsetup wait-connect-resource webcontent_data 
--wfc-timeout=5 --degr-wfc-timeout=5 --outdated-wfc-timeout=5
&gt; &gt; &gt; Oct 22 11:48:18 kathie3 drbd(Webcontent_DRBD)[1683142]: INFO: 
webcontent_data: Exit code 5
&gt; &gt; &gt; Oct 22 11:48:18 kathie3 drbd(Webcontent_DRBD)[1683146]: INFO: 
webcontent_data: Command output:
&gt; &gt; &gt; Oct 22 11:48:18 kathie3 drbd(Webcontent_DRBD)[1683150]: INFO: 
webcontent_data: Command stderr:
&gt; &gt; &gt; Oct 22 11:48:19 kathie3 pacemaker-attrd[4878]: notice: Setting 
master-Webcontent_DRBD[kathie3] in instance_attributes: (unset) -&gt; 1000
&gt; &gt; &gt; Oct 22 11:48:19 kathie3 pacemaker-controld[4880]: notice: Result 
of start operation for Webcontent_DRBD on kathie3: ok
&gt; &gt; &gt; Oct 22 11:48:19 kathie3 pacemaker-controld[4880]: notice: 
Initiating notify operation Webcontent_DRBD_post_notify_start_0 locally on 
kathie3
&gt; &gt; &gt; ...
&gt; &gt; &gt;
&gt; &gt; &gt; Is some timeout set wrong, or what am I missing?
&gt; &gt; &gt;
&gt; &gt; &gt; Any suggestions are welcome
&gt; &gt; &gt;
&gt; &gt; &gt; Kind regards
&gt; &gt; &gt;
&gt; &gt; &gt; fatcharly
&gt; &gt; &gt;
&gt; &gt; &gt;
&gt; &gt; &gt; _______________________________________________
&gt; &gt; &gt; Manage your subscription:
&gt; &gt; &gt; https://lists.clusterlabs.org/mailman/listinfo/users
&gt; &gt; &gt;
&gt; &gt; &gt; ClusterLabs home: https://www.clusterlabs.org/
&gt; &gt; 
