I have just built a test cluster (CentOS 8.3) for testing DRBD and it works quite well. I followed my notes from https://forums.centos.org/viewtopic.php?t=65539, with the exception of point 8 because of the "promotable" stuff. I'm attaching the output of 'pcs cluster cib file' and I hope it helps you fix your issue.

Best Regards,
Strahil Nikolov

At 09:32 -0500 on 19.01.2021 (Tue), Stuart Massey wrote:
> Ulrich,
> Thank you for that observation. We share that concern.
> We have 4 ea 1G NICs active, bonded in pairs. One bonded pair serves
> the "public" (to the intranet) IPs, and the other bonded pair is
> private to the cluster, used for drbd replication. HA will, I hope,
> be using the "public" IP, since that is the route to the IP addresses
> resolved for the host names; that will certainly be the only route to
> the quorum device. I can say that this cluster has run reasonably
> well for quite some time with this configuration prior to the
> recently developed hardware issues on one of the nodes.
> Regards,
> Stuart
>
> On Tue, Jan 19, 2021 at 2:49 AM Ulrich Windl
> <ulrich.wi...@rz.uni-regensburg.de> wrote:
> > >>> Stuart Massey <djangosc...@gmail.com> wrote on 19.01.2021 at
> > 04:46 in message
> > <cabq68nqutyyxcygwcupg5txxajjwhsp+c6gcokfowgyrqsa...@mail.gmail.com>:
> > > So, we have a 2-node cluster with a quorum device. One of the nodes (node1)
> > > is having some trouble, so we have added constraints to prevent any
> > > resources migrating to it, but have not put it in standby, so that drbd in
> > > secondary on that node stays in sync. The problems it is having lead to OS
> > > lockups that eventually resolve themselves - but that causes it to be
> > > temporarily dropped from the cluster by the current master (node2).
> > > Sometimes when node1 rejoins, then node2 will demote the drbd ms resource.
> > > That causes all resources that depend on it to be stopped, leading to a
> > > service outage. They are then restarted on node2, since they can't run on
> > > node1 (due to constraints).
> > > We are having a hard time understanding why this happens. It seems like
> > > there may be some sort of DC contention happening. Does anyone have any
> > > idea how we might prevent this from happening?
> >
> > I think if you are routing high-volume DRBD traffic through "the
> > same pipe" as the cluster communication, cluster communication may
> > fail if the pipe is saturated.
> > I'm not happy with that, but it seems to be that way.
> >
> > Maybe running a combination of iftop and iotop could help you
> > understand what's going on...
> >
> > Regards,
> > Ulrich
> >
> > > Selected messages (de-identified) from pacemaker.log that illustrate
> > > suspicion re DC confusion are below. The update_dc and
> > > abort_transition_graph re deletion of lrm seem to always precede the
> > > demotion, and a demotion seems to always follow (when not already demoted).
> > >
> > > Jan 18 16:52:17 [21938] node02.example.com crmd: info:
> > > do_dc_takeover: Taking over DC status for this partition
> > > Jan 18 16:52:17 [21938] node02.example.com crmd: info: update_dc:
> > > Set DC to node02.example.com (3.0.14)
> > > Jan 18 16:52:17 [21938] node02.example.com crmd: info:
> > > abort_transition_graph: Transition aborted by deletion of
> > > lrm[@id='1']: Resource state removal | cib=0.89.327
> > > source=abort_unless_down:357
> > > path=/cib/status/node_state[@id='1']/lrm[@id='1'] complete=true
> > > Jan 18 16:52:19 [21937] node02.example.com pengine: info:
> > > master_color: ms_drbd_ourApp: Promoted 0 instances of a possible 1 to
> > > master
> > > Jan 18 16:52:19 [21937] node02.example.com pengine: notice: LogAction:
> > > * Demote drbd_ourApp:1 ( Master -> Slave
> > > node02.example.com )
> >
> > _______________________________________________
> > Manage your subscription:
> > https://lists.clusterlabs.org/mailman/listinfo/users
> >
> > ClusterLabs home: https://www.clusterlabs.org/
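
The observation in the quoted log excerpt (a DC takeover that is followed shortly afterwards by a Demote action) could be checked across a whole pacemaker.log with a small script. What follows is only a minimal sketch, not anything taken from the thread: the function names, the 10-second window and the default year are assumptions made for illustration, and the line layout is assumed to match the excerpt above once each wrapped entry is rejoined onto one line.

```python
import re
from datetime import datetime, timedelta

# Leading timestamp of a log line, e.g. "Jan 18 16:52:17".
# Assumption: every entry starts with a syslog-style stamp without a year.
STAMP = re.compile(r"^(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) ")

# Sample entries copied from the quoted excerpt, rejoined onto single lines.
SAMPLE = [
    "Jan 18 16:52:17 [21938] node02.example.com crmd: info: "
    "do_dc_takeover: Taking over DC status for this partition",
    "Jan 18 16:52:17 [21938] node02.example.com crmd: info: "
    "update_dc: Set DC to node02.example.com (3.0.14)",
    "Jan 18 16:52:19 [21937] node02.example.com pengine: notice: "
    "LogAction: * Demote drbd_ourApp:1 ( Master -> Slave node02.example.com )",
]


def parse_time(line, year):
    """Return the line's timestamp as a datetime, or None if it has none."""
    match = STAMP.match(line)
    if match is None:
        return None
    # Collapse the padding syslog uses for single-digit days ("Jan  8").
    stamp = " ".join(match.group(1).split())
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def demotions_after_takeover(lines, window_seconds=10, year=2021):
    """List (takeover_time, demote_time, line) for each Demote action
    logged within window_seconds after the most recent do_dc_takeover."""
    window = timedelta(seconds=window_seconds)
    last_takeover = None
    hits = []
    for line in lines:
        when = parse_time(line, year)
        if when is None:
            continue
        if "do_dc_takeover:" in line:
            last_takeover = when
        elif "LogAction:" in line and "Demote" in line:
            if last_takeover is not None and timedelta(0) <= when - last_takeover <= window:
                hits.append((last_takeover, when, line.strip()))
    return hits


if __name__ == "__main__":
    for takeover, demote, text in demotions_after_takeover(SAMPLE):
        print(f"takeover {takeover.time()} -> demote {demote.time()}: {text}")
```

To run it against a real log, pass an open file object instead of SAMPLE. A match only shows that the two events are close in time; it does not by itself show that the takeover caused the demotion.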
drbd_cib_el83.xml
Description: XML document