Hi! DRBD in a Debian 10 guest on a Hyper-V 2012 R2 host triggers a kernel panic, so I have decided not to use this solution. The question is closed. Thanks for the support.
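(For anyone hitting the same panic: a minimal sketch of the version information worth capturing in a report. This assumes the stock Debian in-tree `drbd` module; the module name and paths may differ if DRBD was built from source.)

```shell
# Sketch: collect kernel/DRBD version info from the guest for a panic report.
# Assumes the in-tree 'drbd' kernel module; adjust if you use DRBD9 packages.
uname -r                                    # running guest kernel version
modinfo -F version drbd 2>/dev/null \
  || echo "drbd module not found"           # DRBD kernel module version
cat /proc/drbd 2>/dev/null \
  || echo "/proc/drbd not present"          # in-kernel DRBD status, if loaded
```

Including this output (together with the Hyper-V host version) makes it much easier for others to reproduce or rule out a known kernel/DRBD combination.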
Elias Nasonov
elias@po-mayak

From: users-requ...@clusterlabs.org
Sent: November 20, 2019 at 22:00
To: users@clusterlabs.org
Subject: Users Digest, Vol 58, Issue 22

Send Users mailing list submissions to users@clusterlabs.org

To subscribe or unsubscribe via the World Wide Web, visit
https://lists.clusterlabs.org/mailman/listinfo/users
or, via email, send a message with subject or body 'help' to
users-requ...@clusterlabs.org

You can reach the person managing the list at users-ow...@clusterlabs.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Users digest..."

Today's Topics:

   1. Antw: HA - order lost when group made (Ulrich Windl)
   2. Antw: Re: Dual Primary DRBD + OCFS2 (elias) (Ulrich Windl)

----------------------------------------------------------------------

Message: 1
Date: Wed, 20 Nov 2019 12:23:49 +0100
From: "Ulrich Windl" <ulrich.wi...@rz.uni-regensburg.de>
To: <users@clusterlabs.org>
Subject: [ClusterLabs] Antw: HA - order lost when group made
Message-ID: <5dd52245020000a100035...@gwsmtp.uni-regensburg.de>
Content-Type: text/plain; charset=UTF-8

>>> "John Goutbeck" <john.goutb...@newsignal.ca> wrote on 19.11.2019 at 22:19 in
message <5dd45c62020000f500006...@mx1.newsignal.ca>:
> HA - order lost when group made
>
> SLES 15 SP1 + HA
>
> nss-sn02:~ # rpm -qa | grep pacem
> pacemaker-cli-1.1.18+20180430.b12c320f5-3.15.1.x86_64
> pacemaker-1.1.18+20180430.b12c320f5-3.15.1.x86_64
> libpacemaker3-1.1.18+20180430.b12c320f5-3.15.1.x86_64
> nss-sn02:~ # rpm -qa | grep crm
> crmsh-scripts-4.1.0+git.1569593061.35f57072-3.14.1.noarch
> crmsh-4.1.0+git.1569593061.35f57072-3.14.1.noarch
>
> 2-node HA cluster set up for DRBD storage
>
> Made an order constraint for resource virtual IPs, iSCSI targets and iSCSI
> LUs.
> These resources need to be started in order.
> The resources can be started (and stopped) individually, before the
> order constraint is made.
>
> order o_drbd02_before_iscsitgt02 Serialize: p-ip-14-202:start p-ip-15-202:start
>   p_target_drbd02:start p-lu-drbd02:start
> or
> order o_drbd02_before_iscsitgt02 Serialize: ( p-ip-14-202:start p-ip-15-202:start )
>   ( p_target_drbd02:start ) ( p-lu-drbd02:start )
>
> ...
>
> Now to make a group resource with the same resources, but when the group is
> made, the order constraint is gone.

Groups always had implicit colocation and ordering.

> group g-drbd02 p-ip-14-202 p-ip-15-202 p-lu-drbd02 p_target_drbd02 meta
>   target-role=Stopped
>
> Adding the group with 'crm configure edit' returns these comments:
>
> nss-sn02:~ # crm configure edit
> INFO: modified colocation:cl-drbd02 from p-ip-14-202 to g-drbd02
> INFO: modified order:o_drbd03_before_iscsitgt from p-ip-14-202 to g-drbd02
> INFO: modified colocation:cl-drbd03 from p-ip-14-202 to g-drbd02
> INFO: modified order:o_drbd02_before_iscsitgt02 from p-ip-14-202 to g-drbd02
> INFO: modified order:o_drbd02_before_iscsitgt02 from p-ip-15-202 to g-drbd02
> INFO: modified order:o_drbd02_before_iscsitgt02 from p-lu-drbd02 to g-drbd02
> INFO: modified order:o_drbd02_before_iscsitgt02 from p_target_drbd02 to
>   g-drbd02
>
> How can an order be made of the same group resources?

------------------------------

Message: 2
Date: Wed, 20 Nov 2019 12:29:57 +0100
From: "Ulrich Windl" <ulrich.wi...@rz.uni-regensburg.de>
To: <users@clusterlabs.org>
Subject: [ClusterLabs] Antw: Re: Dual Primary DRBD + OCFS2 (elias)
Message-ID: <5dd523b5020000a100035...@gwsmtp.uni-regensburg.de>
Content-Type: text/plain; charset=UTF-8

Maybe show what you did. Did DLM start successfully?

>>> Ilya Nasonov <el...@po-mayak.ru> wrote on 20.11.2019 at 06:12 in
message <20191120051305.052936005F7@iwtm.local>:
> Thanks Roger!
>
> I configured according to the SUSE doc for OCFS2, but the DLM resource
> stops with error -107 (no interface found).
> I think it is necessary to configure the OCFS2 cluster manually, but
> would like to do it correctly through the Pacemaker RA.
>
> Ilya Nasonov
> elias@po-mayak
>
> From: users-requ...@clusterlabs.org
> Sent: November 19, 2019 at 19:32
> To: users@clusterlabs.org
> Subject: Users Digest, Vol 58, Issue 20
>
> Send Users mailing list submissions to
> users@clusterlabs.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://lists.clusterlabs.org/mailman/listinfo/users
> or, via email, send a message with subject or body 'help' to
> users-requ...@clusterlabs.org
>
> You can reach the person managing the list at
> users-ow...@clusterlabs.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Users digest..."
>
>
> Today's Topics:
>
>    1. Re: Antw: Re: Pacemaker 2.0.3-rc3 now available
>       (Jehan-Guillaume de Rorthais)
>    2. corosync 3.0.1 on Debian/Buster reports some MTU errors
>       (Jean-Francois Malouin)
>    3. Dual Primary DRBD + OCFS2 (Ilya Nasonov)
>    4. Re: Dual Primary DRBD + OCFS2 (Roger Zhou)
>    5. Q: ldirectord and "checktype = external-perl" broken?
>       (Ulrich Windl)
>    6.
Q: ocf:pacemaker:ping (Ulrich Windl)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 18 Nov 2019 18:13:57 +0100
> From: Jehan-Guillaume de Rorthais <j...@dalibo.com>
> To: Ken Gaillot <kgail...@redhat.com>
> Cc: Cluster Labs - All topics related to open-source clustering
>     welcomed <users@clusterlabs.org>
> Subject: Re: [ClusterLabs] Antw: Re: Pacemaker 2.0.3-rc3 now available
> Message-ID: <20191118181357.6899c051@firost>
> Content-Type: text/plain; charset=UTF-8
>
> On Mon, 18 Nov 2019 10:45:25 -0600
> Ken Gaillot <kgail...@redhat.com> wrote:
>
>> On Fri, 2019-11-15 at 14:35 +0100, Jehan-Guillaume de Rorthais wrote:
>> > On Thu, 14 Nov 2019 11:09:57 -0600
>> > Ken Gaillot <kgail...@redhat.com> wrote:
>> >
>> > > On Thu, 2019-11-14 at 15:22 +0100, Ulrich Windl wrote:
>> > > > > > > Jehan-Guillaume de Rorthais <j...@dalibo.com> wrote on
>> > > > > > > 14.11.2019 at 15:17 in
>> > > > message <20191114151719.6cbf4e38@firost>:
>> > > > > On Wed, 13 Nov 2019 17:30:31 -0600
>> > > > > Ken Gaillot <kgail...@redhat.com> wrote:
>> > > > > ...
>> > > > > > A longstanding pain point in the logs has been improved.
>> > > > > > Whenever the scheduler processes resource history, it logs a
>> > > > > > warning for any failures it finds, regardless of whether they
>> > > > > > are new or old, which can confuse anyone reading the logs.
>> > > > > > Now, the log will contain the time of the failure, so it's
>> > > > > > obvious whether you're seeing the same event or not. The log
>> > > > > > will also contain the exit reason if one was provided by the
>> > > > > > resource agent, for easier troubleshooting.
>> > > > >
>> > > > > I've been hurt by this in the past, and I was wondering: what is
>> > > > > the point of warning again and again in the logs about past
>> > > > > failures during scheduling? What does this information bring to
>> > > > > the administrator?
>> > >
>> > > The controller will log an event just once, when it happens.
>> > >
>> > > The scheduler, on the other hand, uses the entire recorded resource
>> > > history to determine the current resource state. Old failures (that
>> > > haven't been cleaned) must be taken into account.
>> >
>> > OK, I wasn't aware of this. If you have a few minutes, I would be
>> > interested to know why the full history is needed rather than just
>> > the latest entry. Or maybe there are some comments in the source code
>> > that already cover this question?
>>
>> The full *recorded* history consists of the most recent operation that
>> affects the state (like start/stop/promote/demote), the most recent
>> failed operation, and the most recent results of any recurring
>> monitors.
>>
>> For example, there may be a failed monitor, but whether the resource is
>> considered failed or not would depend on whether there was a more
>> recent successful stop or start. Even if the failed monitor has been
>> superseded, it needs to stay in the history for display purposes until
>> the user has cleaned it up.
>
> OK, understood.
>
> Maybe that's why "FAILED" appears briefly in crm_mon during a resource
> move on a clean resource that has past failures? Maybe I should dig into
> this weird behavior and write up a bug report if I confirm it?
>
>> > > Every run of the scheduler is completely independent, so it doesn't
>> > > know about any earlier runs or what they logged. Think of it like
>> > > Frosty the Snowman saying "Happy Birthday!" every time his hat is
>> > > put on.
>> >
>> > I don't have this ref :)
>>
>> I figured not everybody would, but it was too fun to pass up :)
>>
>> The snowman comes to life every time his magic hat is put on, but to
>> him each time feels like he's being born for the first time, so he says
>> "Happy Birthday!"
>>
>> https://www.youtube.com/watch?v=1PbWTEYoN8o
>
> heh :)
>
>> > > As far as each run is concerned, it is the first time it's seen the
>> > > history. This is what allows the DC role to move from node to node,
>> > > and the scheduler to be run as a simulation using a saved CIB file.
>> > >
>> > > We could change the wording further if necessary. The previous
>> > > version would log something like:
>> > >
>> > > warning: Processing failed monitor of my-rsc on node1: not running
>> > >
>> > > and this latest change will log it like:
>> > >
>> > > warning: Unexpected result (not running: No process state file found)
>> > > was recorded for monitor of my-rsc on node1 at Nov 12 19:19:02 2019
>> >
>> > /result/state/ ?
>>
>> It's the result of a resource agent action, so it could be for example
>> a timeout or a permissions issue.
>
> ok
>
>> > > I wanted to be explicit about the message being about processing
>> > > resource history that may or may not be the first time it's been
>> > > processed and logged, but everything I came up with seemed too long
>> > > for a log line. Another possibility might be something like:
>> > >
>> > > warning: Using my-rsc history to determine its current state on node1:
>> > > Unexpected result (not running: No process state file found) was
>> > > recorded for monitor at Nov 12 19:19:02 2019
>> >
>> > I like the first one better.
>> >
>> > However, it feels like implementation details exposed to the world,
>> > doesn't it? How useful is this information to the end user? What can
>> > the user do with this information? There's nothing to fix, and this
>> > is not actually an error of the currently running process.
>> >
>> > I still fail to understand why the scheduler doesn't process the
>> > history silently, whatever it finds there, and then warn about
>> > something really important only if the final result is not expected...
>>
>> From the scheduler's point of view, it's all relevant information that
>> goes into the decision making. Even an old failure can cause new
>> actions, for example if quorum was not held at the time but has now
>> been reached, or if there is a failure-timeout that just expired. So
>> any failure history is important to understanding whatever the
>> scheduler says needs to be done.
>>
>> Also, the scheduler is run on the DC, which is not necessarily the node
>> that executed the action. So it's useful for troubleshooting to present
>> a picture of the whole cluster on the DC, rather than just the
>> situation on the local node.
>
> OK, I kind of get it. The scheduler needs to summarize the chain of
> events to define the state of a resource based on the last event.
>
>> I could see an argument for lowering it from warning to notice, but
>> it's a balance between what's most useful during normal operation and
>> what's most useful during troubleshooting.
>
> So in my humble opinion, the messages should definitely be at notice
> level. Maybe they should even go to debug level. I have never had to
> troubleshoot a bad decision from the scheduler because of a bad state
> summary. Moreover, if needed, the admin can still study the history
> from the CIB backed up on disk, can't they?
>
> The alternative would be to spit out the event chain in detail only if
> the result of the summary differs from what the scheduler was expecting?
>
>
> ------------------------------
>
> Message: 2
> Date: Mon, 18 Nov 2019 16:31:34 -0500
> From: Jean-Francois Malouin <jean-francois.malo...@bic.mni.mcgill.ca>
> To: The Pacemaker Cluster List <users@clusterlabs.org>
> Subject: [ClusterLabs] corosync 3.0.1 on Debian/Buster reports some
>     MTU errors
> Message-ID: <20191118213134.huecj2xnbtrtd...@bic.mni.mcgill.ca>
> Content-Type: text/plain; charset=us-ascii
>
> Hi,
>
> Maybe not directly a pacemaker question, but maybe some of you have
> seen this problem:
>
> A 2-node pacemaker cluster running corosync-3.0.1 with a dual
> communication ring sometimes reports errors like this in the corosync
> log file:
>
> [KNET ] pmtud: PMTUD link change for host: 2 link: 0 from 470 to 1366
> [KNET ] pmtud: PMTUD link change for host: 2 link: 1 from 470 to 1366
> [KNET ] pmtud: Global data MTU changed to: 1366
> [CFG  ] Modified entry 'totem.netmtu' in corosync.conf cannot be changed at run-time
> [CFG  ] Modified entry 'totem.netmtu' in corosync.conf cannot be changed at run-time
>
> Those do not happen very frequently, once a week or so...
>
> However, the system log on the nodes reports these much more
> frequently, a few times a day:
>
> Nov 17 23:26:20 node1 corosync[2258]: [KNET ] link: host: 2 link: 1 is down
> Nov 17 23:26:20 node1 corosync[2258]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 0)
> Nov 17 23:26:26 node1 corosync[2258]: [KNET ] rx: host: 2 link: 1 is up
> Nov 17 23:26:26 node1 corosync[2258]: [KNET ] host: host: 2 (passive) best link: 1 (pri: 1)
>
> Are those to be dismissed, or are they indicative of a network
> misconfiguration/problem? I tried setting 'knet_transport: udpu' in the
> totem section (the default value), but it didn't seem to make a
> difference... Hard-coding netmtu to 1500 and allowing a longer (10s)
> token timeout also didn't seem to affect the issue.
>
>
> Corosync config follows:
>
> /etc/corosync/corosync.conf
>
> totem {
>     version: 2
>     cluster_name: bicha
>     transport: knet
>     link_mode: passive
>     ip_version: ipv4
>     token: 10000
>     netmtu: 1500
>     knet_transport: sctp
>     crypto_model: openssl
>     crypto_hash: sha256
>     crypto_cipher: aes256
>     keyfile: /etc/corosync/authkey
>     interface {
>         linknumber: 0
>         knet_transport: udp
>         knet_link_priority: 0
>     }
>     interface {
>         linknumber: 1
>         knet_transport: udp
>         knet_link_priority: 1
>     }
> }
> quorum {
>     provider: corosync_votequorum
>     two_node: 1
>     # expected_votes: 2
> }
> nodelist {
>     node {
>         ring0_addr: xxx.xxx.xxx.xxx
>         ring1_addr: zzz.zzz.zzz.zzx
>         name: node1
>         nodeid: 1
>     }
>     node {
>         ring0_addr: xxx.xxx.xxx.xxy
>         ring1_addr: zzz.zzz.zzz.zzy
>         name: node2
>         nodeid: 2
>     }
> }
> logging {
>     to_logfile: yes
>     to_syslog: yes
>     logfile: /var/log/corosync/corosync.log
>     syslog_facility: daemon
>     debug: off
>     timestamp: on
>     logger_subsys {
>         subsys: QUORUM
>         debug: off
>     }
> }
>
>
> ------------------------------
>
> Message: 3
> Date: Tue, 19 Nov 2019 13:51:59 +0500
> From: Ilya Nasonov <el...@po-mayak.ru>
> To: "users@clusterlabs.org" <users@clusterlabs.org>
> Subject: [ClusterLabs] Dual Primary DRBD + OCFS2
> Message-ID: <20191119085203.2771960014A@iwtm.local>
> Content-Type: text/plain; charset="utf-8"
>
> Hello!
>
> Configured a cluster (2-node DRBD+DLM+OCFS2) and it works.
> I heard the opinion that the OCFS2 file system is better. Found an old
> cluster setup description:
> https://wiki.clusterlabs.org/wiki/Dual_Primary_DRBD_%2B_OCFS2
> but as I understand it, the o2cb service is not supported by Pacemaker
> on Debian. Where can I get the latest information on setting up OCFS2?
>
> Best regards,
> Ilya Nasonov
> elias@po-mayak
>
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL:
> <https://lists.clusterlabs.org/pipermail/users/attachments/20191119/95e4c791/attachment-0001.html>
>
> ------------------------------
>
> Message: 4
> Date: Tue, 19 Nov 2019 10:01:01 +0000
> From: Roger Zhou <zz...@suse.com>
> To: "users@clusterlabs.org" <users@clusterlabs.org>
> Subject: Re: [ClusterLabs] Dual Primary DRBD + OCFS2
> Message-ID: <572e29b1-4c05-a985-7419-462310d1c...@suse.com>
> Content-Type: text/plain; charset="utf-8"
>
>
> On 11/19/19 4:51 PM, Ilya Nasonov wrote:
>> Hello!
>>
>> Configured a cluster (2-node DRBD+DLM+OCFS2) and it works.
>>
>> I heard the opinion that the OCFS2 file system is better. Found an old
>> cluster setup description:
>> https://wiki.clusterlabs.org/wiki/Dual_Primary_DRBD_%2B_OCFS2
>> but as I understand it, the o2cb service is not supported by Pacemaker
>> on Debian.
>>
>> Where can I get the latest information on setting up OCFS2?
>
> Probably you can refer to the SUSE doc for OCFS2 with Pacemaker [1].
> It should not be much different to adapt to Debian, I feel.
>
> [1]
> https://documentation.suse.com/sle-ha/15-SP1/html/SLE-HA-all/cha-ha-ocfs2.html
>
> Cheers,
> Roger
>
>
>>
>> Best regards,
>> Ilya Nasonov
>> elias@po-mayak
>>
>>
>> _______________________________________________
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users
>>
>> ClusterLabs home: https://www.clusterlabs.org/
>>
>
> ------------------------------
>
> Message: 5
> Date: Tue, 19 Nov 2019 14:58:08 +0100
> From: "Ulrich Windl" <ulrich.wi...@rz.uni-regensburg.de>
> To: <users@clusterlabs.org>
> Subject: [ClusterLabs] Q: ldirectord and "checktype = external-perl" broken?
> Message-ID: <5dd3f4f0020000a100035...@gwsmtp.uni-regensburg.de>
> Content-Type: text/plain; charset=US-ASCII
>
> Hi!
>
> On SLES11 I developed a special check program for ldirectord 3.9.5 in
> Perl, but then I discovered that it won't work correctly with "checktype =
> external-perl". Changing to "checktype = external" made it work.
> Today I played with it on SLES12 SP4 with
> ldirectord-4.3.018.a7fb5035-3.25.1.18557.0.PTF.1153889.x86_64, only to
> discover that it still does not work.
>
> So I wonder: has it really been broken all this time, or is there some
> special thing to consider that isn't written in the manual page?
>
> The observable effect is that the weight is set to 0 right after
> starting with weight = 1. If it works, the weight is set to 1.
>
> Regards,
> Ulrich
>
>
>
>
>
> ------------------------------
>
> Message: 6
> Date: Tue, 19 Nov 2019 15:32:43 +0100
> From: "Ulrich Windl" <ulrich.wi...@rz.uni-regensburg.de>
> To: <users@clusterlabs.org>
> Subject: [ClusterLabs] Q: ocf:pacemaker:ping
> Message-ID: <5dd3fd0b020000a100035...@gwsmtp.uni-regensburg.de>
> Content-Type: text/plain; charset=US-ASCII
>
> Hi!
>
> It seems today I'm digging out old stuff:
> I remember that in 2011 the documentation for ping's dampen was not
> very helpful. I think it still is:
>
> (RA info)
> node connectivity (ocf:pacemaker:ping)
>
> Every time the monitor action is run, this resource agent records (in the
> CIB) the current number of nodes the host can connect to using the system
> fping (preferred) or ping tool.
>
> Parameters (*: required, []: default):
>
> pidfile (string, [/var/run/ping-ping]):
>     PID file
>
> dampen (integer, [5s]): Dampening interval
>     The time to wait (dampening) further changes occur
>
> name (string, [pingd]): Attribute name
>     The name of the attributes to set. This is the name to be used in the
>     constraints.
>
> multiplier (integer, [1]): Value multiplier
>     The number by which to multiply the number of connected ping nodes by
>
> host_list* (string): Host list
>     A space separated list of ping nodes to count.
>
> attempts (integer, [3]): no.
of ping attempts
>     Number of ping attempts, per host, before declaring it dead
>
> timeout (integer, [2]): ping timeout in seconds
>     How long, in seconds, to wait before declaring a ping lost
>
> options (string): Extra Options
>     A catch all for any other options that need to be passed to ping.
>
> failure_score (integer):
>     Resource is failed if the score is less than failure_score.
>     Default never fails.
>
> use_fping (boolean, [1]): Use fping if available
>     Use fping rather than ping, if found. If set to 0, fping
>     will not be used even if present.
>
> debug (string, [false]): Verbose logging
>     Enables to use default attrd_updater verbose logging on every call.
>
> Operations' defaults (advisory minimum):
>
>     start timeout=60
>     stop timeout=20
>     monitor timeout=60 interval=10
> ---------
>
> "The name of the attributes to set.": Why plural ("attributes")?
> "The time to wait (dampening) further changes occur": Is this an English
> sentence at all?
>
> Regards,
> Ulrich
>
>
>
>
>
> ------------------------------
>
> Subject: Digest Footer
>
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
> ------------------------------
>
> End of Users Digest, Vol 58, Issue 20
> *************************************

------------------------------

Subject: Digest Footer

_______________________________________________
Manage your subscription: https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/

------------------------------

End of Users Digest, Vol 58, Issue 22
*************************************