Re: [ClusterLabs] fence_virt architecture? (was: Re: Still Beginner STONITH Problem)

2020-07-20 Thread Andrei Borzenkov
On Mon, Jul 20, 2020 at 11:45 AM Klaus Wenninger wrote: > On 7/20/20 10:34 AM, Andrei Borzenkov wrote: >> >> The cpg-configuration sounds interesting as well. Haven't used >> it or looked into the details. Would be interested to hear about

Re: [ClusterLabs] Pacemaker Shutdown

2020-07-22 Thread Andrei Borzenkov
On Wed, Jul 22, 2020 at 9:42 AM Harvey Shepherd <harvey.sheph...@aviatnet.com> wrote: > Hi All, > > I'm running Pacemaker 2.0.3 on a two-node cluster, controlling 40+ > resources which are a mixture of clones and other resources that are > colocated with the master instance of certain clones. I'v

Re: [ClusterLabs] pacemaker systemd resource

2020-07-22 Thread Andrei Borzenkov
On Wed, Jul 22, 2020 at 10:59 AM Хиль Эдуард wrote: > Hi there! I have 2 nodes with Pacemaker 2.0.3, corosync 3.0.3 on ubuntu 20 > + 1 qdevice. I want to define new resource as systemd unit *dummy.service*: > > [Unit] > Description=Dummy > [Service] > Restart=on-failure > StartLimitInterval=20

Re: [ClusterLabs] pacemaker systemd resource

2020-07-22 Thread Andrei Borzenkov
On Wed, Jul 22, 2020 at 4:58 PM Ken Gaillot wrote: > On Wed, 2020-07-22 at 10:59 +0300, Хиль Эдуард wrote: > > Hi there! I have 2 nodes with Pacemaker 2.0.3, corosync 3.0.3 on > > ubuntu 20 + 1 qdevice. I want to define new resource as systemd > > unit dummy.service : > > > > [Unit] > > Descript

Re: [ClusterLabs] pacemaker systemd resource

2020-07-22 Thread Andrei Borzenkov
acemaker-based     [1719] (cib_perform_op)      > info: -- > /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='drbd_docker1']/lrm_rsc_op[@id='drbd_docker1_monitor_6'] > Jul 22 12:38:42 node2.local pacemaker-based     [17

Re: [ClusterLabs] Antw: [EXT] Stonith failing

2020-07-29 Thread Andrei Borzenkov
On Wed, Jul 29, 2020 at 9:01 AM Gabriele Bulfon wrote: > That one was taken from a specific implementation on Solaris 11. > The situation is a dual node server with shared storage controller: both > nodes see the same disks concurrently. > Here we must be sure that the two nodes are not going to

Re: [ClusterLabs] Antw: [EXT] Stonith failing

2020-07-29 Thread Andrei Borzenkov
30.07.2020 08:42, Strahil Nikolov writes: > You got plenty of options: > - IPMI based fencing like HP iLO, DELL iDRAC > - SCSI-3 persistent reservations (which can be extended to fence the node > when the reservation(s) were removed) > SCSI reservation prevents data corruption due to con

Re: [ClusterLabs] Antw: [EXT] Stonith failing

2020-07-30 Thread Andrei Borzenkov
On Thu, Jul 30, 2020 at 11:29 AM Strahil Nikolov wrote: > > This one links to how to power fence when reservations are removed: > https://access.redhat.com/solutions/4526731 > All of this is RH(CS) specific ___ Manage your subscription: https://lists.

Re: [ClusterLabs] Antw: [EXT] why is node fenced ?

2020-07-30 Thread Andrei Borzenkov
30.07.2020 23:23, Lentes, Bernd writes: > > > - On 30 Jul 2020 at 9:28, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: > > "Lentes, Bernd" wrote on 29.07.2020 at 17:26 in message >> <1894379294.27456141.1596036406000.javamail.zim...@helmholtz-muenchen.de>: >>> Hi, >

Re: [ClusterLabs] Automatic recover from split brain ?

2020-08-10 Thread Andrei Borzenkov
08.08.2020 13:10, Adam Cécile writes: > Hello, > > > I'm experiencing issue with corosync/pacemaker running on Debian Buster. > Cluster has three nodes running in VMWare virtual machine and the > cluster fails when VEEAM backups the virtual machine (I know it's doing > bad things, like freezing co

Re: [ClusterLabs] Automatic recover from split brain ?

2020-08-11 Thread Andrei Borzenkov
11.08.2020 10:34, Adam Cécile writes: > On 8/11/20 8:48 AM, Andrei Borzenkov wrote: >> 08.08.2020 13:10, Adam Cécile writes: >>> Hello, >>> >>> >>> I'm experiencing issue with corosync/pacemaker running on Debian Buster. >>> Cluste

Re: [ClusterLabs] Antw: [EXT] Stonith failing

2020-08-16 Thread Andrei Borzenkov
16.08.2020 04:25, Reid Wahl writes: > > >> - considering that I have both nodes with stonith against the other node, >> once the two nodes can communicate, how can I be sure the two nodes will >> not try to stonith each other? >> > > The simplest option is to add a delay attribute (e.g., delay=10
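The delay trick works because the fence device that targets the preferred node waits before acting, so in a fence race the other node is shot first and can no longer fence back (the same reasoning appears later in this listing: "xstha1 has delay 10s so that I'm giving him precedence"). A minimal sketch of that timing logic, assuming both nodes start fencing at the same instant and each fence action completes right after its delay; the function name and node names are hypothetical and this is not a Pacemaker API.

```python
def fence_race_survivor(delay_before_fencing):
    """Return (survivor, victim) for a simplified two-node fence race.

    delay_before_fencing maps each *target* node to the delay (seconds)
    configured on the fence device that shoots it. Both nodes are assumed
    to lose contact and start fencing each other at t=0.
    """
    if len(delay_before_fencing) != 2:
        raise ValueError("sketch only models a two-node cluster")
    (a, delay_a), (b, delay_b) = delay_before_fencing.items()
    if delay_a == delay_b:
        # No tie-breaker: both nodes may end up shooting each other.
        raise ValueError("equal delays: both nodes may fence each other")
    # The target whose device waits less is shot first; once it is down it
    # cannot complete its own fencing of the other node.
    return (b, a) if delay_a < delay_b else (a, b)


# The device fencing node1 waits 10 s, so node1 wins the race.
print(fence_race_survivor({"node1": 10, "node2": 0}))
```

With equal delays there is no tie-breaker, which is exactly the situation the question above worries about.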

Re: [ClusterLabs] Antw: [EXT] Stonith failing

2020-08-17 Thread Andrei Borzenkov
17.08.2020 10:06, Klaus Wenninger writes: >> >>> Alternatively, you can set up corosync-qdevice, using a separate system >>> running qnetd server as a quorum arbitrator. >>> >> Any solution that is based on node suicide is prone to complete cluster >> loss. In particular, in two node cluster with qd

Re: [ClusterLabs] Antw: [EXT] Stonith failing

2020-08-17 Thread Andrei Borzenkov
17.08.2020 23:39, Jehan-Guillaume de Rorthais writes: > On Mon, 17 Aug 2020 10:19:45 -0500 > Ken Gaillot wrote: > >> On Fri, 2020-08-14 at 15:09 +0200, Gabriele Bulfon wrote: >>> Thanks to all your suggestions, I now have the systems with stonith >>> configured on ipmi. >> >> A word of caution:

Re: [ClusterLabs] Antw: Re: Antw: [EXT] Stonith failing

2020-08-18 Thread Andrei Borzenkov
18.08.2020 10:10, Ulrich Windl writes: Ken Gaillot wrote on 17.08.2020 at 17:19 in message <73d6ecf113098a3154a2e7db2e2a59557272024a.ca...@redhat.com>: >> On Fri, 2020-08-14 at 15:09 +0200, Gabriele Bulfon wrote: >>> Thanks to all your suggestions, I now have the systems with stonith >

Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: [EXT] Stonith failing

2020-08-18 Thread Andrei Borzenkov
18.08.2020 10:35, Ulrich Windl writes: >>>> Andrei Borzenkov wrote on 18.08.2020 at 09:24 in message <83aba38d-c9ea-1dff-e53b-14a9e0623...@gmail.com>: >> 18.08.2020 10:10, Ulrich Windl writes: >>>>>> Ken Gaillot sc

Re: [ClusterLabs] Antw: [EXT] Stonith failing

2020-08-18 Thread Andrei Borzenkov
18.08.2020 17:02, Ken Gaillot writes: > On Tue, 2020-08-18 at 08:21 +0200, Klaus Wenninger wrote: >> On 8/18/20 7:49 AM, Andrei Borzenkov wrote: >>> 17.08.2020 23:39, Jehan-Guillaume de Rorthais writes: >>>> On Mon, 17 Aug 2020 10:19:45 -0500 >>>> Ken Gaillo

Re: [ClusterLabs] Antw: [EXT] Stonith failing

2020-08-18 Thread Andrei Borzenkov
18.08.2020 22:49, Klaus Wenninger writes: >>> What I'm not sure about is how watchdog-only sbd would behave as a >>> fail-back method for a regular fence device. Will the cluster wait for >>> the sbd timeout no matter what, or only if the regular fencing fails, >>> or ...? >>> >> Diskless SBD implic

Re: [ClusterLabs] Coming in Pacemaker 2.0.5: better start-up/shutdown coordination with sbd

2020-08-22 Thread Andrei Borzenkov
21.08.2020 21:16, Ken Gaillot writes: > > Previously at shutdown, sbd determined a clean pacemaker shutdown by > checking whether any resources were running at shutdown. This would > lead to sbd fencing if pacemaker shut down in maintenance mode with > resources active. What conditions lead to it

Re: [ClusterLabs] Behavior of corosync kill

2020-08-25 Thread Andrei Borzenkov
On Tue, Aug 25, 2020 at 10:00 AM Rohit Saini wrote: > > Hi All, > I am seeing the following behavior. Can someone clarify if this is intended > behavior. If yes, then why so? Please let me know if logs are needed for > better clarity. > > 1. Without Stonith: > Continuous corosync kill on master

Re: [ClusterLabs] [ClusterLabs Developers] Fencing with a Quorum Device

2020-08-26 Thread Andrei Borzenkov
I changed the list to users because this is a general usage question, not a development topic. 26.08.2020 23:33, Hayden Pfeiffer writes: > Hello, > > > I am in the process of configuring fencing in an AWS cluster of two > hosts. I have done so and nodes are correctly fenced when > communication is broken w

Re: [ClusterLabs] SBD fencing not working on my two-node cluster

2020-09-21 Thread Andrei Borzenkov
22.09.2020 02:06, Philippe M Stedman writes: > Hi Strahil, > > Here is the output of those commands I appreciate the help! > > # crm config show > node 1: ceha03 \ > attributes ethmonitor-ens192=1 > node 2: ceha04 \ > attributes ethmonitor-ens192=1 > (...) > primitive stonith_s

Re: [ClusterLabs] Resources always return to original node

2020-09-26 Thread Andrei Borzenkov
26.09.2020 12:22, Michael Ivanov writes: > Hello, > > I have strange problem: when I reset the node on which my resources are > running, > they are correctly migrated to the other node. But when I turn the failed > node > back, then as soon as it is up all resources are returned back to it. I h

Re: [ClusterLabs] Two ethernet adapter within same subnet causing issue on Qdevice

2020-10-01 Thread Andrei Borzenkov
01.10.2020 20:09, Richard Seo writes: > Hello everyone, > I'm trying to setup a cluster with two hosts: > both have two ethernet adapters all within the same subnet. > I've created resources for an adapter for each hosts. > Here is the example: > Stack: corosync > Current DC: ceha06 (version 2.0.2-1

Re: [ClusterLabs] Two ethernet adapter within same subnet causing issue on Qdevice

2020-10-06 Thread Andrei Borzenkov
05.10.2020 20:55, Richard Seo writes: > >> Create host route via specific device. > I've looked over the docs, haven't found a way to do this. I've tried > configuring corosync.conf using the specific ip addresses. Could you > specify > how to route to a specific network adapter from a

Re: [ClusterLabs] Avoiding self-fence on RA failure

2020-10-06 Thread Andrei Borzenkov
07.10.2020 06:42, Digimer writes: > Hi all, > > While developing our program (and not being a production cluster), I > find that when I push broken code to a node, causing the RA to fail to > perform an operation, the node gets fenced. (example below). > > This brings up a question; > > If

Re: [ClusterLabs] ocf:pacemaker:ping every X seconds

2020-10-08 Thread Andrei Borzenkov
09.10.2020 08:21, Rohit Saini writes: > Hi Team, > I am using ocf:pacemaker:ping resource to check aliveness of a machine > every X seconds. As I understand, monitor interval 'Y' will cause ping to > happen every 'Y' seconds. So, for my case, Y should be equal to X? > I do not see this behavior tho

Re: [ClusterLabs] Adding a node to an active cluster

2020-10-21 Thread Andrei Borzenkov
On Wed, Oct 21, 2020 at 5:03 PM Jiaqi Tian1 wrote: > > Hi, > I'm trying to add a new node into an active pacemaker cluster with resources > up and running. > After steps: > 1. update corosync.conf files among all hosts in cluster including the new > node > 2. copy corosync auth file to the new n

Re: [ClusterLabs] Adding a node to an active cluster

2020-10-21 Thread Andrei Borzenkov
- your changes may be overwritten by pacemaker? > 2. Do you have idea where(which config file) crm_node command retrieves its > data? CIB > Thanks, > Jiaqi Tian > > - Original message - > From: Andrei Borzenkov > Sent by: "Users" > T

Re: [ClusterLabs] Adding a node to an active cluster

2020-10-21 Thread Andrei Borzenkov
21.10.2020 20:47, Strahil Nikolov writes: > Both SUSE and RedHat provide utilities to add the node without messing with > the configs manually. Which are crmsh and pcs respectively :) > > What is your distro ? > > > Best Regards, > Strahil Nikolov > On Wednesday, 21 October 2020

Re: [ClusterLabs] VirtualDomain does not stop via "crm resource stop" - modify RA ?

2020-10-22 Thread Andrei Borzenkov
22.10.2020 23:29, Lentes, Bernd writes: > Hi guys, > > occasionally stopping a VirtualDomain resource via "crm resource stop" does > not work, and in the end the node is fenced, which is ugly. > I had a look at the RA to see what it does. After trying to stop the domain > via "virsh shutdown ..."

Re: [ClusterLabs] VirtualDomain does not stop via "crm resource stop" - modify RA ?

2020-10-23 Thread Andrei Borzenkov
23.10.2020 21:08, Lentes, Bernd writes: > > Surprisingly if the virsh destroy is successful the RA waits until the > domain isn't running anymore: > ... > > I need something like that which waits for some time (maybe 30s) if the domain > nevertheless stops although > "virsh destroy" gave an er

Re: [ClusterLabs] fence_scsi problem

2020-10-28 Thread Andrei Borzenkov
On Wed, Oct 28, 2020 at 3:18 PM Patrick Vranckx wrote: > > Hi, > > I try to setup an HA cluster for ZFS. I think fence_scsi is not working > properly. I can reproduce the problem on two kinds of hardware: iSCSI and > SAS storage. > > Here is what I did: > > - set up a storage server with 3 iscsi ta

Re: [ClusterLabs] stop a node

2020-11-15 Thread Andrei Borzenkov
15.11.2020 20:00, Guy Przytula writes: > a question would be : > > we have maintenance to perform on a node of the cluster > > to avoid that the cluster starts the resource that we stopped - we want > to disable a node temporarily - is this possible without deleting the node > Put node in stand

Re: [ClusterLabs] resource management of standby node

2020-11-30 Thread Andrei Borzenkov
On Mon, Nov 30, 2020 at 3:11 PM Ulrich Windl wrote: > > Hi! > > In SLES15 I'm surprised what a standby node does: My guess was that a standby > node would stop all resources and then just "shut up", but it seems it still > tried to place resources and calls monitor operations. > Standby nodes a

Re: [ClusterLabs] Antw: [EXT] Re: resource management of standby node

2020-11-30 Thread Andrei Borzenkov
30.11.2020 17:05, Ulrich Windl writes: >>>> Andrei Borzenkov wrote on 30.11.2020 at 14:18 in message: >> On Mon, Nov 30, 2020 at 3:11 PM Ulrich Windl >> wrote: >>> >>> Hi! >>> >>> In SLES15 I'm surprised what a

Re: [ClusterLabs] Q: LVM-activate: "WARNING: You are recommended to activate one LV at a time or use exclusive activation mode."

2020-11-30 Thread Andrei Borzenkov
30.11.2020 15:36, Ulrich Windl writes: > Hi! > > I configured a shared LVM activation as per instructions (I hope) in SLES15 > SP2. However I get this warning: > LVM-activate(prm_testVG_activate)[57281]: WARNING: You are recommended to > activate one LV at a time or use exclusive activation mode.

Re: [ClusterLabs] Antw: [EXT] Re: Preferred node for a service (not constrained)

2020-12-03 Thread Andrei Borzenkov
On Thu, Dec 3, 2020 at 11:11 AM Ulrich Windl wrote: > > >>> Strahil Nikolov wrote on 02.12.2020 at 22:42 in message <311137659.2419591.1606945369...@mail.yahoo.com>: > > Constraints' values are varying from: > > infinity which equals to score of 100 > > to: > > - infinity which equals t
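For reference, Pacemaker defines INFINITY as 1,000,000, and scores from several constraints are combined with "infinity math": any value plus INFINITY is INFINITY, any value minus INFINITY is -INFINITY, and INFINITY minus INFINITY is -INFINITY, so a ban always beats a mandatory preference. The sketch below is an illustration of those rules, not Pacemaker's own code; capping finite sums at plus or minus INFINITY is how I understand Pacemaker to treat them, and the function name is hypothetical.

```python
INFINITY = 1_000_000  # the value Pacemaker uses for INFINITY


def add_scores(a, b):
    """Combine two constraint scores following Pacemaker's infinity math."""
    if a <= -INFINITY or b <= -INFINITY:
        return -INFINITY  # -INFINITY wins, even against +INFINITY
    if a >= INFINITY or b >= INFINITY:
        return INFINITY
    # Finite sums are kept within [-INFINITY, INFINITY].
    return max(-INFINITY, min(INFINITY, a + b))


print(add_scores(INFINITY, -INFINITY))  # -1000000
print(add_scores(500, 200))             # 700
print(add_scores(INFINITY, -5000))      # 1000000
```

This is why a location constraint with score INFINITY cannot override a -INFINITY colocation or ban on the same node.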

Re: [ClusterLabs] Can't have 2 nodes as master with galera resource agent

2020-12-11 Thread Andrei Borzenkov
11.12.2020 16:13, Raphael Laguerre writes: > Hello, > > I'm trying to setup a 2 nodes cluster with 2 galera instances. I use the > ocf:heartbeat:galera resource agent, however, after I create the resource, > only one node appears to be in master role, the other one can't be promoted > and stays

Re: [ClusterLabs] Antw: [EXT] Recoveing from node failure

2020-12-11 Thread Andrei Borzenkov
11.12.2020 18:37, Gabriele Bulfon writes: > I found I can do this temporarily: > crm config property cib-bootstrap-options: no-quorum-policy=ignore > All two-node clusters I remember run with this setting forever :) > then once node 2 is up again: > crm config property cib-bootstrap-options:

Re: [ClusterLabs] Antw: [EXT] Recoveing from node failure

2020-12-12 Thread Andrei Borzenkov
> Sonicle S.r.l. : http://www.sonicle.com > Music: http://www.gabrielebulfon.com > eXoplanets : https://gabrielebulfon.bandcamp.com/album/exoplanets > -- > From: A

Re: [ClusterLabs] Antw: Re: Antw: [EXT] Recoveing from node failure

2020-12-14 Thread Andrei Borzenkov
On Mon, Dec 14, 2020 at 2:40 PM Gabriele Bulfon wrote: > > I isolated the log when everything happens (when I disable the ha interface), > attached here. > And where are the matching logs from the second node?

Re: [ClusterLabs] Antw: Re: Antw: [EXT] Recoveing from node failure

2020-12-15 Thread Andrei Borzenkov
> Gabriele > Sonicle S.r.l. : http://www.sonicle.com > Music: http://www.gabrielebulfon.com > eXoplanets : https://gabrielebulfon.bandcamp.com/album/exoplanets > ------ > From:

Re: [ClusterLabs] Best way to create a floating identity file

2020-12-15 Thread Andrei Borzenkov
On Tue, Dec 15, 2020 at 4:58 PM Tony Stocker wrote: > > I'm trying to figure out the best way to do the following on our > 2-node clusters. > > Whichever node is the primary (all services run on a single node) I > want to create a file that contains an identity descriptor, e.g. > /var/local/projec

Re: [ClusterLabs] Best way to create a floating identity file

2020-12-15 Thread Andrei Borzenkov
15.12.2020 17:10, Tony Stocker writes: > On Tue, Dec 15, 2020 at 9:02 AM Andrei Borzenkov wrote: >> >> On Tue, Dec 15, 2020 at 4:58 PM Tony Stocker wrote: >>> >> >> You could simply query whether a specific resource (group) is active >> on the nod

Re: [ClusterLabs] Antw: [EXT] delaying start of a resource

2020-12-16 Thread Andrei Borzenkov
16.12.2020 17:56, Gabriele Bulfon writes: > Thanks, here are the logs, there are infos about how it tried to start > resources on the nodes. Both logs are from the same node. > Keep in mind the node1 was already running the resources, and I simulated a > problem by turning down the ha interface.

Re: [ClusterLabs] Antw: [EXT] delaying start of a resource

2020-12-16 Thread Andrei Borzenkov
16.12.2020 19:05, Gabriele Bulfon writes: > Looking at the two logs, looks like corosync decided that xst1 was offline, > while xst was still online. > I just issued an "ifconfig ha0 down" on xst1, so I expect both nodes cannot > see other one, while I see these same lines both on xst1 and xst2 lo

Re: [ClusterLabs] Antw: [EXT] delaying start of a resource

2020-12-17 Thread Andrei Borzenkov
On Thu, Dec 17, 2020 at 11:11 AM Gabriele Bulfon wrote: > > Yes, sorry took same bash by mistake...here are the correct logs. > > Yes, xstha1 has delay 10s so that I'm giving him precedence, xstha2 has delay > 1s and will be stonished earlier. > During the short time before xstha2 got powered off

Re: [ClusterLabs] Antw: [EXT] delaying start of a resource

2020-12-17 Thread Andrei Borzenkov
d off. You really need to test how IPMI behaves with your specific hardware, to make sure it is not possible or to adjust the stonith agent to handle delays. To reiterate: > > From: Andrei Borzenkov > > It is possible that your IPMI/BMC/whatever implementation responds > with success bef

Re: [ClusterLabs] Antw: Re: Antw: [EXT] delaying start of a resource

2020-12-17 Thread Andrei Borzenkov
17.12.2020 14:02, Ulrich Windl writes: >>>> Andrei Borzenkov wrote on 17.12.2020 at 09:50 in message: > > ... >> According to logs from xstha1, it started to activate resources only >> after stonith was confirmed >> >> Dec 16 15

Re: [ClusterLabs] Antw: Re: Antw: [EXT] delaying start of a resource

2020-12-17 Thread Andrei Borzenkov
17.12.2020 21:30, Ken Gaillot writes: > > This reminded me that some IPMI implementations return "success" for > commands before they've actually been completed. This is why > fence_ipmilan has a "power_wait" parameter that defaults to 2 seconds. > But in this case we also do not know whether com
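Because a BMC may acknowledge "power off" before the node is actually down, a fixed wait such as `power_wait` only helps if it is long enough. A more defensive pattern is to poll the reported power status until it reads "off" or a deadline passes, and to treat a missing confirmation as a failed fence. This is a sketch of that idea only, not fence_ipmilan's implementation; `send_power_off` and `get_power_status` are hypothetical callables standing in for whatever actually talks to the BMC, and the timeout values are placeholders.

```python
import time


def power_off_and_confirm(send_power_off, get_power_status,
                          timeout=20.0, interval=1.0,
                          clock=time.monotonic, sleep=time.sleep):
    """Return True only once the BMC reports the node as powered off.

    send_power_off and get_power_status are hypothetical callables;
    clock and sleep are injectable so the logic can be tested offline.
    """
    send_power_off()  # the BMC may report success before acting on it
    deadline = clock() + timeout
    while True:
        if get_power_status() == "off":
            return True
        if clock() >= deadline:
            return False  # never confirmed: treat the fence as failed
        sleep(interval)


# Simulated BMC that still reports "on" twice after acknowledging the command.
statuses = iter(["on", "on", "off"])
print(power_off_and_confirm(lambda: None, lambda: next(statuses), interval=0))
```

Returning False rather than assuming success keeps the cluster from starting resources while the fenced node might still be writing to shared storage.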

Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: [EXT] delaying start of a resource

2020-12-17 Thread Andrei Borzenkov
18.12.2020 10:09, Ulrich Windl writes: >>>> Andrei Borzenkov wrote on 18.12.2020 at 08:01 in message: >> 17.12.2020 21:30, Ken Gaillot writes: >>> >>> This reminded me that some IPMI implementations return "success" for >>> co

Re: [ClusterLabs] Q: warning: new_event_notification (4527-22416-14): Broken pipe (32)

2020-12-18 Thread Andrei Borzenkov
18.12.2020 12:00, Ulrich Windl writes: > > Maybe a related question: Do STONITH resources have special rules, meaning > they don't wait for successful fencing? Pacemaker resources in the CIB do not perform fencing. They only register fencing devices with fenced, which does the actual job. In particular ..

Re: [ClusterLabs] Running shell command on remote node via corosync messaging infrastructure

2020-12-18 Thread Andrei Borzenkov
18.12.2020 21:54, Ken Gaillot writes: > On Fri, 2020-12-18 at 17:51 +, Animesh Pande wrote: >> Hello, >> >> Is there a tool that would allow for commands to be run on remote >> nodes in the cluster through the corosync messaging layer? I have a >> cluster configured with multiple corosync commun

Re: [ClusterLabs] Q: When do I need virtlockd?

2021-01-18 Thread Andrei Borzenkov
On Mon, Jan 18, 2021 at 11:55 AM Ulrich Windl wrote: . > > So can someone explain, or direct me to some helpful docs? > Are you aware of https://libvirt.org/kbase/locking.html, which links further to the virtlockd description?

Re: [ClusterLabs] Antw: Re: Antw: [EXT] Cluster breaks after pcs unstandby node

2021-01-18 Thread Andrei Borzenkov
On Mon, Jan 18, 2021 at 12:00 PM Steffen Vinther Sørensen wrote: > > Hi, > > I have persistent journal, but 'journalctl -b -1' was empty in this > case, so it might not be optimally configured. And centralized logging > is on the todo list > > > btw. about the fencing, I have set ' HandlePowerKey=

Re: [ClusterLabs] CCIB migration from Pacemaker 1.x to 2.x

2021-01-23 Thread Andrei Borzenkov
23.01.2021 19:10, Sharma, Jaikumar writes: > Hi guys, > > I'm newbie to high availability clusters, pls excuse me - learning tools > stack (corosync & pacemaker). > > In fact, our high availability solution is based on Debian 9.x (pacemaker 1.x > and corosync 2.x) - which worked as expected. >

Re: [ClusterLabs] CCIB migration from Pacemaker 1.x to 2.x

2021-01-23 Thread Andrei Borzenkov
23.01.2021 19:41, Sharma, Jaikumar writes: > Thanks Andrei for quick reply. > > >> Which cable? There is corosync network, there is application network, >> there may be backend network, and they may use different adapters. > > only one network where application is running - same corosync layer n

Re: [ClusterLabs] Stopping all nodes causes servers to migrate

2021-01-25 Thread Andrei Borzenkov
On Mon, Jan 25, 2021 at 12:07 PM Jehan-Guillaume de Rorthais wrote: > As actions during a cluster shutdown cannot be handled in the same transition > for each nodes, I usually add a step to disable all resources using property > "stop-all-resources" before shutting down the cluster: > > pcs pro

Re: [ClusterLabs] Disable all resources in a group if one or more of them fail and are unable to reactivate

2021-01-27 Thread Andrei Borzenkov
27.01.2021 19:06, damiano giuliani writes: > Hi all, I'm pretty new to clusters; I'm struggling trying to configure a > bunch of resources and test how they fail over. My need is to start and > manage a group of resources as one (in order to achieve this a resource > group has been created), and if

Re: [ClusterLabs] Disable all resources in a group if one or more of them fail and are unable to reactivate

2021-01-28 Thread Andrei Borzenkov
27.01.2021 22:03, Ken Gaillot writes: > > With a group, later members depend on earlier members. If an earlier > member can't run, then no members after it can run. > > However we can't make the dependency go in both directions. If an > earlier member can't run unless a later member is active, and
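Ken's description of group semantics can be modeled in a few lines: walk the members in group order and stop at the first one that cannot run, because every later member implicitly depends on it. This is a simplified model for illustration, not Pacemaker code; the member names and the `can_run` set are hypothetical, and it deliberately does not model the reverse direction (a failed later member does not stop earlier ones), which is the limitation Ken points out.

```python
def runnable_members(group, can_run):
    """Return the members of an ordered group that may be active.

    group:   member names in group order.
    can_run: members that could run if their predecessors are active.
    Each member depends on all earlier members, so the first member that
    cannot run blocks everything after it.
    """
    active = []
    for member in group:
        if member not in can_run:
            break
        active.append(member)
    return active


# "db" cannot start, so "app" is blocked too, but "vip" and "fs" keep running.
print(runnable_members(["vip", "fs", "db", "app"], {"vip", "fs", "app"}))
```

Stopping the whole group when any member fails therefore needs something beyond the group itself, which is what the original question is asking for.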

Re: [ClusterLabs] Problem with systemd socket service (start fails when running already)

2021-01-29 Thread Andrei Borzenkov
29.01.2021 14:19, Ulrich Windl writes: > Hi! > > I'm having an odd failure using a systemd socket unit controlled by the > cluster. Why do you need the socket unit to be controlled by the cluster in the first place? The whole point of a socket unit is to auto-start services on access, and that defeats the purpo

Re: [ClusterLabs] Peer (slave) node deleting master's transient_attributes

2021-01-30 Thread Andrei Borzenkov
29.01.2021 20:37, Stuart Massey writes: > Can someone help me with this? > Background: > > "node01" is failing, and has been placed in "maintenance" mode. It > occasionally loses connectivity. > > "node02" is able to run our resources > > Consider the following messages from pacemaker.log on "nod

Re: [ClusterLabs] Antw: [EXT] Re: Problem with systemd socket service (start fails when running already)

2021-01-31 Thread Andrei Borzenkov
On Mon, Feb 1, 2021 at 10:07 AM Ulrich Windl wrote: > > You are saying starting libvirtd does not require the ro and tls socket units > to be started? > So far I am not aware of any service that would *require* socket activation. Socket activation is an optimization that allows you to avoid starting
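To illustrate why socket activation is optional for a well-written service: under the systemd protocol described in sd_listen_fds(3), the manager passes listening sockets starting at file descriptor 3 and sets `LISTEN_PID` and `LISTEN_FDS`; a daemon that finds no such sockets can simply bind its own. The sketch below is a simplified version of that check (the real library call also handles close-on-exec and optional unsetting of the variables); the address, port, and function names are placeholders, not part of libvirtd or any real service.

```python
import os
import socket

SD_LISTEN_FDS_START = 3  # first fd systemd hands to a socket-activated service


def inherited_fds(environ=None, pid=None):
    """Return fds passed by systemd socket activation, or [] if there are none."""
    environ = os.environ if environ is None else environ
    pid = os.getpid() if pid is None else pid
    if environ.get("LISTEN_PID") != str(pid):
        return []  # variables absent, or meant for a different process
    try:
        count = int(environ.get("LISTEN_FDS", "0"))
    except ValueError:
        return []
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))


def open_listener(port):
    """Use the socket systemd passed in if there is one, else bind our own."""
    fds = inherited_fds()
    if fds:
        return socket.socket(fileno=fds[0])  # family/type detected from the fd
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("127.0.0.1", port))
    sock.listen()
    return sock
```

Either way the daemon ends up with a listening socket; activation only changes who creates it and when the process starts.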

Re: [ClusterLabs] failed migration handled the wrong way

2021-02-01 Thread Andrei Borzenkov
On Mon, Feb 1, 2021 at 12:53 PM Ulrich Windl wrote: > > Hi! > > While fighting to get the wrong configuration, I broke libvirt live-migration > by not enabling the TLS socket. > > When testing to live-migrate a VM from h16 to h18, these are the essential > events: > Feb 01 10:30:10 h16 pacemaker

Re: [ClusterLabs] Antw: [EXT] Re: failed migration handled the wrong way

2021-02-01 Thread Andrei Borzenkov
On Mon, Feb 1, 2021 at 1:59 PM Ulrich Windl wrote: > > But the VM *wasn't* stopped on h16! > I am not sure what you mean here. It was not stopped during migration? Yes, pacemaker knew it and it tried to stop it explicitly when migration failed. It was not stopped when pacemaker tried to stop it?

Re: [ClusterLabs] Antw: [EXT] Re: failed migration handled the wrong way

2021-02-05 Thread Andrei Borzenkov
05.02.2021 12:54, Ulrich Windl writes: >>>> Ulrich Windl wrote on 01.02.2021 at 11:59 in message <6017DF04.888 : 161 : 60728>: >>>>> Andrei Borzenkov wrote on 01.02.2021 at 11:05 in message: >>> On Mon, Feb 1, 2021 at 1

Re: [ClusterLabs] Another odd message: pacemaker-fenced[31326]: warning: Can't create a sane reply

2021-02-09 Thread Andrei Borzenkov
09.02.2021 17:00, Ulrich Windl writes: > Hi! > > I had made a mistake, leading to node h16 to be fenced. After recovery (h16 > had re-joined the cluster) I had stopped the node, reconfigured the network, > then started the node again. > Then I did the same thing (not the unwanted fencing) with h1

Re: [ClusterLabs] Question: 2 node pcs cluster required quorum and separate Heartbeat Network

2021-02-10 Thread Andrei Borzenkov
10.02.2021 21:56, Ben .T.George writes: > HI > > Is it mandatory for 2 node pcs cluster require a quorum and separate > Heartbeat Network? > A two-node cluster by definition cannot use quorum: there is no way to split the cluster so that either part has a majority of votes. You can artificially increase numb
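The majority argument can be checked with simple arithmetic: a partition is quorate only if it holds strictly more than half of all votes. With two nodes of one vote each, a 1/1 split leaves neither side quorate; adding a third vote (for example from a qdevice) lets exactly one side win. This sketch uses plain majority voting only and ignores corosync special cases such as the votequorum `two_node` option; the function names are hypothetical.

```python
def has_quorum(partition_votes, total_votes):
    """A partition is quorate if it holds a strict majority of all votes."""
    return partition_votes > total_votes // 2


def quorate_partitions(split, total_votes):
    """Quorum state of each partition after the cluster splits."""
    return [has_quorum(votes, total_votes) for votes in split]


# Two nodes, one vote each, split 1/1: neither side has a majority.
print(quorate_partitions([1, 1], total_votes=2))  # [False, False]
# A third vote (e.g. a qdevice) siding with one partition breaks the tie.
print(quorate_partitions([2, 1], total_votes=3))  # [True, False]
```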

Re: [ClusterLabs] Antw: [EXT] Colocation per site ?

2021-02-19 Thread Andrei Borzenkov
On Fri, Feb 19, 2021 at 10:41 AM Strahil Nikolov wrote: > > DC1: > - nodeA > - nodeB > - nodeC > > DC2: > - nodeD > - nodeE > - nodeF > > DC3: > - majority maker > > I will have 3 VIPs: > VIP1 > VIP2 > VIP3 > > I will have to setup the cluster to: > 1. Find where is the master HANA resource > 2. P

Re: [ClusterLabs] Antw: [EXT] Colocation per site ?

2021-02-19 Thread Andrei Borzenkov
On Fri, Feb 19, 2021 at 2:44 PM Strahil Nikolov wrote: > > > >Do you have a fixed relation between node >pairs and VIPs? I.e. must > >A/D always get VIP1, B/E - VIP2 etc? > > I have to verify it again, but generally speaking - yes , VIP1 is always on > nodeA/D (master), VIP2 on nodeB/E (worker1)

[ClusterLabs] Latest PDF documents have truncated lines

2021-02-19 Thread Andrei Borzenkov
In the latest PDF versions I downloaded recently, code samples quite often appear truncated: they do not fit on the page. I compared them with previous versions I have, and those use smaller fonts for code samples, so the code usually fits. Of course, overly long lines are still an issue, so wrapping such cod

Re: [ClusterLabs] Latest PDF documents have truncated lines

2021-02-20 Thread Andrei Borzenkov
On Fri, Feb 19, 2021 at 7:48 PM Ken Gaillot wrote: > > On Fri, 2021-02-19 at 17:54 +0300, Andrei Borzenkov wrote: > > In the latest PDF versions I downloaded recently code samples appear > > truncated quite often - they do not fit on page. I compared with > > previous

Re: [ClusterLabs] Our 2-Node Cluster with a Separate Qdevice Went Down Anyway?

2021-02-26 Thread Andrei Borzenkov
26.02.2021 19:19, Eric Robinson writes: > At 5:16 am Pacific time Monday, one of our cluster nodes failed and its mysql > services went down. The cluster did not automatically recover. > > We're trying to figure out: > > > 1. Why did it fail? Pacemaker only registered loss of connection betw

Re: [ClusterLabs] Our 2-Node Cluster with a Separate Qdevice Went Down Anyway?

2021-02-26 Thread Andrei Borzenkov
26.02.2021 20:23, Eric Robinson writes: >> -Original Message- >> From: Digimer >> Sent: Friday, February 26, 2021 10:35 AM >> To: Cluster Labs - All topics related to open-source clustering welcomed >> ; Eric Robinson >> Subject: Re: [ClusterLabs] Our 2-Node Cluster with a Separate Qdevice

Re: [ClusterLabs] Our 2-Node Cluster with a Separate Qdevice Went Down Anyway?

2021-02-26 Thread Andrei Borzenkov
On 26.02.2021 21:58, Eric Robinson wrote: >> -Original Message- >> From: Users On Behalf Of Andrei >> Borzenkov >> Sent: Friday, February 26, 2021 11:27 AM >> To: users@clusterlabs.org >> Subject: Re: [ClusterLabs] Our 2-Node Cluster with a

Re: [ClusterLabs] Our 2-Node Cluster with a Separate Qdevice Went Down Anyway?

2021-02-26 Thread Andrei Borzenkov
On 27.02.2021 09:05, Eric Robinson wrote: >> -Original Message- >> From: Users On Behalf Of Andrei >> Borzenkov >> Sent: Friday, February 26, 2021 1:25 PM >> To: users@clusterlabs.org >> Subject: Re: [ClusterLabs] Our 2-Node Cluster with a Separate Q

Re: [ClusterLabs] Our 2-Node Cluster with a Separate Qdevice Went Down Anyway?

2021-02-27 Thread Andrei Borzenkov
On 27.02.2021 17:08, Eric Robinson wrote: > > I agree, one node is expected to go out of quorum. Still the question is, why > didn't 001db01b take over the services? I just remembered that 001db01b has > services running on it, and those services did not stop, so it seems that > 001db01b did no

Re: [ClusterLabs] Our 2-Node Cluster with a Separate Qdevice Went Down Anyway?

2021-02-28 Thread Andrei Borzenkov
On 27.02.2021 22:12, Andrei Borzenkov wrote: > On 27.02.2021 17:08, Eric Robinson wrote: >> >> I agree, one node is expected to go out of quorum. Still the question is, >> why didn't 001db01b take over the services? I just remembered that 001db01b >> has s

Re: [ClusterLabs] [EXTERNAL] - Antw: [EXT] OCF resource agent is not starting up

2021-02-28 Thread Andrei Borzenkov
On 01.03.2021 08:25, Niveditha U wrote: > Hi Team, > > Can ocft be used in place of ocf-tester? > No, it's a different tool.

Re: [ClusterLabs] Our 2-Node Cluster with a Separate Qdevice Went Down Anyway?

2021-03-01 Thread Andrei Borzenkov
On 01.03.2021 12:26, Jan Friesse wrote: >> > > Thanks for digging into logs. I believe Eric is hitting > https://github.com/corosync/corosync-qdevice/issues/10 (already fixed, > but may take some time to get into distributions) - it also contains > workaround. > I tested corosync-qnetd at df3c67

Re: [ClusterLabs] Our 2-Node Cluster with a Separate Qdevice Went Down Anyway?

2021-03-01 Thread Andrei Borzenkov
On 01.03.2021 15:45, Jan Friesse wrote: > Andrei, > >> On 01.03.2021 12:26, Jan Friesse wrote: >>> >>> Thanks for digging into logs. I believe Eric is hitting >>> https://github.com/corosync/corosync-qdevice/issues/10 (already fixed, >>> but may take some time to get into distributions) - it

Re: [ClusterLabs] Our 2-Node Cluster with a Separate Qdevice Went Down Anyway?

2021-03-03 Thread Andrei Borzenkov
On 01.03.2021 16:45, Jan Friesse wrote: > Andrei, > >> On 01.03.2021 15:45, Jan Friesse wrote: >>> Andrei, >>> On 01.03.2021 12:26, Jan Friesse wrote: >> > > Thanks for digging into logs. I believe Eric is hitting > https://github.com/corosync/corosync-qdevice/issues/10 (alrea

Re: [ClusterLabs] Antw: Re: Antw: [EXT] Re: Q: constrain or delay "probes"?

2021-03-08 Thread Andrei Borzenkov
On 08.03.2021 11:57, Ulrich Windl wrote: Reid Wahl wrote on 08.03.2021 at 08:42 in message > : >> Did the "active on too many nodes" message happen right after a probe? If >> so, then it does sound like the probe returned code 0. > > Events were like this (I greatly condensed the logs):

Re: [ClusterLabs] How to use dnsupdate?

2021-03-09 Thread Andrei Borzenkov
On 10.03.2021 04:47, Ross Sponholtz wrote: > Hi, > I've been working with Linux clustering for several years, mostly in Azure. > However I've got a bit of a challenge right now. I'm trying to set up a > "geo-cluster" and would like to direct client machines to one geo or the > other based on D

Re: [ClusterLabs] Order set troubles

2021-03-24 Thread Andrei Borzenkov
On 24.03.2021 20:56, Ken Gaillot wrote: > On Wed, 2021-03-24 at 09:27 +, Strahil Nikolov wrote: >> Hello All, >> >> I have trouble creating an order set. >> The end goal is to create a 2 node cluster where nodeA will mount >> nfsA, while nodeB will mount nfsB. On top of that a depended clo

Re: [ClusterLabs] Antw: [EXT] Re: Order set troubles

2021-03-25 Thread Andrei Borzenkov
On Thu, Mar 25, 2021 at 10:31 AM Strahil Nikolov wrote: > > Use Case: > > nfsA is shared filesystem for HANA running in site A > nfsB is shared filesystem for HANA running in site B > > clusterized resource of type SAPHanaTopology must run on all systems if the > FS for the HANA is running > An

Re: [ClusterLabs] Antw: [EXT] Re: Order set troubles

2021-03-25 Thread Andrei Borzenkov
es fails to start (systemd racing condition with dnsmasq) >> >> Best Regards, >> Strahil Nikolov >> >> On Thu, Mar 25, 2021 at 12:18, Andrei Borzenkov >> wrote: >> On Thu, Mar 25, 2021 at 10:31 AM Strahil Nikolov >> wrote: >>> >>> Use Ca

Re: [ClusterLabs] Antw: Re: Antw: [EXT] Re: Order set troubles

2021-03-26 Thread Andrei Borzenkov
On Fri, Mar 26, 2021 at 10:17 AM Ulrich Windl wrote: > > >>> Andrei Borzenkov wrote on 26.03.2021 at 06:19 in > message <534274b3-a6de-5fac-0ae4-d02c305f1...@gmail.com>: > > On 25.03.2021 21:45, Reid Wahl wrote: > >> FWIW we have this KB article (

Re: [ClusterLabs] ocf-tester always claims failure, even with built-in resource agents?

2021-03-26 Thread Andrei Borzenkov
On 26.03.2021 17:28, Antony Stone wrote: > Hi. > > I've just signed up to the list. I've been using corosync and pacemaker for > several years, mostly under Debian 9, which means: > > corosync 2.4.2 > pacemaker 1.1.16 > > I've recently upgraded a test cluster to Debian 10, which gi

Re: [ClusterLabs] Antw: Re: Antw: [EXT] Re: Order set troubles

2021-03-26 Thread Andrei Borzenkov
On 26.03.2021 22:18, Reid Wahl wrote: > On Fri, Mar 26, 2021 at 6:27 AM Andrei Borzenkov > wrote: > >> On Fri, Mar 26, 2021 at 10:17 AM Ulrich Windl >> wrote: >>> >>>>>> Andrei Borzenkov wrote on 26.03.2021 at >> 06:19 in >>

Re: [ClusterLabs] Which fence agent is needed for an Apache web server cluster?

2021-03-27 Thread Andrei Borzenkov
On 28.03.2021 02:42, Reid Wahl wrote: > On Sat, Mar 27, 2021 at 4:28 PM Strahil Nikolov > wrote: > >> I had to tune the fence_ipmi recently on some older HPE blades. The >> default settings were working, but also returning some output about >> problems negotiating the cipher. >> As that output co

Re: [ClusterLabs] Antw: [EXT] Colocation per site ?

2021-03-27 Thread Andrei Borzenkov
et) hana_${SID}_vhost attribute for each node and this attribute must be unique and different between two sites. May be worth looking into it. > Best Regards, Strahil Nikolov > > > On Fri, Feb 19, 2021 at 16:51, Andrei Borzenkov wrote: > On Fri, Feb 19, 2021 at 2:44 PM Strahi

Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: [EXT] Re: Order set troubles

2021-03-29 Thread Andrei Borzenkov
On 29.03.2021 11:11, Ulrich Windl wrote: >>>> Andrei Borzenkov wrote on 27.03.2021 at 06:37 in > message <7c294034-56c3-baab-73c6-7909ab554...@gmail.com>: >> On 26.03.2021 22:18, Reid Wahl wrote: >>> On Fri, Mar 26, 2021 at 6:27 AM Andrei Borzenkov >>

Re: [ClusterLabs] Antw: [EXT] Colocation per site ?

2021-03-29 Thread Andrei Borzenkov
On 29.03.2021 20:12, Ken Gaillot wrote: > On Sun, 2021-03-28 at 09:20 +0300, Andrei Borzenkov wrote: >> On 28.03.2021 07:16, Strahil Nikolov wrote: >>> I didn't mean DC as a designated coordinator, but as a physical >>> Datacenter location. >>> Last ti

Re: [ClusterLabs] Antw: [EXT] Colocation per site ?

2021-03-30 Thread Andrei Borzenkov
On 30.03.2021 17:42, Ken Gaillot wrote: >> >> Colocation does not work, this will force everything on the same node >> where master is active and that is not what we want. > > Nope, you can colocate by node attribute instead of node. > > Colocating by node attribute says "put this resource on a n

Re: [ClusterLabs] Live migration possible with KSM ?

2021-03-30 Thread Andrei Borzenkov
On 30.03.2021 18:16, Lentes, Bernd wrote: > Hi, > > currently I'm reading "Mastering KVM Virtualization", published by Packt > Publishing, a book I can really recommend. > There are some proposals for tuning guests. One is KSM (kernel samepage > merging), which sounds quite interesting. > Especi

Re: [ClusterLabs] Antw: [EXT] Colocation per site ?

2021-03-31 Thread Andrei Borzenkov
On Wed, Mar 31, 2021 at 8:34 AM Strahil Nikolov wrote: > > Damn... I am too hasty. > > It seems that the 2 resources I have already configured are also running on > the master. > > The colocation constraint is like: > > rsc_bkpip3_SAPHana_SID_HDBinst_num with rsc_SAPHana_SID_HDBinst_num-clone >

Re: [ClusterLabs] failure-timeout not working in corosync 2.0.1

2021-03-31 Thread Andrei Borzenkov
On 01.04.2021 00:21, Antony Stone wrote: > On Wednesday 31 March 2021 at 23:11:50, Reid Wahl wrote: > >> Maybe Pacemaker-1 was looser in its handling of resource meta attributes vs >> operation meta attributes. Good question. > > Returning to my suspicion that it's more likely me that simply did

Re: [ClusterLabs] failure-timeout not working in corosync 2.0.1

2021-03-31 Thread Andrei Borzenkov
On 01.04.2021 08:20, Andrei Borzenkov wrote: > On 01.04.2021 00:21, Antony Stone wrote: >> On Wednesday 31 March 2021 at 23:11:50, Reid Wahl wrote: >> >>> Maybe Pacemaker-1 was looser in its handling of resource meta attributes vs >>> operation meta attributes. Goo
