Re: [ClusterLabs] Pcsd port change after cluster setup

2024-04-15 Thread Strahil Nikolov via Users
The interesting part is that after repeating the process (update the file, stop & start pcsd, and run pcs host auth), everything is working fine, including the web UI. Best Regards, Strahil Nikolov On Mon, Apr 15, 2024 at 17:20, Strahil Nikolov via Users wrote: Hi All, I need your hel
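The sequence mentioned in this reply (edit the file, restart pcsd, re-authenticate) could be sketched roughly as below. This is only a sketch under assumptions: the helper name `set_pcsd_port` is hypothetical, `sed -i` assumes GNU sed, and the follow-up commands in the trailing comments assume a systemd host with pcs 0.10 or later.

```shell
#!/bin/sh
# Hypothetical helper: set (or add) PCSD_PORT in a sysconfig-style file.
# Usage: set_pcsd_port FILE PORT
set_pcsd_port() {
    file=$1 port=$2
    if grep -q '^#\{0,1\}[[:space:]]*PCSD_PORT=' "$file"; then
        # Replace an existing (possibly commented-out) assignment.
        sed -i "s/^#\{0,1\}[[:space:]]*PCSD_PORT=.*/PCSD_PORT=$port/" "$file"
    else
        # No assignment present yet: append one.
        printf 'PCSD_PORT=%s\n' "$port" >> "$file"
    fi
}

# After updating /etc/sysconfig/pcsd on every node, the remaining steps
# would be along these lines (comments only; they need a real cluster):
#   systemctl restart pcsd
#   pcs host auth <node names...>
```

On a non-default port, `pcs host auth` may also need the port spelled out per host (the `addr=<address>:<port>` form); check `man pcs` for the installed version.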

[ClusterLabs] Pcsd port change after cluster setup

2024-04-15 Thread Strahil Nikolov via Users
Hi All, I need your help to change the pcsd port. I set the port in /etc/sysconfig/pcsd on all nodes: PCSD_PORT=3500. Yet, the daemon is not listening on it. Best Regards, Strahil Nikolov ___ Manage your subscription: https://lists.clusterlabs.org/mailman/li

Re: [ClusterLabs] Fencing doesn't work with google-cloud-cli

2024-03-27 Thread Strahil Nikolov via Users
Hi All, I'm sorry for the previous post. Most probably it's not google-cloud-cli as even after downgrading, fencing still doesn't work all the time. Best Regards, Strahil Nikolov On Wednesday, March 27, 2024 at 15:39:06 GMT+2, Strahil Nikolov via Users wrote:

[ClusterLabs] Fencing doesn't work with google-cloud-cli

2024-03-27 Thread Strahil Nikolov via Users
the resource back to the previous host and "downgraded" the 'google-cloud-cli' package. Best Regards, Strahil Nikolov ___ Manage your subscription: https://lists.clusterlabs.org/mailman/listinfo/users ClusterLabs home: https://www.clusterlabs.org/

Re: [ClusterLabs] Questionsabout GCP VIP setup

2024-02-28 Thread Strahil Nikolov via Users
Hi Oyvind, I found your e-mail in my spam folder. It seems 'gcloud-ra' doesn't exist and it's not needed for the fence agent or the gcp-vpc-move-vip. Best Regards, Strahil Nikolov On Wed, Feb 7, 2024 at 13:26, Oyvind Albrigtsen wrote: On 07/02/24 11:15 +0000, Strahil Nik

Re: [ClusterLabs] pcsd web interface not working on EL 9.3

2024-02-21 Thread Strahil Nikolov via Users
Hi, I didn't see any redirect and I was puzzled. Currently, the firewall is still blocking me and curl-ing it was the only test that came to my mind. Best Regards, Strahil Nikolov On Wed, Feb 21, 2024 at 8:56, Ivan Devat wrote: Hi, the url https://fqdn:2224 redirects to https://fqdn

[ClusterLabs] pcsd web interface not working on EL 9.3

2024-02-19 Thread Strahil Nikolov via Users
Hi All, Is there a specific step I missed in order to set up the web interface? Usually, you just log in with the hacluster user on https://fqdn:2224 but when I do a curl, I get an empty response. Best Regards, Strahil Nikolov

[ClusterLabs] Questionsabout GCP VIP setup

2024-02-07 Thread Strahil Nikolov via Users
Hi All, This is my first cluster in the cloud and I have 2 questions that I'm hoping to get a clue about. 1. Where can I find the 'gcloud-ra' binary on an EL9 system? I have installed resource-agents-cloud but I can't find it. 2. Is gcp-vpc-move-vip a good approach to set up the VIP? Best Regards, Strahil

[ClusterLabs] GCP and IP address question

2024-01-26 Thread Strahil Nikolov via Users
ind those resources. Best Regards, Strahil Nikolov

Re: [ClusterLabs] [ClusterLabs Developers] How do I install and configure Pacemaker high-availability cluster resource manager?

2022-08-21 Thread Strahil Nikolov via Users
Also, both the SUSE and Red Hat documentation are quite extensive and can be considered a good start. Best Regards, Strahil Nikolov On Wed, Aug 10, 2022 at 18:41, Turritopsis Dohrnii Teo En Ming wrote: On Wed, 10 Aug 2022 at 23:37, Reid Wahl wrote: > > On Wed, Aug 10, 2022 at 8

Re: [ClusterLabs] 2-Node Cluster - fencing with just one node running ?

2022-08-06 Thread Strahil Nikolov via Users
By the way, I remember a lot of problems with fence_ilo & fence_ilo_ssh (due to ILO). If you receive timeouts, use fence_ipmi (you have to enable IPMI in ILO). Best Regards, Strahil Nikolov On Thu, Aug 4, 2022 at 23:34, Reid Wahl wrote:

Re: [ClusterLabs] Antw: [EXT] Heads up for ldirectord in SLES12 SP5 "Use of uninitialized value $ip_port in pattern match (m//) at /usr/sbin/ldirectord line 1830"

2022-08-06 Thread Strahil Nikolov via Users
able infra -> not without you explicitly asking for that (and being capable of). Best Regards,Strahil Nikolov  On Thu, Aug 4, 2022 at 9:38, Ulrich Windl wrote: Hi! FYI, here is a copy what I had sent to SUSE support (stating "Because of the very same DNS resolution problem, stopping al

Re: [ClusterLabs] IPaddr2 resource times out and cant be killed

2022-08-01 Thread Strahil Nikolov via Users
In clouds you can't just use VIPs. Use the azure-lb resource instead. Best Regards, Strahil Nikolov On Fri, Jul 29, 2022 at 23:21, Reid Wahl wrote: On Fri, Jul 29, 2022 at 1:02 PM Reid Wahl wrote: > > On Fri, Jul 29, 2022 at 12:52 PM Ross Sponholtz > wrote: > >

Re: [ClusterLabs] Fencing for quorum device?

2022-07-16 Thread Strahil Nikolov via Users
Well, you can always make a single-node cluster with the quorum device's host and set up a systemd resource to keep the service up and running. With SBD, that single-node cluster will suicide in case the machine ends up in an unresponsive state. Best Regards, Strahil Nikolov On Fri, Jul 15,

Re: [ClusterLabs] Help understanding recover of promotable resource after a "pcs cluster stop --all"

2022-05-02 Thread Strahil Nikolov via Users
est Regards,Strahil Nikolov On Tue, May 3, 2022 at 0:25, Ken Gaillot wrote: On Mon, 2022-05-02 at 13:11 -0300, Salatiel Filho wrote: > Hi, Ken, here is the info you asked for. > > > # pcs constraint > Location Constraints: >  Resource: fence-server1 >    Disabled on: >

Re: [ClusterLabs] OT: Linstor/DRBD Problem

2022-04-27 Thread Strahil Nikolov via Users
Why do you use Linstor and not DRBD? As far as I know, Linstor is more suitable for Kubernetes/OpenShift. Best Regards, Strahil Nikolov On Thu, Apr 28, 2022 at 8:19, Eric Robinson wrote: This is probably off-topic but I’ll try anyway. Do we have any Linstor gurus around here? I’ve read

Re: [ClusterLabs] How many nodes redhat cluster does supports

2022-04-27 Thread Strahil Nikolov via Users
What is the output of 'gfs2_edit -p jindex /dev/shared_vg1/shared_lv1 | grep journal'? Source: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html-single/configuring_gfs2_file_systems#proc_adding-gfs2-journal-creating-mounting-gfs2 Best Regards, Strahil Nikolov

Re: [ClusterLabs] OCF_TIMEOUT - Does it recover by itself?

2022-04-27 Thread Strahil Nikolov via Users
You can use a meta attribute to expire failures. The attribute name is 'failure-timeout'. I have used it for my fencing devices as during the night the network was quite busy. Best Regards, Strahil Nikolov On Tue, Apr 26, 2022 at 23:54, Hayden, Robert via Users wrote: Rob
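A minimal sketch of setting that meta attribute with Pacemaker's `crm_resource` tool follows. The resource id `fence_node1` and the `120s` interval are hypothetical, and the `run` wrapper is only there so the command can be previewed with `DRY_RUN=1` before touching a live cluster.

```shell
#!/bin/sh
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Let recorded failures of resource $1 expire after interval $2.
set_failure_timeout() {
    run crm_resource --resource "$1" --meta \
        --set-parameter failure-timeout --parameter-value "$2"
}

# Preview only; nothing is changed:
DRY_RUN=1 set_failure_timeout fence_node1 120s
```

With pcs, the equivalent for an ordinary resource would be along the lines of `pcs resource update <id> meta failure-timeout=120s`.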

Re: [ClusterLabs] I_DC_TIMEOUT and node fenced when it joins the cluster

2022-04-15 Thread Strahil Nikolov via Users
Set the corosync token to 1 milliseconds, adjust the consensus as per man 5 corosync.conf, and give it a try. Don't forget to sync the corosync settings across the cluster. Best Regards, Strahil Nikolov On Fri, Apr 15, 2022 at 15:27, vitaly wrote: Hello Everybody. I am s

Re: [ClusterLabs] Antw: [EXT] Re: SAP HANA monitor fails ‑ Error performing operation: No such device or address

2022-04-11 Thread Strahil Nikolov via Users
', while nodeA has a score of 10 - once the node joins, Dummy1 will move again to nodeA as the score will be: nodeA = 10, current node = 1 due to stickiness. Keep in mind that for groups, all resources' scores are summed up before evaluation. Best Regards, Strahil Nikolov On Mon, Apr 11, 2

Re: [ClusterLabs] Antw: [EXT] SAP HANA monitor fails ‑ Error performing operation: No such device or address

2022-04-10 Thread Strahil Nikolov via Users
debug start does what is described in https://wiki.clusterlabs.org/wiki/Debugging_Resource_Failures Best Regards, Strahil Nikolov On Mon, Apr 11, 2022 at 7:21, Aj Revelino wrote: Hi Strahil, Yes I went through the documentation from Azure. In fact, we have 6 production clusters running

Re: [ClusterLabs] Antw: [EXT] SAP HANA monitor fails ‑ Error performing operation: No such device or address

2022-04-09 Thread Strahil Nikolov via Users
esource is again 'managed'. When removing maintenance, it's always nice to run 'crm_simulate'. One very good article is https://www.suse.com/support/kb/doc/?id=19158 . What is the output of SAPHanaSR-showAttr? Best Regards, Strahil Nikolov On Sat, Apr

Re: [ClusterLabs] Restarting parent of ordered clone resources on specific node causes restart of all resources in the ordering constraint on all nodes of the cluster

2022-04-08 Thread Strahil Nikolov via Users
You can use 'kind' and 'symmetrical' to control order constraints. The default value for symmetrical is 'true', which means that in order to stop dummy1, the cluster has to stop dummy1 & dummy2. Best Regards, Strahil Nikolov On Fri, Apr 8, 2022 at 1
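As a rough illustration of those two options, the sketch below builds a `pcs constraint order` command with explicit `kind` and `symmetrical` values. The resource names come from the snippet; the `add_order` helper and the `DRY_RUN` wrapper are hypothetical conveniences for previewing the command.

```shell
#!/bin/sh
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Start $1 before $2. $3 = kind (default Mandatory),
# $4 = symmetrical (default true, i.e. stops happen in reverse order).
add_order() {
    run pcs constraint order start "$1" then start "$2" \
        "kind=${3:-Mandatory}" "symmetrical=${4:-true}"
}

# Preview an ordering that is NOT applied in reverse on stop:
DRY_RUN=1 add_order dummy1 dummy2 Mandatory false
```

Setting `symmetrical=false` means the start ordering is kept while stops are no longer tied together by this constraint.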

Re: [ClusterLabs] Antw: [EXT] Re: Failed migration causing fencing loop

2022-03-31 Thread Strahil Nikolov via Users
least 15-20 min and enable it back? Best Regards, Strahil Nikolov On Thu, Mar 31, 2022 at 14:02, Ulrich Windl wrote: >>> "Gao,Yan" wrote on 31.03.2022 at 11:18 in message <67785c2f-f875-cb16-608b-77d63d9b0...@suse.com>: > On 2022/3/31 9:03, Ulrich Windl

Re: [ClusterLabs] Antw: [EXT] Re: Corosync Transport‑ Knet Vs UDPU

2022-03-28 Thread Strahil Nikolov via Users
Corosync rings are never enough, especially when the network team has such naughty hands. Best Regards, Strahil Nikolov On Mon, Mar 28, 2022 at 16:55, Ulrich Windl wrote: >>> Strahil Nikolov via Users wrote on 28.03.2022 at 15:49 in message <1758982440.55908

Re: [ClusterLabs] Corosync Transport- Knet Vs UDPU

2022-03-28 Thread Strahil Nikolov via Users
One huge benefit of the new stack is that you can have 8 corosync rings, which is really powerful. Best Regards,Strahil Nikolov On Mon, Mar 28, 2022 at 9:27, Christine caulfield wrote: On 28/03/2022 03:30, Somanath Jeeva via Users wrote: > Hi , > > I am upgrading from cor

Re: [ClusterLabs] Antw: [EXT] Re: Parsing the output of crm_mon

2022-03-24 Thread Strahil Nikolov via Users
Also, xmllint has '--xpath' (unless you are running something as old as RHEL6) and is available on every Linux distro. Best Regards, Strahil Nikolov On Mon, Mar 21, 2022 at 15:41, Ken Gaillot wrote: On Mon, 2022-03-21 at 08:27 +0100, Ulrich Windl wrote: > > > > K
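A small sketch of the `xmllint --xpath` approach is shown below. It assumes the XML contains `<node>` elements with an `online` attribute, as crm_mon's XML output does in the versions I have seen; the helper name is hypothetical and the XPath may need adjusting for other Pacemaker releases.

```shell
#!/bin/sh
# Count nodes reported as online in crm_mon-style XML read from stdin.
# Requires xmllint (libxml2) with --xpath support.
count_online_nodes() {
    xmllint --xpath 'count(//node[@online="true"])' -
}

# Typical use on a cluster node (comment only; needs a running cluster):
#   crm_mon --output-as=xml | count_online_nodes
# Older Pacemaker releases use "crm_mon --as-xml" instead.
```

XPath functions such as `count()` and `string()` return plain values, which avoids post-processing the XML with grep or sed.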

Re: [ClusterLabs] constraining multiple cloned resources to the same node

2022-03-15 Thread Strahil Nikolov via Users
You can try creating a dummy resource and colocate all clones with it. Best Regards,Strahil Nikolov On Tue, Mar 15, 2022 at 20:53, john tillman wrote: > On 15.03.2022 19:35, john tillman wrote: >> Hello, >> >> I'm trying to guarantee that all my cloned drbd reso

Re: [ClusterLabs] Cluster timeout

2022-03-09 Thread Strahil Nikolov via Users
illiseconds (at least based on the manpage) Best Regards,Strahil Nikolov On Wed, Mar 9, 2022 at 12:46, FLORAC Thierry wrote: Hi, I manage an active/passive PostgreSQL cluster using DRBD, LVM, Pacemaker and Corosync on a Debian GNU/

Re: [ClusterLabs] Q: fence_kdump and fence_kdump_send

2022-02-25 Thread Strahil Nikolov via Users
I always used this one for triggering kdump when using sbd: https://www.suse.com/support/kb/doc/?id=19873 On Fri, Feb 25, 2022 at 21:34, Reid Wahl wrote: On Fri, Feb 25, 2022 at 3:47 AM Andrei Borzenkov wrote: > > On Fri, Feb 25, 2022 at 2:23 PM Reid Wahl wrote: > > > > On Fri, Feb 2

Re: [ClusterLabs] Booth ticket multi-site and quorum /Pacemaker

2022-02-25 Thread Strahil Nikolov via Users
third location (even in a cloud nearby). Best Regards,Strahil Nikolov On Fri, Feb 25, 2022 at 20:10, Viet Nguyen wrote:

Re: [ClusterLabs] Antw: Re: Antw: [EXT] The 2 servers of the cluster randomly reboot almost together

2022-02-22 Thread Strahil Nikolov via Users
Strangely I can't see any timeouts set in   at the example in https://pve.proxmox.com/wiki/Fencing ? Best Regards,Strahil Nikolov On Tue, Feb 22, 2022 at 18:54, Sebastien BASTARD wrote: Hello Strahil, I don't have pcs software (corosync is embedded in proxmox), but I have

Re: [ClusterLabs] Antw: Re: Antw: [EXT] The 2 servers of the cluster randomly reboot almost together

2022-02-22 Thread Strahil Nikolov via Users
fencing is the reboot mechanism pcs status Best Regards,Strahil Nikolov On Tue, Feb 22, 2022 at 16:44, Sebastien BASTARD wrote: Hello Strahil, As I don't know the kind of fencing, here is the current configuration of corosync : logging {   debug: off   to_syslog: yes} nodelist {  

Re: [ClusterLabs] Antw: Re: Antw: [EXT] The 2 servers of the cluster randomly reboot almost together

2022-02-22 Thread Strahil Nikolov via Users
What kind of fencing are you using? Best Regards, Strahil Nikolov On Tue, Feb 22, 2022 at 15:24, Sebastien BASTARD wrote: Hello Strahil Nikolov, Qdevice is not a vm. It is a Linux Debian, physical server. Best regards. On Tue, Feb 22, 2022 at 14:20, Strahil Nikolov wrote: Is the

Re: [ClusterLabs] Antw: Re: Antw: [EXT] The 2 servers of the cluster randomly reboot almost together

2022-02-22 Thread Strahil Nikolov via Users
Is the qdevice on a VM ? Best Regards,Strahil Nikolov On Tue, Feb 22, 2022 at 15:03, Sebastien BASTARD wrote:

Re: [ClusterLabs] Help with PostgreSQL Automatic Failover demotion

2022-02-18 Thread Strahil Nikolov via Users
Also, there is a way to tell the cluster to clean up failures -> failure-timeout. Best Regards, Strahil Nikolov On Sat, Feb 19, 2022 at 1:52, Jehan-Guillaume de Rorthais wrote: Hello, On Fri, 18 Feb 2022 21:44:58 + "Larry G. Mills" wrote: > ... This happened again r

Re: [ClusterLabs] The 2 servers of the cluster randomly reboot almost together

2022-02-17 Thread Strahil Nikolov via Users
Token timeout -> network issue? Just run a continuous ping (with timestamp) and log it into a file (from each host to the other host + qdevice IP). Best Regards, Strahil Nikolov On Thu, Feb 17, 2022 at 11:38, Sebastien BASTARD wrote: Hello CoroSync's team ! We currently have a
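One way to sketch the "continuous ping with timestamp" idea is a small filter that prefixes each line with the local time. The `stamp` helper, the peer placeholder and the log path are all hypothetical.

```shell
#!/bin/sh
# Prefix every line read from stdin with a local timestamp.
stamp() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$line"
    done
}

# Typical use, one background job per peer (comment only):
#   ping <peer-or-qdevice-ip> | stamp >> /var/tmp/ping-<peer>.log &
```

The iputils version of `ping` also has a `-D` option that prints a Unix timestamp before each line, which may be enough without a wrapper.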

Re: [ClusterLabs] Q: sbd: Which parameter controls "error: servant_md: slot read failed in servant."?

2022-02-17 Thread Strahil Nikolov via Users
To be honest, I always check  https://documentation.suse.com/sle-ha/15-SP3/html/SLE-HA-all/cha-ha-storage-protect.html#sec-ha-storage-protect-watchdog-timings for sbd and timings. Best Regards,Strahil Nikolov On Wed, Feb 16, 2022 at 19:31, Klaus Wenninger wrote

Re: [ClusterLabs] VirtualDomain + GlusterFS - troubles coming with CentOS 9

2022-02-15 Thread Strahil Nikolov via Users
I haven't heard about removal of libgfapi, so most probably it's a packaging issue. The FUSE mount point can be set up via a cloned Filesystem resource; there should be no problems with it and live migration should work. Best Regards, Strahil Nikolov On Tue, Feb 15, 202

Re: [ClusterLabs] Antw: [EXT] Cluster Removing VIP and Not Following Order Constraint

2022-02-11 Thread Strahil Nikolov via Users
ds samo old primary is promoted back, the IP never disappeared while it always started on the correct side (where the master is). Best Regards,Strahil Nikolov On Fri, Feb 11, 2022 at 10:38, Jonno wrote: Hello all, Thank you for your assistance. Below is the config from my lab environment.

Re: [ClusterLabs] Antw: [EXT] Cluster Removing VIP and Not Following Order Constraint

2022-02-11 Thread Strahil Nikolov via Users
Shouldn't you use kind 'Mandatory' and symmetrical TRUE? If true, the reverse of the constraint applies for the opposite action (for example, if B starts after A starts, then B stops before A stops). Best Regards, Strahil Nikolov On Fri, Feb 11, 2022 at 9:11, Ulri

Re: [ClusterLabs] what is the "best" way to completely shutdown a two-node cluster ?

2022-02-10 Thread Strahil Nikolov via Users
One drawback of that approach is that adding the resource stop command will also prevent the resources from starting once the UPS gets enough power and starts the servers. Of course, a script in cron (@reboot) or in systemd can overcome it. Best Regards, Strahil Nikolov On Thu, Feb 10

Re: [ClusterLabs] what is the "best" way to completely shutdown a two-node cluster ?

2022-02-09 Thread Strahil Nikolov via Users
If you gracefully shut down a node, pacemaker will migrate all resources away, so you need to shut them down simultaneously and all resources should be stopped by the cluster. Shutting down the nodes would be my choice. Best Regards, Strahil Nikolov On Wed, Feb 9, 2022 at 12:52, Lentes

Re: [ClusterLabs] Removing a resource without stopping it

2022-01-29 Thread Strahil Nikolov via Users
I know... and the editor stuff can be bypassed, if the approach works. Best Regards,Strahil Nikolov On Sat, Jan 29, 2022 at 15:43, Digimer wrote:On 2022-01-29 03:16, Strahil Nikolov wrote: I think there is pcs cluster edit --scope=resources (based on memory). Can you try to

Re: [ClusterLabs] Removing a resource without stopping it

2022-01-29 Thread Strahil Nikolov via Users
I think there is pcs cluster edit --scope=resources (based on memory). Can you try to delete it from there? Best Regards, Strahil Nikolov On Sat, Jan 29, 2022 at 7:12, Digimer wrote:

Re: [ClusterLabs] heads up: Possible VM data corruption upgrading to SLES15 SP3

2022-01-27 Thread Strahil Nikolov via Users
Are you using HA-LVM or CLVM ? Best Regards,Strahil Nikolov On Thu, Jan 27, 2022 at 16:10, Ulrich Windl wrote: Hi! I know this is semi-offtopic, but I think it's important: I've upgraded one cluster node being a Xen host from SLES15 SP2 to SLES15 SP3 using virtual DVD boot

Re: [ClusterLabs] Antw: [EXT] Re: Feedback wanted: Native language support for Pacemaker help output

2022-01-13 Thread Strahil Nikolov via Users
idempotent and build all clusters from a single role/formula, absolutely perfect and without missing components, packages or settings. Best Regards, Strahil Nikolov On Thursday, January 13, 2022, 19:35:57 GMT+2, Ken Gaillot wrote: I think the use case is where senior admins c

Re: [ClusterLabs] Antw: [EXT] Re: Feedback wanted: Native language support for Pacemaker help output

2022-01-11 Thread Strahil Nikolov via Users
To be honest, I don't see any benefit. Even if you have the stack translated, when a more complex setup is needed -> you will always have to search in the source/github issues/documentation/mailing list history and rely on English. Best Regards, Strahil Nikolov On Tue, Jan 11, 2022

Re: [ClusterLabs] Which verson of pacemaker/corosync provides crm_feature_set 3.0.10?

2021-11-23 Thread Strahil Nikolov via Users
ages/c/corosync-2.3.5-1.fc23.x86_64.rpm which theoretically should be close enough. P.S.: I couldn't find those versions for Fedora 22, but they seem available for F23. Best Regards, Strahil Nikolov On Tuesday, November 23, 2021, 21:11:58 GMT+2, vitaly wrote: Hello, I a

Re: [ClusterLabs] resource start after network reconnected

2021-11-21 Thread Strahil Nikolov via Users
You are right, but usually when the SBD disk has failed, I always focus on recovering it as soon as possible. Once the disk is recovered and the watcher detects it back - shutting down is possible. And of course disk-based sbd is better than nothing. Best Regards,Strahil Nikolov On Sun

Re: [ClusterLabs] resource start after network reconnected

2021-11-20 Thread Strahil Nikolov via Users
te: On Sat, Nov 20, 2021 at 08:33:26PM +0000, Strahil Nikolov via Users wrote: > You can also use this 3rd node to provide iSCSI and then the SBD will > be disk-full :D . The good thing about this type of setup is that you > won't need to put location constraints for the 3rd node. W

Re: [ClusterLabs] resource start after network reconnected

2021-11-20 Thread Strahil Nikolov via Users
ween a primary and secondary (a.k.a. master-slave) replication. Best Regards, Strahil Nikolov On Friday, November 19, 2021, 21:46:22 GMT+2, john tillman wrote: > On Fri, Nov 19, 2021 at 11:26:01AM -0500, john tillman wrote: >> Anyone have any other ideas for a conf

Re: [ClusterLabs] Fence node when network interface goes down

2021-11-15 Thread Strahil Nikolov via Users
Have you tried with ping and a location constraint for avoiding hosts that cannot ping an external system? Best Regards, Strahil Nikolov On Mon, Nov 15, 2021 at 0:07, S Rogers wrote: Using on-fail=fence is what I initially tried, but it doesn't work unfortunately. It looks like th

Re: [ClusterLabs] drbd nfs slave not working

2021-11-14 Thread Strahil Nikolov via Users
Also, check what 'drbdadm' has to tell you. Both nodes should be in sync, otherwise pacemaker will prevent the failover. Best Regards,Strahil Nikolov On Sun, Nov 14, 2021 at 20:09, Andrei Borzenkov wrote: On 14.11.2021 19:47, Neil McFadyen wrote: > I have a Ubuntu 2

Re: [ClusterLabs] How to globally enable trace log level in pacemaker?

2021-10-31 Thread Strahil Nikolov via Users
At least it's worth trying (/etc/sysconfig/pacemaker): PCMK_trace_files=* Best Regards, Strahil Nikolov On Sun, Oct 31, 2021 at 18:10, Vladislav Bogdanov wrote:

Re: [ClusterLabs] How to globally enable trace log level in pacemaker?

2021-10-31 Thread Strahil Nikolov via Users
Have you checked the options in /etc/sysconfig/pacemaker as recommended in https://documentation.suse.com/sle-ha/15-SP3/html/SLE-HA-all/app-ha-troubleshooting.html#sec-ha-troubleshooting-log ? Best Regards, Strahil Nikolov On Sunday, October 31, 2021, 13:33:43 GMT+2, Andrei

Re: [ClusterLabs] Antw: [EXT] unexpected fenced node and promotion of the new master PAF ‑ postgres

2021-10-09 Thread Strahil Nikolov via Users
Ah... That's the first thing I change. In SLES, it defaults to 10s and so far I have never seen an environment that is stable enough for the default 1s timeout. Best Regards, Strahil Nikolov On Sat, Oct 9, 2021 at 9:59, Jehan-Guillaume de Rorthais wrote: On October 9, 2021 00:

Re: [ClusterLabs] Antw: [EXT] unexpected fenced node and promotion of the new master PAF ‑ postgres

2021-10-08 Thread Strahil Nikolov via Users
What do you mean by 1s default timeout ? Best Regards,Strahil Nikolov On Fri, Oct 8, 2021 at 16:02, damiano giuliani wrote:

Re: [ClusterLabs] Antw: [EXT] Re: Problem with high load (IO)

2021-10-05 Thread Strahil Nikolov via Users
These 'dirty' sysctl settings are configurable. For large sequential I/O it's desirable for the 'dirty' ratio/bytes to be bigger, while for small files/random I/O it's better to keep it low. Best Regards, Strahil Nikolov On Tuesday, October 5, 2021, 08:52:20

Re: [ClusterLabs] Problem with high load (IO)

2021-09-30 Thread Strahil Nikolov via Users
Did you try 'ionice -c 2 -n 7 nice cp'? Best Regards, Strahil Nikolov On Thu, Sep 30, 2021 at 14:58, Lentes, Bernd wrote: - On Sep 30, 2021, at 3:55 AM, Gang He g...@suse.com wrote: >> >> 1) No problems during this step, the procedure just needs a f

Re: [ClusterLabs] corosync/pacemaker resources start after reboot - incorrect node ID calculated

2021-09-28 Thread Strahil Nikolov via Users
Yeah, it seems I missed the nodeid, so can you try setting the "name: hostname" in the corosync.conf? Best Regards, Strahil Nikolov On Tuesday, September 28, 2021, 10:34:41 GMT+3, Strahil Nikolov via Users wrote: Erm, in my corosync.conf I got also 'na

Re: [ClusterLabs] corosync/pacemaker resources start after reboot - incorrect node ID calculated

2021-09-28 Thread Strahil Nikolov via Users
Erm, in my corosync.conf I also have 'name: the-name-of-the-host' and 'nodeid: ' . I don't see these 2 in your config. Best Regards, Strahil Nikolov On Tuesday, September 28, 2021, 02:39:20 GMT+3, Neitzert, Greg A wrote: Hello,   We have an is

Re: [ClusterLabs] Problem with high load (IO)

2021-09-27 Thread Strahil Nikolov via Users
Hey Ken, how should someone set the maintenance via pcs? Best Regards, Strahil Nikolov On Mon, Sep 27, 2021 at 19:56, Ken Gaillot wrote: On Mon, 2021-09-27 at 12:37 +0200, Lentes, Bernd wrote: > Hi, > > i have a two-node cluster running on SLES 12SP5 with two HP servers > and

Re: [ClusterLabs] Problem with high load (IO)

2021-09-27 Thread Strahil Nikolov via Users
I would use something like this: ionice -c 2 -n 7 nice cp XXX YYY Best Regards, Strahil Nikolov On Monday, September 27, 2021, 13:37:31 GMT+3, Lentes, Bernd wrote: Hi, i have a two-node cluster running on SLES 12SP5 with two HP servers and a common FC SAN. Most of my

Re: [ClusterLabs] 8 node cluster

2021-09-07 Thread Strahil Nikolov via Users
't drop more than 50% of the nodes simultaneously? Best Regards, Strahil Nikolov On Tue, Sep 7, 2021 at 21:08, Antony Stone wrote: On Tuesday 07 September 2021 at 19:37:33, M N S H SNGHL wrote: > I am looking for some suggestions here. I have created an 8 node HA cluster > on m

Re: [ClusterLabs] (no subject)

2021-09-02 Thread Strahil Nikolov via Users
In order to test properly, use firewall rules to drop the corosync traffic. I remember that this test (ifdown NIC) was inefficient in previous versions of corosync. If you wish to be safer, try to set up ocf:pacemaker:ping. Best Regards, Strahil Nikolov On Fri, Sep 3, 2021 at 5:09, 重力加速度

Re: [ClusterLabs] Qemu VM resources - cannot acquire state change lock

2021-08-29 Thread Strahil Nikolov via Users
that. Your setup looks fairly close to the oVirt project ... (just mentioning). Best Regards, Strahil Nikolov On Sat, Aug 28, 2021 at 13:33, lejeczek via Users wrote: On 26/08/2021 10:35, Klaus Wenninger wrote: > > > On Thu, Aug 26, 2021 at 11:13 AM

Re: [ClusterLabs] Question about automating cluster unfencing.

2021-08-29 Thread Strahil Nikolov via Users
module as the system is fenced via SAN and even if not rebooted, there is no risk. Best Regards, Strahil Nikolov On Sat, Aug 28, 2021 at 10:14, Andrei Borzenkov wrote: On Fri, Aug 27, 2021 at 8:11 PM Gerry R Sommerville wrote: > > Hey all, > >

Re: [ClusterLabs] Cloned ressource is restarted on all nodes if one node fails

2021-08-09 Thread Strahil Nikolov via Users
> name="statusurl" value="http://localhost/server-status"/> Can you show the apache config for the status page ? It must be accessible only from localhost (127.0.0.1) and should not be reachable from the other nodes. Bes

Re: [ClusterLabs] Cloned ressource is restarted on all nodes if one node fails

2021-08-09 Thread Strahil Nikolov via Users
g and data on a shared FS. Sadly, I can't find my notes right now. Best Regards,Strahil Nikolov On Mon, Aug 9, 2021 at 13:43, Andreas Janning wrote: Hi all, we recently experienced an outage in our pacemaker cluster and I would like to understand how we can configure the cluster to

Re: [ClusterLabs] Sub‑clusters / super‑clusters - working :)

2021-08-07 Thread Strahil Nikolov via Users
>Because Asterisk at cityA is bound to a floating IP address, which is held >on one of the three machines in cityA. I can't run Asterisk on all >three machines there because only one of them has the IP address. That's not true. You can use a cloned IP resource with 'globally-unique=true' which run

Re: [ClusterLabs] Antw: [EXT] Re: Sub‑clusters / super‑clusters?

2021-08-04 Thread Strahil Nikolov via Users
I still can't understand why the whole cluster will fail when only 3 nodes are down and a qdisk is used. CityA -> 3 nodes to run packageA -> 3 votes; CityB -> 3 nodes to run packageB -> 3 votes; CityC -> 1 node which cannot run any package (qdisk) -> 1 vote. Max votes: 7. Quorum: 4. As long as one city is
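The vote arithmetic in this post can be checked with a few lines of shell. This assumes the usual majority rule (quorum = floor(total / 2) + 1); the function name is just for illustration.

```shell
#!/bin/sh
# Votes needed for quorum: strictly more than half of all expected votes.
quorum_votes() {
    echo $(( $1 / 2 + 1 ))
}

total=$(( 3 + 3 + 1 ))   # cityA + cityB + the single quorum vote in cityC
echo "total=$total quorum=$(quorum_votes "$total")"   # total=7 quorum=4
```

Losing all three nodes of one city leaves 3 + 1 = 4 votes, which still meets the quorum of 4.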

Re: [ClusterLabs] Antw: [EXT] Re: Sub‑clusters / super‑clusters?

2021-08-04 Thread Strahil Nikolov via Users
That's why you need a qdisk at a third location, so you will have 7 votes in total. When 3 nodes in cityA die, all resources will be started on the remaining 3 nodes. Best Regards, Strahil Nikolov On Wed, Aug 4, 2021 at 17:23, Antony Stone wrote: On Wednesday 04 August 2021 at 16:

Re: [ClusterLabs] Antw: [EXT] Moving resource only one way

2021-08-04 Thread Strahil Nikolov via Users
1/html/Clusters_from_Scratch/_move_resources_manually.html Best Regards, Strahil Nikolov On Tue, Aug 3, 2021 at 22:16, Ervin Hegedüs wrote: Hi, On Tue, Aug 03, 2021 at 05:46:51PM +, Strahil Nikolov via Users wrote: > Yes. INFINITY = 1000000 (one million), -INFINITY = -1000000 (negative one mi

Re: [ClusterLabs] Antw: [EXT] Moving resource only one way

2021-08-03 Thread Strahil Nikolov via Users
Yes. INFINITY = 1000000 (one million), -INFINITY = -1000000 (negative one million). Set stickiness > 100. Best Regards, Strahil Nikolov > The `location` section overwrites the stickiness?

Re: [ClusterLabs] Antw: [EXT] Re: Sub‑clusters / super‑clusters?

2021-08-03 Thread Strahil Nikolov via Users
Won't something like this work? Each node in LA will have the same score of 5000, while other cities will be -5000: 'pcs constraint location DummyRes1 rule score=5000 city eq LA' and 'pcs constraint location DummyRes1 rule score=-5000 city ne LA'; stickiness -> 1 Best Regards, Strahil Nikolov

Re: [ClusterLabs] pcs add node command is success but node is not configured to existing cluster

2021-07-28 Thread Strahil Nikolov via Users
Firewall issue? Did you check on corosync level if all nodes reach each other? Best Regards, Strahil Nikolov On Wednesday, July 28, 2021, 16:32:51 GMT+3, S Sathish S via Users wrote:    Hi Team,   we are trying to add node03 to existing cluster after adding we could see

Re: [ClusterLabs] Antw: Re: [EXT] Re: Two node cluster without fencing and no split brain?

2021-07-28 Thread Strahil Nikolov via Users
So far, I never had a cluster with nodes directly connected to the same switches. Usually it's a nodeA -> switchA -> switchB -> nodeB and sometimes connectivity between switches goes down (for example a firewall rule). Best Regards, Strahil Nikolov On Wednesday, July 28, 2021

Re: [ClusterLabs] Two node cluster without fencing and no split brain?

2021-07-20 Thread Strahil Nikolov via Users
Hi, consider using a 3rd system as a Q disk. Also, you can use iSCSI from that node as an SBD device, so you will have proper fencing. If you don't have a hardware watchdog device, you can use the softdog kernel module for that. Best Regards, Strahil Nikolov On Wed, Jul 21, 2021 at 1:45, Di

Re: [ClusterLabs] Antw: Re: Antw: [EXT] VIP monitor Timed Out

2021-07-20 Thread Strahil Nikolov via Users
ouble. Keep in mind that you can use either dirty_ratio or dirty_bytes and either dirty_background_ratio or dirty_background_bytes, but never both. Best Regards, Strahil Nikolov On Tuesday, July 20, 2021, 18:04:36 GMT+3, PASERO Florent wrote: Thanks Ulrich ! Could you exp
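The "either ratio or bytes, never both" rule can be checked mechanically. The sketch below scans a sysctl.d-style file for pairs that are set together; the function name is hypothetical and the check only looks at plain `key = value` lines.

```shell
#!/bin/sh
# Report vm.dirty_* pairs that are both set in a sysctl.d-style file.
# Returns 1 if any conflicting pair is found, 0 otherwise.
check_dirty_conflicts() {
    file=$1 status=0
    for base in dirty dirty_background; do
        if grep -Eq "^[[:space:]]*vm\.${base}_ratio[[:space:]]*=" "$file" &&
           grep -Eq "^[[:space:]]*vm\.${base}_bytes[[:space:]]*=" "$file"; then
            echo "conflict: vm.${base}_ratio and vm.${base}_bytes are both set in $file"
            status=1
        fi
    done
    return $status
}

# Typical use (comment only): check_dirty_conflicts /etc/sysctl.d/<file>.conf
```

The kernel treats each pair as mutually exclusive: writing one of the two makes the other read back as 0, so whichever setting is applied last is the one that takes effect.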

Re: [ClusterLabs] Moving resource only one way

2021-07-16 Thread Strahil Nikolov via Users
Yep, just set the stickiness to something bigger than '0' (max is INFINITY -> 1000000). Best Regards, Strahil Nikolov On Thu, Jul 15, 2021 at 15:02, Ervin Hegedüs wrote: Hi there, I have to build a very simple cluster with only one resource: a virtual IP. The "challeng

Re: [ClusterLabs] unexpected fenced node and promotion of the new master PAF - postgres

2021-07-14 Thread Strahil Nikolov
If you experience multiple outages, you should consider enabling the kdump feature of sbd. It will increase the takeover time, but might provide valuable info. Best Regards,Strahil Nikolov On Wed, Jul 14, 2021 at 15:12, Klaus Wenninger wrote

Re: [ClusterLabs] QDevice vs 3rd host for majority node quorum

2021-07-13 Thread Strahil Nikolov
In some cases the third location has a single IP and it makes sense to use it as a QDevice. If it has multiple network connections to that location, use a full-blown node. Best Regards, Strahil Nikolov On Tue, Jul 13, 2021 at 20:44, Andrei Borzenkov wrote: On 13.07.2021 19:52, Gerry R

Re: [ClusterLabs] @ maillist Admins - DMARC (yahoo)

2021-07-12 Thread Strahil Nikolov
Actually, I don't mind, but it would be nice if I didn't get kicked from time to time due to too many bounces. :) Best Regards, Strahil Nikolov On Sat, 2021-07-10 at 12:34 +0100, lejeczek wrote: > Hi Admins(of this mailing list) > > Could you please fix in DMARC(s) so tho

Re: [ClusterLabs] ZFS Opinions?

2021-07-10 Thread Strahil Nikolov
erformance tuning on both is not a trivial task, like any other performance tuning. Best Regards, Strahil Nikolov On Sat, Jul 10, 2021 at 0:12, Eric Robinson wrote:

Re: [ClusterLabs] VIP monitor Timed Out

2021-07-03 Thread Strahil Nikolov
I would try to add 'trace_ra=1' or 'trace_ra=1 trace_file=' to debug it further. With the first option (without trace_file), the file will be at /var/lib/heartbeat/trace_ra//*timestamp Are you sure that the system is not overloaded and therefore unable to respond in time? Best

Re: [ClusterLabs] Updating quorum configuration without restarting cluster

2021-06-21 Thread Strahil Nikolov
Also, it's worth mentioning that you can still make changes without downtime. For example, you can edit corosync.conf and push it to all nodes, then set global maintenance, stop the cluster and then start it again. Best Regards, Strahil Nikolov On Mon, Jun 21, 2021 at 9:37, Jan Friesse
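The sequence described above might be scripted roughly as follows. This is a sketch, not a tested procedure: it assumes pcs is the management tool and that corosync.conf has already been edited on the node where it runs. The 'run' helper, the function names and the DRY_RUN switch are invented for the example:

```shell
# Print each command; execute it only when DRY_RUN is not set to 1.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

# Apply an already edited corosync.conf while resources keep running.
apply_corosync_change() {
    # Push corosync.conf to all nodes.
    run pcs cluster sync &&
    # Global maintenance: resources keep running but are unmanaged.
    run pcs property set maintenance-mode=true &&
    # Restart the whole stack so the new configuration is read.
    run pcs cluster stop --all &&
    run pcs cluster start --all
}

# Assumption: call this only after 'pcs status' shows all nodes online again.
leave_maintenance() {
    run pcs property set maintenance-mode=false
}
```

Running it once with DRY_RUN=1 prints the four commands without touching the cluster, which is a cheap way to review the order before doing it for real.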

Re: [ClusterLabs] Updating quorum configuration without restarting cluster

2021-06-19 Thread Strahil Nikolov
You can reload corosync via 'pcs' and I think that both are supported. The main question is whether you reloaded corosync on all nodes in the cluster. Best Regards, Strahil Nikolov On Sat, Jun 19, 2021 at 1:22, Gerry R Sommerville wrote: Dear community, I would like to ask few

Re: [ClusterLabs] Correctly stop pacemaker on 2-node cluster with SBD and failed devices?

2021-06-15 Thread Strahil Nikolov
Maybe you can try: while true ; do echo '0' > /proc/sys/kernel/nmi_watchdog ; sleep 1 ; done and in another shell stop pacemaker and sbd. I guess the only way to easily reproduce it is with sbd over iSCSI. Best Regards, Strahil Nikolov On Tue, Jun 15, 2021 at 21:30, Andrei Borzenkov

Re: [ClusterLabs] Correctly stop pacemaker on 2-node cluster with SBD and failed devices?

2021-06-15 Thread Strahil Nikolov
l be triggered. Best Regards, Strahil Nikolov On Tuesday, 15 June 2021, 18:47:06 GMT+3, Andrei Borzenkov wrote: On Tue, Jun 15, 2021 at 6:43 PM Strahil Nikolov wrote: > > How did you stop pacemaker? systemctl stop pacemaker surprise :) > Usually I use 'pcs cl

Re: [ClusterLabs] Correctly stop pacemaker on 2-node cluster with SBD and failed devices?

2021-06-15 Thread Strahil Nikolov
How did you stop pacemaker? Usually I use 'pcs cluster stop' or its crm alternative. Best Regards, Strahil Nikolov On Tue, Jun 15, 2021 at 18:21, Andrei Borzenkov wrote: We had the following situation 2-node cluster with single device (just single external storage avai

Re: [ClusterLabs] Antw: [EXT] Re: Antw: Hanging OCFS2 Filesystem any one else?

2021-06-15 Thread Strahil Nikolov
Thanks for the update. Could it be something local to your environment? Have you checked mounting the OCFS2 on a vanilla system? Best Regards, Strahil Nikolov On Tue, Jun 15, 2021 at 12:01, Ulrich Windl wrote: Hi Guys! Just to keep you informed on the issue: I was informed that I'

Re: [ClusterLabs] A systemd resource monitor is still in progress: re-scheduling

2021-06-13 Thread Strahil Nikolov
Did you notice any delay in 'systemctl status openstack-cinder-scheduler'? As far as I know the cluster will use systemd (or maybe even dbus) to get the info of the service. Also, a 10s monitor interval seems quite aggressive - have you considered increasing that? Best Regards,Strah
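One simple way to check for such a delay is to time the status call a few times. The helper below is only a sketch (the function name is hypothetical, and it relies on 'date +%s', which is available on GNU and BSD systems), measuring whole seconds:

```shell
# Print how many whole seconds a command takes.
# The command's output and exit status are deliberately ignored.
elapsed_seconds() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}
```

For example, 'elapsed_seconds systemctl status openstack-cinder-scheduler' should normally print 0 or 1. Values that get close to the monitor timeout would suggest a slow systemd or D-Bus response rather than a problem with the service itself.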

Re: [ClusterLabs] One Failed Resource = Failover the Cluster?

2021-06-06 Thread Strahil Nikolov
Based on the constraint rules you have mentioned, failure of mysql should not cause a failover to another node. For better insight, you have to be able to reproduce the issue and share the logs with the community. Best Regards, Strahil Nikolov On Sat, Jun 5, 2021 at 23:33, Eric Robinson

Re: [ClusterLabs] One Failed Resource = Failover the Cluster?

2021-06-04 Thread Strahil Nikolov
It shouldn't relocate or affect any other resource, as long as the stop succeeds. If the stop operation times out or fails -> fencing kicks in. Best Regards, Strahil Nikolov

Re: [ClusterLabs] Cluster Stopped, No Messages?

2021-06-01 Thread Strahil Nikolov
Did you configure pacemaker blackbox? If not, it could be valuable in such cases. Also consider updating as soon as possible. Most probably nobody can count the bug fixes that were introduced between 7.5 and 7.9, nor will anyone be able to help, as you are running a pretty outdated version (even

Re: [ClusterLabs] Cluster Stopped, No Messages?

2021-05-28 Thread Strahil Nikolov
I agree -> fencing is mandatory. You can enable the debug logs by editing corosync.conf or /etc/sysconfig/pacemaker. In case a simple reload doesn't work, you can set the cluster in global maintenance, stop and then start the stack. Best Regards, Strahil Nikolov On Fri, May 28, 2021

Re: [ClusterLabs] Pacemaker not issuing start command intermittently

2021-05-28 Thread Strahil Nikolov
ra (based on memory -> so use find/locate). Best Regards, Strahil Nikolov On Fri, May 28, 2021 at 22:10, Abithan Kumarasamy wrote: Hello Team, We have been recently running some tests on our Pacemaker clusters that involve two Pacemaker resources on two nodes respectively. The test case

Re: [ClusterLabs] Cluster Stopped, No Messages?

2021-05-28 Thread Strahil Nikolov
What is your fencing agent? Best Regards, Strahil Nikolov On Thu, May 27, 2021 at 20:52, Eric Robinson wrote: We found one of our cluster nodes down this morning. The server was up but cluster services were not running. Upon examination of the logs, we found that the cluster just

Re: [ClusterLabs] OCFS2 fragmentation with snapshots

2021-05-18 Thread Strahil Nikolov
hen make a snapshot via your virtualization tech stack. Best Regards, Strahil Nikolov On Tue, May 18, 2021 at 13:52, Ulrich Windl wrote: Hi! I thought using the reflink feature of OCFS2 would be just a nice way to make crash-consistent VM snapshots while they are running. As it is a bit tri
