Re: [ClusterLabs] What's the number in "Servant pcmk is outdated (age: 682915)"

2022-06-01 Thread Gao,Yan via Users
Hi Ulrich, On 2022/6/1 7:59, Ulrich Windl wrote: Hi! I'm wondering what the number in parentheses is for these messages: sbd[6809]: warning: inquisitor_child: pcmk health check: UNHEALTHY sbd[6809]: warning: inquisitor_child: Servant pcmk is outdated (age: 682915) As we know, each sbd

Re: [ClusterLabs] Failed migration causing fencing loop

2022-05-25 Thread Gao,Yan via Users
Hi Ulrich, On 2022/3/31 11:18, Gao,Yan via Users wrote: On 2022/3/31 9:03, Ulrich Windl wrote: Hi! I just wanted to point out one thing that hit us with SLES15 SP3: Some failed live VM migration causing node fencing resulted in a fencing loop, because of two reasons: 1) Pacemaker thinks

Re: [ClusterLabs] More pacemaker oddities while stopping DC

2022-05-25 Thread Gao,Yan via Users
On 2022/5/25 8:10, Ulrich Windl wrote: Hi! We are still suffering from kernel RAM corruption on the Xen hypervisor when a VM or the hypervisor is doing I/O (three months since the bug report at SUSE, but no fix or workaround meaning the whole Xen cluster project was canceled after 20 years,

Re: [ClusterLabs] Antw: Instable SLES15 SP3 kernel

2022-04-27 Thread Gao,Yan via Users
Hi Ulrich, On 2022/4/27 11:13, Ulrich Windl wrote: Update for the Update: I had installed SLES Updates in one VM and rebooted it via cluster. While installing the updates in the VM the Xen host got RAM corruption (it seems any disk I/O on the host, either locally or via a VM image causes RAM

Re: [ClusterLabs] Antw: Re: Antw: [EXT] Re: Failed migration causing fencing loop

2022-04-04 Thread Gao,Yan via Users
On 2022/4/4 8:58, Ulrich Windl wrote: Andrei Borzenkov wrote on 04.04.2022 at 06:39 in message : On 31.03.2022 14:02, Ulrich Windl wrote: "Gao,Yan" wrote on 31.03.2022 at 11:18 in message <67785c2f-f875-cb16-608b-77d63d9b0...@suse.com>: On 2022/3/31 9:03, Ulrich

Re: [ClusterLabs] Failed migration causing fencing loop

2022-03-31 Thread Gao,Yan via Users
On 2022/3/31 9:03, Ulrich Windl wrote: Hi! I just wanted to point out one thing that hit us with SLES15 SP3: Some failed live VM migration causing node fencing resulted in a fencing loop, because of two reasons: 1) Pacemaker thinks that even _after_ fencing there is some migration to "clean

Re: [ClusterLabs] weird xml snippet in "crm configure show"

2021-02-12 Thread Gao,Yan
Hi, On 2021/2/12 11:05, Lentes, Bernd wrote: Hi, I have problems with a configured alert which does not alert anymore. I played around with it a bit and changed the configuration several times with cibadmin. Sometimes I had trouble with the admin_epoch, sometimes with the scheme. When I

Re: [ClusterLabs] Q: List resources affected by utilization limits

2021-01-13 Thread Gao,Yan
On 1/13/21 9:14 AM, Ulrich Windl wrote: Hi! I had made a test: I had configured RAM requirements for some test VMs together with node RAM capacities. Things were running fine. Then as a test I reduced the RAM capacity of all nodes, and test VMs were stopped due to not enough RAM. Now I

Re: [ClusterLabs] "crm verify": ".. stonith-watchdog-timeout is nonzero"

2020-11-26 Thread Gao,Yan
On 11/26/20 8:31 AM, Ulrich Windl wrote: Hi! Using SBD, I got this message from crm's top-level "verify": crm(live/h16)# verify Current cluster status: Online: [ h16 h18 h19 ] prm_stonith_sbd(stonith:external/sbd): Started h18 (unpack_config) notice: Watchdog will be used

Re: [ClusterLabs] Antw: [EXT] Re: Coming in Pacemaker 2.0.4: fencing delay based on what resources are where

2020-03-23 Thread Gao,Yan
On 2020/3/23 14:04, Gao,Yan wrote: On 2020/3/23 8:00, Ulrich Windl wrote: Andrei Borzenkov wrote on 21.03.2020 at 18:22 in message <14318_1584811393_5E764D80_14318_174_1_6ab730d7-8cf0-2c7d-7ae5-8d0ea8402758@gmai .com>: 21.03.2020 20:07, Ken Gaillot wrote: Hi all, I am

Re: [ClusterLabs] Antw: [EXT] Re: Coming in Pacemaker 2.0.4: fencing delay based on what resources are where

2020-03-23 Thread Gao,Yan
On 2020/3/23 8:00, Ulrich Windl wrote: Andrei Borzenkov wrote on 21.03.2020 at 18:22 in message <14318_1584811393_5E764D80_14318_174_1_6ab730d7-8cf0-2c7d-7ae5-8d0ea8402758@gmai .com>: 21.03.2020 20:07, Ken Gaillot wrote: Hi all, I am happy to announce a feature that was discussed on

Re: [ClusterLabs] Antw: [EXT] Coming in Pacemaker 2.0.4: fencing delay based on what resources are where

2020-03-23 Thread Gao,Yan
I'd like to recognize the primary authors of the 2.0.4 features announced so far: - shutdown locks: myself - switch to clock_gettime() for monotonic clock: Jan Pokorný - crm_mon --include/--exclude: Chris Lumens - priority-fencing-delay: Gao,Yan -- Ken Gaillot

Re: [ClusterLabs] Coming in Pacemaker 2.0.4: fencing delay based on what resources are where

2020-03-22 Thread Gao,Yan
ties apply (pcmk_delay_base, etc.). I'd like to recognize the primary authors of the 2.0.4 features announced so far: - shutdown locks: myself - switch to clock_gettime() for monotonic clock: Jan Pokorný - crm_mon --include/--exclude: Chris Lumens - priority-fencing-dela

Re: [ClusterLabs] SuSE12SP3 HAE SBD Communication Issue

2019-02-12 Thread Gao,Yan
ch for good in case it's split-brain. This already works correctly with the fix regarding 2-node clusters from Klaus. Regards, Yan Many Thanks!!! Regards Fulong -------- *From:* Gao,Yan *Sent:* Thursday, January 3

Re: [ClusterLabs] SuSE12SP3 HAE SBD Communication Issue

2019-02-12 Thread Gao,Yan
- All topics related to open-source clustering welcomed; Fulong Wang; Gao,Yan *Subject:* Re: [ClusterLabs] SuSE12SP3 HAE SBD Communication Issue On 02/11/2019 09:49 AM, Fulong Wang wrote: Thanks Yan, You gave me more valuable hints on the SBD operation! Now, i can see the verbose output after

Re: [ClusterLabs] SuSE12SP3 HAE SBD Communication Issue

2019-01-03 Thread Gao,Yan
. Given that it was done by Yan Gao iirc I'd assume it went into SLES. So changing the verbosity of the sbd-daemon might get you back these logs. Do you mean commit 2dbdee29736fcbf0fe1d41c306959b22d05f72b0 Author: Gao,Yan Date: Mon Apr 30 18:02:04 2018 +0200 Log: upgrade important messages

Re: [ClusterLabs] SuSE12SP3 HAE SBD Communication Issue

2019-01-03 Thread Gao,Yan
it will not tolerate any further faults. Please repair the system before continuing." Regards, Yan What's your recommendation for this scenario? The "crm node fence" did the work. Regards Fulong --

Re: [ClusterLabs] SuSE12SP3 HAE SBD Communication Issue

2018-12-21 Thread Gao,Yan
First, thanks for your reply, Klaus! On 2018/12/21 10:09, Klaus Wenninger wrote: On 12/21/2018 08:15 AM, Fulong Wang wrote: Hello Experts, I'm new to this mailing list. Please kindly forgive me if this mail has disturbed you! Our company has recently been evaluating the usage of the SuSE HAE on x86

Re: [ClusterLabs] Wrong sbd.service dependencies

2017-12-17 Thread Gao,Yan
On 2017/12/16 16:59, Andrei Borzenkov wrote: 04.12.2017 21:55, Andrei Borzenkov wrote: ... I tried it (on openSUSE Tumbleweed which is what I have at hand, it has SBD 1.3.0) and with SBD_DELAY_START=yes sbd does not appear to watch disk at all. It simply waits that long on startup before

Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: pacemaker with sbd fails to start if node reboots too fast.

2017-12-05 Thread Gao,Yan
On 12/05/2017 03:11 PM, Ulrich Windl wrote: "Gao,Yan" <y...@suse.com> wrote on 05.12.2017 at 15:04 in message <f3433dca-d654-0eac-80d6-2f92aeb3e...@suse.com>: On 12/05/2017 12:41 PM, Ulrich Windl wrote: "Gao,Yan" <y...@suse.com> wrote on 01.12

Re: [ClusterLabs] Antw: Re: Antw: Re: pacemaker with sbd fails to start if node reboots too fast.

2017-12-05 Thread Gao,Yan
On 12/05/2017 12:41 PM, Ulrich Windl wrote: "Gao,Yan" <y...@suse.com> wrote on 01.12.2017 at 20:36 in message <e49f3c0a-6981-3ab4-a0b0-1e5f49f34...@suse.com>: On 11/30/2017 06:48 PM, Andrei Borzenkov wrote: 30.11.2017 16:11, Klaus Wenninger wrote: On 11/30/2017

Re: [ClusterLabs] pacemaker with sbd fails to start if node reboots too fast.

2017-12-05 Thread Gao,Yan
On 12/05/2017 08:57 AM, Dejan Muhamedagic wrote: On Mon, Dec 04, 2017 at 09:55:46PM +0300, Andrei Borzenkov wrote: 04.12.2017 14:48, Gao,Yan wrote: On 12/02/2017 07:19 PM, Andrei Borzenkov wrote: 30.11.2017 13:48, Gao,Yan wrote: On 11/22/2017 08:01 PM, Andrei Borzenkov wrote: SLES12 SP2

Re: [ClusterLabs] pacemaker with sbd fails to start if node reboots too fast.

2017-12-05 Thread Gao,Yan
On 12/04/2017 07:55 PM, Andrei Borzenkov wrote: 04.12.2017 14:48, Gao,Yan wrote: On 12/02/2017 07:19 PM, Andrei Borzenkov wrote: 30.11.2017 13:48, Gao,Yan wrote: On 11/22/2017 08:01 PM, Andrei Borzenkov wrote: SLES12 SP2 with pacemaker 1.1.15-21.1-e174ec8; two node cluster with VM on VSphere

Re: [ClusterLabs] Antw: Re: pacemaker with sbd fails to start if node reboots too fast.

2017-12-04 Thread Gao,Yan
On 12/02/2017 08:30 AM, Andrei Borzenkov wrote: 01.12.2017 22:36, Gao,Yan wrote: On 11/30/2017 06:48 PM, Andrei Borzenkov wrote: 30.11.2017 16:11, Klaus Wenninger wrote: On 11/30/2017 01:41 PM, Ulrich Windl wrote: "Gao,Yan" <y...@suse.com> wrote on 30.11.2017 at 11

Re: [ClusterLabs] pacemaker with sbd fails to start if node reboots too fast.

2017-12-04 Thread Gao,Yan
On 12/02/2017 07:19 PM, Andrei Borzenkov wrote: 30.11.2017 13:48, Gao,Yan wrote: On 11/22/2017 08:01 PM, Andrei Borzenkov wrote: SLES12 SP2 with pacemaker 1.1.15-21.1-e174ec8; two node cluster with VM on VSphere using shared VMDK as SBD. During basic tests by killing corosync and forcing

Re: [ClusterLabs] Antw: Re: pacemaker with sbd fails to start if node reboots too fast.

2017-12-01 Thread Gao,Yan
On 11/30/2017 06:48 PM, Andrei Borzenkov wrote: 30.11.2017 16:11, Klaus Wenninger wrote: On 11/30/2017 01:41 PM, Ulrich Windl wrote: "Gao,Yan" <y...@suse.com> wrote on 30.11.2017 at 11:48 in message <e71afccc-06e3-97dd-c66a-1b4bac550...@suse.com>: On 11/22

Re: [ClusterLabs] Antw: Re: pacemaker with sbd fails to start if node reboots too fast.

2017-12-01 Thread Gao,Yan
On 11/30/2017 01:41 PM, Ulrich Windl wrote: "Gao,Yan" <y...@suse.com> wrote on 30.11.2017 at 11:48 in message <e71afccc-06e3-97dd-c66a-1b4bac550...@suse.com>: On 11/22/2017 08:01 PM, Andrei Borzenkov wrote: SLES12 SP2 with pacemaker 1.1.15-21.1-e174ec8; tw

Re: [ClusterLabs] pacemaker with sbd fails to start if node reboots too fast.

2017-11-30 Thread Gao,Yan
On 11/22/2017 08:01 PM, Andrei Borzenkov wrote: SLES12 SP2 with pacemaker 1.1.15-21.1-e174ec8; two node cluster with VM on VSphere using shared VMDK as SBD. During basic tests by killing corosync and forcing STONITH pacemaker was not started after reboot. In logs I see during boot Nov 22

Re: [ClusterLabs] questions about startup fencing

2017-11-30 Thread Gao,Yan
On 11/30/2017 09:14 AM, Andrei Borzenkov wrote: On Wed, Nov 29, 2017 at 6:54 PM, Ken Gaillot wrote: The same scenario is why a single node can't have quorum at start-up in a cluster with "two_node" set. Both nodes have to see each other at least once before they can

Re: [ClusterLabs] questions about startup fencing

2017-11-29 Thread Gao,Yan
On 11/29/2017 04:54 PM, Ken Gaillot wrote: On Wed, 2017-11-29 at 14:22 +, Adam Spiers wrote: The same questions apply if this troublesome node was actually a remote node running pacemaker_remoted, rather than the 5th node in the cluster. Remote nodes don't join at the crmd level as