[ClusterLabs] Antw: [EXT] Re: Maximum cluster size with Pacemaker 2.x and Corosync 3.x, and scaling to hundreds of nodes

2020-07-30 Thread Ulrich Windl
>>> Ken Gaillot wrote on 30.07.2020 at 16:43 in message <93b973947008b62c4848f8a799ddc3f0949451e8.ca...@redhat.com>:
> On Wed, 2020-07-29 at 23:12 +, Toby Haynes wrote:
>> In Corosync 1.x there was a limit on the maximum number of active nodes in a corosync cluster - browsing the mailin

Re: [ClusterLabs] Antw: [EXT] why is node fenced ?

2020-07-30 Thread Andrei Borzenkov
30.07.2020 23:23, Lentes, Bernd wrote:
> - On 30 Jul 2020 at 9:28, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote:
>> "Lentes, Bernd" wrote on 29.07.2020 at 17:26 in message <1894379294.27456141.1596036406000.javamail.zim...@helmholtz-muenchen.de>:
>>> Hi,

Re: [ClusterLabs] Antw: [EXT] why is node fenced ?

2020-07-30 Thread Lentes, Bernd
- On 30 Jul 2020 at 9:28, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote:
> "Lentes, Bernd" wrote on 29.07.2020 at 17:26 in message <1894379294.27456141.1596036406000.javamail.zim...@helmholtz-muenchen.de>:
>> Hi,
>>
>> a few days ago one of my nodes was fenced and I

Re: [ClusterLabs] Antw: [EXT] Stonith failing

2020-07-30 Thread Strahil Nikolov
SBD can use an iSCSI device (for example, with the target also acting as the quorum node), a disk partition or an LVM LV, so I guess it can also use a ZFS volume dedicated to SBD (10 MB is enough). In your case, IPMI is quite suitable. About power fencing when persistent reservations are removed -

Re: [ClusterLabs] Maximum cluster size with Pacemaker 2.x and Corosync 3.x, and scaling to hundreds of nodes

2020-07-30 Thread Ken Gaillot
On Wed, 2020-07-29 at 23:12 +, Toby Haynes wrote:
> In Corosync 1.x there was a limit on the maximum number of active nodes in a corosync cluster - browsing the mailing list says 64 hosts. The Pacemaker 1.1 documentation says scalability goes up to 16 nodes. The Pacemaker 2.0 documentatio

Re: [ClusterLabs] Antw: [EXT] Stonith failing

2020-07-30 Thread Strahil Nikolov
This one links to how to power fence when reservations are removed: https://access.redhat.com/solutions/4526731
Best Regards,
Strahil Nikolov
On 30 July 2020 9:28:51 GMT+03:00, Andrei Borzenkov wrote:
>30.07.2020 08:42, Strahil Nikolov wrote:
>> You got plenty of options:
>> - IPMI based

Re: [ClusterLabs] Pacemaker crashed and produce a coredump file

2020-07-30 Thread Strahil Nikolov
Early systemd bugs caused dbus issues and session files not being cleaned up properly. At least EL 7.4 or older were affected. What is your OS and version? P.S.: I know your pain. I am still fighting to explain that without planned downtime, the end users will definitely get unplanned

Re: [ClusterLabs] Antw: [EXT] Stonith failing

2020-07-30 Thread Strahil Nikolov
You got plenty of options:
- IPMI based fencing like HP iLO, DELL iDRAC
- SCSI-3 persistent reservations (which can be extended to fence the node when the reservation(s) were removed)
- Shared disk (even iSCSI) and using SBD (a.k.a. Poison pill) -> in case your hardware has no watchd
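The "poison pill" idea behind SBD can be illustrated with a toy model: a surviving node writes a reset message into the victim's slot on the shared disk, and the victim self-fences when it reads it, or when its watchdog expires because it can no longer read the disk at all. This is a hypothetical sketch: the disk is a Python dict, the watchdog is a tick counter, and `SharedDisk`, `Node` and `fence` are invented names. Real sbd uses its own on-disk slot layout and a real watchdog device, and can also take Pacemaker's view of node health into account, which this model ignores.

```python
"""Toy model of SBD-style poison-pill fencing (hypothetical, not real sbd)."""
from dataclasses import dataclass, field


@dataclass
class SharedDisk:
    # One message slot per node name; a missing entry means "clear".
    slots: dict = field(default_factory=dict)
    reachable: bool = True

    def write(self, node: str, msg: str) -> None:
        if not self.reachable:
            raise OSError("shared disk unreachable")
        self.slots[node] = msg

    def read(self, node: str) -> str:
        if not self.reachable:
            raise OSError("shared disk unreachable")
        return self.slots.get(node, "clear")


@dataclass
class Node:
    name: str
    disk: SharedDisk
    watchdog_timeout: int = 3  # failed polls tolerated before the watchdog fires
    missed: int = 0
    alive: bool = True

    def tick(self) -> None:
        """One polling cycle: read own slot, then feed or starve the watchdog."""
        if not self.alive:
            return
        try:
            msg = self.disk.read(self.name)
        except OSError:
            # Disk not visible: stop feeding the (simulated) watchdog.
            self.missed += 1
        else:
            if msg == "reset":
                # Poison pill received: self-fence immediately.
                self.alive = False
                return
            self.missed = 0  # healthy read, watchdog fed
        if self.missed >= self.watchdog_timeout:
            self.alive = False  # simulated watchdog reset


def fence(disk: SharedDisk, victim: str) -> None:
    """A surviving node fences a peer by writing the pill into its slot."""
    disk.write(victim, "reset")
```

The two exit paths mirror why SBD needs a watchdog: the pill only reaches a node that can still read the disk, while a node that has lost the disk is taken down by its own watchdog instead.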

Re: [ClusterLabs] Antw: [EXT] Stonith failing

2020-07-30 Thread Gabriele Bulfon
Reading about sbd from SUSE, I saw that it requires a dedicated block device to write information to; I don't think this is possible here. It's a dual-node ZFS storage system running our own XStreamOS/illumos distribution, and here we're trying to add HA capabilities. We can move IPs, ZFS pools and COMSTAR/iSCSI/FC, a

Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: [EXT] Stonith failing

2020-07-30 Thread Reid Wahl
That appears to support IPMI, so fence_ipmilan is likely an option. Further, it probably has a watchdog device. If so, then sbd is an option.
On Thu, Jul 30, 2020 at 2:00 AM Gabriele Bulfon wrote:
> It is this system:
>
> https://www.supermicro.com/products/system/1u/1029/SYS-1029TP-DC0R.cfm
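The conditional reasoning in this reply (IPMI available means fence_ipmilan; a watchdog available means sbd), together with the SCSI-3 reservation option mentioned in the thread, can be summarised as a small lookup. This is a hypothetical helper: the `Hardware` flags and `candidate_fencing` are invented for illustration and do not probe real hardware. `fence_ipmilan` and `fence_scsi` are the standard fence-agent names for IPMI power fencing and SCSI-3 persistent-reservation fencing.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Hardware:
    """Hypothetical capability flags for one node (not read from real hardware)."""
    has_ipmi: bool
    has_watchdog: bool
    has_shared_disk: bool = False
    supports_scsi3_pr: bool = False


def candidate_fencing(hw: Hardware) -> list[str]:
    """Return the fencing approaches suggested in the thread, power fencing first."""
    options: list[str] = []
    if hw.has_ipmi:
        # Power fencing through the BMC (Supermicro IPMI, HP iLO, Dell iDRAC, ...).
        options.append("fence_ipmilan")
    if hw.has_shared_disk and hw.supports_scsi3_pr:
        # Storage fencing: revoke the victim's SCSI-3 persistent reservation.
        options.append("fence_scsi")
    if hw.has_watchdog:
        # sbd always needs a watchdog; a shared disk adds poison-pill messaging.
        options.append("sbd (shared disk)" if hw.has_shared_disk else "sbd (watchdog-only)")
    return options
```

Listing power fencing first reflects the advice above that IPMI is the most direct fit for this hardware; the ordering is a judgment call, not a rule from Pacemaker.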

Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: [EXT] Stonith failing

2020-07-30 Thread Gabriele Bulfon
It is this system: https://www.supermicro.com/products/system/1u/1029/SYS-1029TP-DC0R.cfm It has a SAS3 backplane with hot-swap SAS disks that are visible to both nodes at the same time. Gabriele

Re: [ClusterLabs] Antw: [EXT] Stonith failing

2020-07-30 Thread Andrei Borzenkov
On Thu, Jul 30, 2020 at 11:29 AM Strahil Nikolov wrote:
> This one links to how to power fence when reservations are removed:
> https://access.redhat.com/solutions/4526731
All of this is RH(CS)-specific.

[ClusterLabs] Antw: [EXT] Re: Pacemaker crashed and produce a coredump file

2020-07-30 Thread Ulrich Windl
>>> Strahil Nikolov wrote on 30.07.2020 at 10:23 in message :
> Early systemd bugs caused dbus issues and session files not being cleaned up properly. At least EL 7.4 or older were affected.
>
> What is your OS and version?
>
> P.S.: I know your pain. I am still fighting to expl

Re: [ClusterLabs] Pacemaker crashed and produce a coredump file

2020-07-30 Thread Klaus Wenninger
On 7/29/20 10:39 AM, Reid Wahl wrote:
> Hi,
>
> It looks like this is a bug that was fixed in later releases. The `path` variable was a null pointer when it was passed to `systemd_unit_exec_with_unit` as the `unit` argument. Commit 62a0d26a

[ClusterLabs] Antw: [EXT] why is node fenced ?

2020-07-30 Thread Ulrich Windl
>>> "Lentes, Bernd" wrote on 29.07.2020 at 17:26 in message <1894379294.27456141.1596036406000.javamail.zim...@helmholtz-muenchen.de>:
> Hi,
>
> a few days ago one of my nodes was fenced and I don't know why, which is something I really don't like.
> What I did:
> I put one node (ha-idg-1)