Re: [ClusterLabs] Antw: Re: Why Do All The Services Go Down When Just One Fails?

2019-02-20 Thread Eric Robinson
> -----Original Message----- > From: Users On Behalf Of Andrei > Borzenkov > Sent: Wednesday, February 20, 2019 8:51 PM > To: users@clusterlabs.org > Subject: Re: [ClusterLabs] Antw: Re: Why Do All The Services Go Down When > Just One Fails? > > 20.02.2019 21:51, Eric Robinson wrote: > > > >

Re: [ClusterLabs] Antw: Re: Why Do All The Services Go Down When Just One Fails?

2019-02-20 Thread Andrei Borzenkov
20.02.2019 21:51, Eric Robinson wrote: > > The following should show OK in a fixed font like Consolas, but the following > setup is supposed to be possible, and is even referenced in the ClusterLabs > documentation. > > > > > > +--+ > > | mysql001 +--+ > > +--+

Re: [ClusterLabs] corosync SCHED_RR stuck at 100% cpu usage with kernel 4.19, priority inversion/livelock?

2019-02-20 Thread Jan Pokorný
On 20/02/19 21:16 +0100, Klaus Wenninger wrote: > On 02/20/2019 08:51 PM, Jan Pokorný wrote: >> On 20/02/19 17:37 +, Edwin Török wrote: >>> strace for the situation described below (corosync 95%, 1 vCPU): >>> https://clbin.com/hZL5z >> I might have missed that earlier or this may be just some s

Re: [ClusterLabs] corosync SCHED_RR stuck at 100% cpu usage with kernel 4.19, priority inversion/livelock?

2019-02-20 Thread Jan Pokorný
On 20/02/19 21:25 +0100, Klaus Wenninger wrote: > Hmm maybe the thing that should be scheduled is running at > SCHED_RR as well but with just a lower prio. So it wouldn't > profit from the sched_yield and it wouldn't get anything of > the 5% either. Actually, it would possibly make the situation e
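For readers following the "5%" remark above: it most likely refers to Linux real-time throttling, where the kernel by default lets real-time tasks (SCHED_FIFO and SCHED_RR) use 950000 µs out of every 1000000 µs, leaving the rest to non-real-time tasks. A lower-priority SCHED_RR task is still a real-time task, so it would not benefit from that reserve. The sketch below is only an illustration and is not taken from the thread; the helper names `rt_share` and `read_sysctl` are hypothetical, and the fallback values are the usual kernel defaults, assumed here in case the /proc files cannot be read.

```python
def rt_share(runtime_us: int = 950_000, period_us: int = 1_000_000) -> float:
    """Fraction of each period that real-time tasks may consume.

    A runtime of -1 means throttling is disabled, i.e. real-time tasks
    may use the whole period.
    """
    if runtime_us < 0:
        return 1.0
    return min(runtime_us / period_us, 1.0)


def read_sysctl(name: str, default: int) -> int:
    """Read an integer from /proc/sys/kernel/<name>, or return the default.

    The default is an assumption (the usual kernel default), used when
    the file is missing or unreadable, e.g. on a non-Linux system.
    """
    try:
        with open(f"/proc/sys/kernel/{name}") as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return default


runtime = read_sysctl("sched_rt_runtime_us", 950_000)
period = read_sysctl("sched_rt_period_us", 1_000_000)
share = rt_share(runtime, period)
print(f"real-time tasks may use up to {share:.0%} of each period")
print(f"left for non-real-time tasks: {1 - share:.0%}")
```

With the default values this reports a 95% ceiling for real-time tasks, which is consistent with corosync being observed at about 95% CPU on a single vCPU.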

Re: [ClusterLabs] corosync SCHED_RR stuck at 100% cpu usage with kernel 4.19, priority inversion/livelock?

2019-02-20 Thread Klaus Wenninger
On 02/20/2019 06:37 PM, Edwin Török wrote: > On 20/02/2019 13:08, Jan Friesse wrote: >> Edwin Török wrote: >>> On 20/02/2019 07:57, Jan Friesse wrote: Edwin, > > On 19/02/2019 17:02, Klaus Wenninger wrote: >> On 02/19/2019 05:41 PM, Edwin Török wrote: >>> On 19/02/2019 16:2

Re: [ClusterLabs] corosync SCHED_RR stuck at 100% cpu usage with kernel 4.19, priority inversion/livelock?

2019-02-20 Thread Klaus Wenninger
On 02/20/2019 08:51 PM, Jan Pokorný wrote: > On 20/02/19 17:37 +, Edwin Török wrote: >> strace for the situation described below (corosync 95%, 1 vCPU): >> https://clbin.com/hZL5z > I might have missed that earlier or this may be just some sort > of insignificant/misleading clue: > >> strace: P

Re: [ClusterLabs] corosync SCHED_RR stuck at 100% cpu usage with kernel 4.19, priority inversion/livelock?

2019-02-20 Thread Jan Pokorný
On 20/02/19 17:37 +, Edwin Török wrote: > strace for the situation described below (corosync 95%, 1 vCPU): > https://clbin.com/hZL5z I might have missed that earlier or this may be just some sort of insignificant/misleading clue: > strace: Process 4923 attached with 2 threads > strace: [ Proc

Re: [ClusterLabs] Antw: Re: Why Do All The Services Go Down When Just One Fails?

2019-02-20 Thread Eric Robinson
> -----Original Message----- > From: Users On Behalf Of Ulrich Windl > Sent: Tuesday, February 19, 2019 11:35 PM > To: users@clusterlabs.org > Subject: [ClusterLabs] Antw: Re: Why Do All The Services Go Down When > Just One Fails? > > >>> Eric Robinson mailto:eric.robin...@psmnv.com

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-20 Thread Andrei Borzenkov
18.02.2019 18:53, Ken Gaillot wrote: > On Sun, 2019-02-17 at 20:33 +0300, Andrei Borzenkov wrote: >> 17.02.2019 0:33, Andrei Borzenkov wrote: >>> 17.02.2019 0:03, Eric Robinson wrote: Here are the relevant corosync logs. It appears that the stop action for resource p_mysql_002 failed

Re: [ClusterLabs] corosync SCHED_RR stuck at 100% cpu usage with kernel 4.19, priority inversion/livelock?

2019-02-20 Thread Edwin Török
On 20/02/2019 13:08, Jan Friesse wrote: > Edwin Török wrote: >> On 20/02/2019 07:57, Jan Friesse wrote: >>> Edwin, On 19/02/2019 17:02, Klaus Wenninger wrote: > On 02/19/2019 05:41 PM, Edwin Török wrote: >> On 19/02/2019 16:26, Edwin Török wrote: >>> On 18/02/2019 18:

Re: [ClusterLabs] corosync SCHED_RR stuck at 100% cpu usage with kernel 4.19, priority inversion/livelock?

2019-02-20 Thread Ken Gaillot
On Wed, 2019-02-20 at 14:03 +, Edwin Török wrote: > > On 20/02/2019 12:44, Jan Pokorný wrote: > > On 19/02/19 16:41 +, Edwin Török wrote: > > > Also noticed this: [ 5390.361861] crmd[12620]: segfault at 0 ip > > > 7f221c5e03b1 sp 7ffcf9cf9d88 error 4 in > > > libc-2.17.so[7f221c554

Re: [ClusterLabs] corosync SCHED_RR stuck at 100% cpu usage with kernel 4.19, priority inversion/livelock?

2019-02-20 Thread Edwin Török
On 20/02/2019 12:44, Jan Pokorný wrote: > On 19/02/19 16:41 +, Edwin Török wrote: >> Also noticed this: [ 5390.361861] crmd[12620]: segfault at 0 ip >> 7f221c5e03b1 sp 7ffcf9cf9d88 error 4 in >> libc-2.17.so[7f221c554000+1c2000] [ 5390.361918] Code: b8 00 00 >> 00 04 00 00 00 74 07 4

[ClusterLabs] Antw: Re: corosync SCHED_RR stuck at 100% cpu usage with kernel 4.19, priority inversion/livelock?

2019-02-20 Thread Ulrich Windl
>>> Edwin Török wrote on 20.02.2019 at 12:30 in message <0a49f593-1543-76e4-a8ab-06a48c596...@citrix.com>: > On 20/02/2019 07:57, Jan Friesse wrote: >> Edwin, >>> >>> >>> On 19/02/2019 17:02, Klaus Wenninger wrote: On 02/19/2019 05:41 PM, Edwin Török wrote: > On 19/02/2019 16:26, Edwi

Re: [ClusterLabs] corosync SCHED_RR stuck at 100% cpu usage with kernel 4.19, priority inversion/livelock?

2019-02-20 Thread Jan Friesse
Edwin Török wrote: On 20/02/2019 07:57, Jan Friesse wrote: Edwin, On 19/02/2019 17:02, Klaus Wenninger wrote: On 02/19/2019 05:41 PM, Edwin Török wrote: On 19/02/2019 16:26, Edwin Török wrote: On 18/02/2019 18:27, Edwin Török wrote: Did a test today with CentOS 7.6 with upstream kerne

Re: [ClusterLabs] corosync SCHED_RR stuck at 100% cpu usage with kernel 4.19, priority inversion/livelock?

2019-02-20 Thread Jan Pokorný
On 19/02/19 16:41 +, Edwin Török wrote: > Also noticed this: > [ 5390.361861] crmd[12620]: segfault at 0 ip 7f221c5e03b1 sp > 7ffcf9cf9d88 error 4 in libc-2.17.so[7f221c554000+1c2000] > [ 5390.361918] Code: b8 00 00 00 04 00 00 00 74 07 48 8d 05 f8 f2 0d 00 > c3 0f 1f 80 00 00 00 00 48

Re: [ClusterLabs] corosync SCHED_RR stuck at 100% cpu usage with kernel 4.19, priority inversion/livelock?

2019-02-20 Thread Edwin Török
On 20/02/2019 07:57, Jan Friesse wrote: > Edwin, >> >> >> On 19/02/2019 17:02, Klaus Wenninger wrote: >>> On 02/19/2019 05:41 PM, Edwin Török wrote: On 19/02/2019 16:26, Edwin Török wrote: > On 18/02/2019 18:27, Edwin Török wrote: >> Did a test today with CentOS 7.6 with upstream kerne

Re: [ClusterLabs] Antw: Re: corosync SCHED_RR stuck at 100% cpu usage with kernel 4.19, priority inversion/livelock?

2019-02-20 Thread Klaus Wenninger
On 02/20/2019 08:07 AM, Ulrich Windl wrote: Klaus Wenninger wrote on 19.02.2019 at 18:02 in > message <7b626ca1-4f59-6257-bfb5-ef5d0d823...@redhat.com>: > [...] >>> It is looping on: >>> debug Feb 19 16:37:24 mcast_sendmsg(408):12: sendmsg(mcast) failed >>> (non-critical): Resource temp