[ClusterLabs] corosync SCHED_RR stuck at 100% cpu usage with kernel 4.19, priority inversion/livelock?

2019-02-14 Thread Edwin Török
Hello, We were testing corosync 2.4.3/libqb 1.0.1-6/sbd 1.3.1/gfs2 on 4.19 and noticed a fundamental problem with realtime priorities: - corosync runs on CPU3, and interrupts for the NIC used by corosync are also routed to CPU3 - corosync runs with SCHED_RR, ksoftirqd does not (should it?), but wi
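A minimal sketch, not taken from the thread, of how the setup described above (which scheduling policy a process runs under, and which CPUs it may use) could be inspected on Linux from Python. The helper names `policy_name` and `describe` are hypothetical. Passing pid 0 inspects the calling process; the PID of corosync or of a ksoftirqd thread would have to be looked up separately.

```python
import os

# Map the scheduling-policy constants this platform exposes to their names.
_POLICY_NAMES = {
    getattr(os, name): name
    for name in ("SCHED_OTHER", "SCHED_FIFO", "SCHED_RR", "SCHED_BATCH", "SCHED_IDLE")
    if hasattr(os, name)
}
# The kernel may OR this flag into the reported policy; mask it out if present.
_RESET_ON_FORK = getattr(os, "SCHED_RESET_ON_FORK", 0)


def policy_name(policy):
    """Return a readable name for a scheduling policy value."""
    policy &= ~_RESET_ON_FORK
    return _POLICY_NAMES.get(policy, "unknown(%d)" % policy)


def describe(pid=0):
    """Return (policy name, real-time priority, allowed CPUs) for a PID.

    pid 0 means the calling process. Linux-only: relies on the os.sched_* calls.
    """
    policy = os.sched_getscheduler(pid)
    priority = os.sched_getparam(pid).sched_priority
    cpus = sorted(os.sched_getaffinity(pid))
    return policy_name(policy), priority, cpus


if __name__ == "__main__":
    # Hypothetical usage: replace 0 with the PID of the process of interest.
    print(describe(0))
```

Comparing the result for a real-time process with that of the ksoftirqd thread on the same CPU would show whether one runs under SCHED_RR while the other uses the default policy, which is the situation the report describes.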

Re: [ClusterLabs] corosync SCHED_RR stuck at 100% cpu usage with kernel 4.19, priority inversion/livelock?

2019-02-14 Thread Jan Friesse
Edwin, Hello, We were testing corosync 2.4.3/libqb 1.0.1-6/sbd 1.3.1/gfs2 on 4.19 and noticed a fundamental problem with realtime priorities: - corosync runs on CPU3, and interrupts for the NIC used by corosync are also routed to CPU3 - corosync runs with SCHED_RR, ksoftirqd does not (should it

Re: [ClusterLabs] corosync SCHED_RR stuck at 100% cpu usage with kernel 4.19, priority inversion/livelock?

2019-02-14 Thread Edvin Torok
Apologies for top posting, the strace you asked for is available here (although running strace itself had side-effect of getting corosync out of the live lock): https://clbin.com/9kOUM Best regards, --Edwin From: Jan Friesse Sent: 14 February 2019 18:34 T

Re: [ClusterLabs] corosync SCHED_RR stuck at 100% cpu usage with kernel 4.19, priority inversion/livelock?

2019-02-14 Thread Jan Friesse
Edvin, Apologies for top posting, the strace you asked for is available here (although running strace itself had side-effect of getting corosync out of the live lock): Yep, I know, but this was after strace finished, right? So links should contain time when corosync was stuck, right? htt

[ClusterLabs] Antw: corosync SCHED_RR stuck at 100% cpu usage with kernel 4.19, priority inversion/livelock?

2019-02-14 Thread Ulrich Windl
Hi! IMHO any process running at real-time priorities must make sure that it consumes the CPU only for short moments that are really critical to be performed in time. Specifically having some code that performs poorly (for various reasons) is absolutely _not_ a candidate to be run with real-time pr

[ClusterLabs] Antw: Re: corosync SCHED_RR stuck at 100% cpu usage with kernel 4.19, priority inversion/livelock?

2019-02-14 Thread Ulrich Windl
I would expect that, as strace interrupts the RT task to query the code; you should run strace at the same RT priority ;-) >>> Edvin Torok 14.02.19 19.54 Uhr >>> Apologies for top posting, the strace you asked for is available here (although running strace itself had side-effect of getting coro

Re: [ClusterLabs] Antw: corosync SCHED_RR stuck at 100% cpu usage with kernel 4.19, priority inversion/livelock?

2019-02-14 Thread Jan Friesse
Ulrich Windl wrote: Hi! IMHO any process running at real-time priorities must make sure that it consumes the CPU only for short moments that are really critical to be performed in time. Specifically having some code that performs poorly (for various reasons) is absolutely _not_ a candidate t