Re: [ClusterLabs] corosync dead loop in segfault handler

2017-03-14 Thread Christine Caulfield
On 11/03/17 01:32, cys wrote:
> At 2017-03-09 18:25:59, "Christine Caulfield" wrote:
>> Thanks. Oddly that looks like a totally different incident to the core
>> file we had last time. That seemed to be in a node state transition
>> whereas this is in stable running. The last thing to happen was a

Re: [ClusterLabs] corosync dead loop in segfault handler

2017-03-10 Thread cys
At 2017-03-09 18:25:59, "Christine Caulfield" wrote:
>Thanks. Oddly that looks like a totally different incident to the core
>file we had last time. That seemed to be in a node state transition
>whereas this is in stable running. The last thing to happen was an IPC
>connection which indicates that

Re: [ClusterLabs] corosync dead loop in segfault handler

2017-03-09 Thread Christine Caulfield
On 08/03/17 11:04, cys wrote:
> At 2017-02-21 00:24:33, "Christine Caulfield" wrote:
>> Thanks, I can read that core now. It's something odd happening in the
>> sync() code that I can't quite diagnose without the blackbox. We've only
>> ever seen crashes like that when there's been network corrupt

Re: [ClusterLabs] corosync dead loop in segfault handler

2017-02-27 Thread cys
At 2017-02-21 00:24:33, "Christine Caulfield" wrote:
>Thanks, I can read that core now. It's something odd happening in the
>sync() code that I can't quite diagnose without the blackbox. We've only
>ever seen crashes like that when there's been network corruption or
>on-wire incompatibilities. Has

Re: [ClusterLabs] corosync dead loop in segfault handler

2017-02-20 Thread Christine Caulfield
On 16/02/17 12:18, cys wrote:
> If you need other packages, let me know.
>
Thanks, I can read that core now. It's something odd happening in the sync() code that I can't quite diagnose without the blackbox. We've only ever seen crashes like that when there's been network corruption or on-wire i

Re: [ClusterLabs] corosync dead loop in segfault handler

2017-02-16 Thread Christine Caulfield
On 16/02/17 09:31, cys wrote:
> The attachment includes coredump and logs just before corosync went wrong.
>
> The packages we use:
> corosync-2.3.4-7.el7_2.1.x86_64
> corosynclib-2.3.4-7.el7_2.1.x86_64
> libqb-0.17.1-2.el7.1.x86_64
>
> But they are not available any more at mirror.centos.org. If

Re: [ClusterLabs] corosync dead loop in segfault handler

2017-02-16 Thread Christine Caulfield
On 16/02/17 03:51, cys wrote:
> At 2017-02-15 23:13:08, "Christine Caulfield" wrote:
>>
>> Yes, it seems that some corosync SEGVs trigger this obscure bug in
>> libqb. I've chased a few possible causes and none have been fruitful.
>>
>> If you get this then corosync has crashed, and this other bug

Re: [ClusterLabs] corosync dead loop in segfault handler

2017-02-15 Thread cys
At 2017-02-15 23:13:08, "Christine Caulfield" wrote:
>
>Yes, it seems that some corosync SEGVs trigger this obscure bug in
>libqb. I've chased a few possible causes and none have been fruitful.
>
>If you get this then corosync has crashed, and this other bug is masking
>the actual diagnostics - I

Re: [ClusterLabs] corosync dead loop in segfault handler

2017-02-15 Thread Jan Pokorný
On 15/02/17 18:04 +0100, Jan Pokorný wrote:
> On 15/02/17 15:13 +, Christine Caulfield wrote:
>> On 15/02/17 14:50, Jan Friesse wrote:
>>>> Hi all,
>>>>
>>>> Corosync Cluster Engine, version '2.3.4'
>>>> Copyright (c) 2006-2009 Red Hat, Inc.
>>>>
>>>> Today I found corosync consuming 100% cpu

Re: [ClusterLabs] corosync dead loop in segfault handler

2017-02-15 Thread Jan Pokorný
On 15/02/17 15:13 +, Christine Caulfield wrote:
> On 15/02/17 14:50, Jan Friesse wrote:
>>> Hi all,
>>>
>>> Corosync Cluster Engine, version '2.3.4'
>>> Copyright (c) 2006-2009 Red Hat, Inc.
>>>
>>> Today I found corosync consuming 100% cpu. Strace showed following:
>>>
>>> write(7, "\v\0\0\

Re: [ClusterLabs] corosync dead loop in segfault handler

2017-02-15 Thread Christine Caulfield
On 15/02/17 14:50, Jan Friesse wrote:
>> Hi all,
>>
>> Corosync Cluster Engine, version '2.3.4'
>> Copyright (c) 2006-2009 Red Hat, Inc.
>>
>> Today I found corosync consuming 100% cpu. Strace showed following:
>>
>> write(7, "\v\0\0\0", 4) = -1 EAGAIN (Resource
>> temporarily unava

Re: [ClusterLabs] corosync dead loop in segfault handler

2017-02-15 Thread Jan Friesse
Hi all,
Corosync Cluster Engine, version '2.3.4'
Copyright (c) 2006-2009 Red Hat, Inc.
Today I found corosync consuming 100% cpu. Strace showed following:
write(7, "\v\0\0\0", 4) = -1 EAGAIN (Resource temporarily unavailable)
write(7, "\v\0\0\0", 4) = -1 EAGAIN

[ClusterLabs] corosync dead loop in segfault handler

2017-02-15 Thread cys
Hi all,
Corosync Cluster Engine, version '2.3.4'
Copyright (c) 2006-2009 Red Hat, Inc.
Today I found corosync consuming 100% cpu. Strace showed following:
write(7, "\v\0\0\0", 4)                 = -1 EAGAIN (Resource temporarily unavailable)
write(7, "\v\0\0\0", 4)                 = -1 EAGAIN (
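
For anyone reading the strace lines in this thread: the four bytes "\v\0\0\0" are what a 32-bit little-endian integer with value 11 looks like, and 11 is the number of SIGSEGV on Linux x86_64. The Python sketch below only reproduces the symptom. It is not corosync or libqb code. It assumes fd 7 is the write end of a non-blocking pipe that has filled up (the thread does not say what fd 7 is), and the helper name fill_pipe_then_write is made up for illustration.

```python
import errno
import os
import signal
import struct


def fill_pipe_then_write(payload: bytes) -> int:
    """Fill a non-blocking pipe, then attempt one more small write.

    Returns 0 if the final write succeeded, otherwise the errno it failed with.
    Hypothetical helper, for illustration only.
    """
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)
    try:
        chunk = b"\0" * 4096
        # Nobody reads from read_fd, so the kernel buffer eventually fills up
        # and the write fails with EAGAIN (raised as BlockingIOError).
        while True:
            try:
                os.write(write_fd, chunk)
            except BlockingIOError:
                break
        # The pipe is now full: a further small write cannot make progress.
        try:
            os.write(write_fd, payload)
            return 0
        except BlockingIOError as exc:
            return exc.errno
    finally:
        os.close(read_fd)
        os.close(write_fd)


# Same 4 bytes as in the strace output, assuming the value is the signal number.
payload = struct.pack("<i", int(signal.SIGSEGV))
result = fill_pipe_then_write(payload)
print(payload, errno.errorcode.get(result))
```

If the writer retries such a write straight away without anything draining the pipe, every attempt fails the same way, which would match a process spinning at 100% CPU with an endless run of identical EAGAIN lines in strace.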