Re: [ClusterLabs] corosync dead loop in segfault handler

2017-03-14 Thread Christine Caulfield
On 11/03/17 01:32, cys wrote: > At 2017-03-09 18:25:59, "Christine Caulfield" wrote: >> Thanks. Oddly that looks like a totally different incident to the core >> file we had last time. That seemed to be in a node state transition >> whereas this is in stable running. The last

Re: [ClusterLabs] corosync dead loop in segfault handler

2017-03-10 Thread cys
At 2017-03-09 18:25:59, "Christine Caulfield" wrote: >Thanks. Oddly that looks like a totally different incident to the core >file we had last time. That seemed to be in a node state transition >whereas this is in stable running. The last thing to happen was an IPC

Re: [ClusterLabs] corosync dead loop in segfault handler

2017-03-09 Thread Christine Caulfield
On 08/03/17 11:04, cys wrote: > At 2017-02-21 00:24:33, "Christine Caulfield" wrote: >> Thanks, I can read that core now. It's something odd happening in the >> sync() code that I can't quite diagnose without the blackbox. We've only >> ever seen crashes like that when

Re: [ClusterLabs] corosync dead loop in segfault handler

2017-02-27 Thread cys
At 2017-02-21 00:24:33, "Christine Caulfield" wrote: >Thanks, I can read that core now. It's something odd happening in the >sync() code that I can't quite diagnose without the blackbox. We've only >ever seen crashes like that when there's been network corruption or >on-wire

Re: [ClusterLabs] corosync dead loop in segfault handler

2017-02-20 Thread Christine Caulfield
On 16/02/17 12:18, cys wrote: > If you need other packages, let me know. > Thanks, I can read that core now. It's something odd happening in the sync() code that I can't quite diagnose without the blackbox. We've only ever seen crashes like that when there's been network corruption or on-wire

Re: [ClusterLabs] corosync dead loop in segfault handler

2017-02-16 Thread Christine Caulfield
On 16/02/17 09:31, cys wrote: > The attachment includes coredump and logs just before corosync went wrong. > > The packages we use: > corosync-2.3.4-7.el7_2.1.x86_64 > corosynclib-2.3.4-7.el7_2.1.x86_64 > libqb-0.17.1-2.el7.1.x86_64 > > But they are not available any more at mirror.centos.org.

Re: [ClusterLabs] corosync dead loop in segfault handler

2017-02-16 Thread Christine Caulfield
On 16/02/17 03:51, cys wrote: > At 2017-02-15 23:13:08, "Christine Caulfield" wrote: >> >> Yes, it seems that some corosync SEGVs trigger this obscure bug in >> libqb. I've chased a few possible causes and none have been fruitful. >> >> If you get this then corosync has

Re: [ClusterLabs] corosync dead loop in segfault handler

2017-02-15 Thread Jan Pokorný
On 15/02/17 18:04 +0100, Jan Pokorný wrote: > On 15/02/17 15:13 +, Christine Caulfield wrote: >> On 15/02/17 14:50, Jan Friesse wrote: Hi all, Corosync Cluster Engine, version '2.3.4' Copyright (c) 2006-2009 Red Hat, Inc. Today I found corosync consuming 100%

Re: [ClusterLabs] corosync dead loop in segfault handler

2017-02-15 Thread Jan Pokorný
On 15/02/17 15:13 +, Christine Caulfield wrote: > On 15/02/17 14:50, Jan Friesse wrote: >>> Hi all, >>> >>> Corosync Cluster Engine, version '2.3.4' >>> Copyright (c) 2006-2009 Red Hat, Inc. >>> >>> Today I found corosync consuming 100% cpu. Strace showed following: >>> >>> write(7,

[ClusterLabs] corosync dead loop in segfault handler

2017-02-15 Thread cys
Hi all, Corosync Cluster Engine, version '2.3.4' Copyright (c) 2006-2009 Red Hat, Inc. Today I found corosync consuming 100% cpu. Strace showed following: write(7, "\v\0\0\0", 4)                 = -1 EAGAIN (Resource temporarily unavailable) write(7, "\v\0\0\0", 4)                 = -1 EAGAIN
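
The strace pattern above (the same 4-byte write failing with EAGAIN over and over) is what an immediate-retry loop on a full non-blocking descriptor looks like. The sketch below reproduces that pattern in isolation. It is not corosync or libqb code: it assumes, purely for illustration, that the descriptor is a pipe whose reader has stopped draining it, and the helper names (fill_pipe, spin_write, is_writable) are hypothetical. The payload "\v\0\0\0" is copied from the strace output; read as a little-endian 32-bit integer it is 11, which is the number of SIGSEGV on Linux x86-64.

```python
import errno
import os
import select


def fill_pipe(wfd):
    """Write to a non-blocking pipe until the kernel reports EAGAIN.

    Returns the number of bytes that were accepted.
    """
    total = 0
    # Large chunks first, then single bytes, so no small gap is left over.
    for chunk in (b"\0" * 4096, b"\0"):
        while True:
            try:
                total += os.write(wfd, chunk)
            except BlockingIOError as exc:
                assert exc.errno == errno.EAGAIN
                break
    return total


def spin_write(wfd, payload, attempts):
    """Retry a write immediately on EAGAIN (the busy-loop pattern).

    Returns how many attempts failed. With nobody draining the pipe,
    every attempt fails and the loop only burns CPU.
    """
    failures = 0
    for _ in range(attempts):
        try:
            os.write(wfd, payload)
            break
        except BlockingIOError:
            failures += 1
    return failures


def is_writable(wfd, timeout):
    """Ask the kernel whether a write could make progress right now."""
    _, writable, _ = select.select([], [wfd], [], timeout)
    return bool(writable)


rfd, wfd = os.pipe()
os.set_blocking(wfd, False)

filled = fill_pipe(wfd)

# Same payload as in the strace output; the reader is idle, so this spins.
failures = spin_write(wfd, b"\v\0\0\0", 1000)
writable_before = is_writable(wfd, 0)

# Drain the pipe, as a healthy reader would have done.
remaining = filled
while remaining > 0:
    remaining -= len(os.read(rfd, remaining))
writable_after = is_writable(wfd, 0)

print("failed attempts:", failures)
print("writable before drain:", writable_before)
print("writable after drain:", writable_after)

os.close(rfd)
os.close(wfd)
```

In this sketch the retry loop can only end once something reads from the other end. If the writer is itself what prevents the reader from running (for example because the reader is the same thread), retrying never succeeds and the process sits at 100% CPU; whether that is what happened in the reported incident is not established by the excerpt above.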