Re: [Openais] FAILED TO RECEIVE followed by cluster failure

2011-07-21 Thread Steven Dake
On 07/21/2011 12:19 PM, Jed Smith wrote: > Steve, > > Thank you again for all of the information. > > I labbed an in-place upgrade and the Corosync 1.4.0 compile brought > down the 1.2.1-4ubuntu1 box. All I did was deploy from scratch, create > a cluster with 1.2.1-4ubuntu1 and Pacemaker 1.0.10-4

Re: [Openais] FAILED TO RECEIVE followed by cluster failure

2011-07-21 Thread Jed Smith
Steve, Thank you again for all of the information. I labbed an in-place upgrade and the Corosync 1.4.0 compile brought down the 1.2.1-4ubuntu1 box. All I did was deploy from scratch, create a cluster with 1.2.1-4ubuntu1 and Pacemaker 1.0.10-4ubuntu3, then compiled Corosync 1.4.0 and Pacemaker 1.0

Re: [Openais] FAILED TO RECEIVE followed by cluster failure

2011-07-18 Thread Steven Dake
On 07/18/2011 07:55 PM, Keisuke MORI wrote: > Hi, > > 2011/7/19 Steven Dake: >> On 07/18/2011 10:38 AM, Jed Smith wrote: >>> Thank you for your reply. >>> >>> On Mon, Jul 18, 2011 at 1:18 PM, Digimer wrote: >>>> Is it possible that the switch dropped the multicast group, and didn't >>>> reform i

Re: [Openais] FAILED TO RECEIVE followed by cluster failure

2011-07-18 Thread Keisuke MORI
Hi, 2011/7/19 Steven Dake: > On 07/18/2011 10:38 AM, Jed Smith wrote: >> Thank you for your reply. >> >> On Mon, Jul 18, 2011 at 1:18 PM, Digimer wrote: >>> Is it possible that the switch dropped the multicast group, and didn't >>> reform it fast enough to prevent the cluster from partitioning?

Re: [Openais] FAILED TO RECEIVE followed by cluster failure

2011-07-18 Thread Steven Dake
On 07/18/2011 04:21 PM, Jed Smith wrote: > Steven, > > Thank you very much for the reply and information. > > On Mon, Jul 18, 2011 at 6:58 PM, Steven Dake wrote: >> My recommendation to you is to update to a 1.3 or 1.4 series. Both of >> these have very tight maintenance rules around what goes

Re: [Openais] FAILED TO RECEIVE followed by cluster failure

2011-07-18 Thread Jed Smith
Steven, Thank you very much for the reply and information. On Mon, Jul 18, 2011 at 6:58 PM, Steven Dake wrote: > My recommendation to you is to update to a 1.3 or 1.4 series. Both of > these have very tight maintenance rules around what goes in (ie: its not > tip development work). I will ind

Re: [Openais] FAILED TO RECEIVE followed by cluster failure

2011-07-18 Thread Steven Dake
On 07/18/2011 10:38 AM, Jed Smith wrote: > Thank you for your reply. > > On Mon, Jul 18, 2011 at 1:18 PM, Digimer wrote: >> Is it possible that the switch dropped the multicast group, and didn't >> reform it fast enough to prevent the cluster from partitioning? > > Our network guy says that the

Re: [Openais] FAILED TO RECEIVE followed by cluster failure

2011-07-18 Thread Jed Smith
Thank you for your reply. On Mon, Jul 18, 2011 at 1:18 PM, Digimer wrote: > Is it possible that the switch dropped the multicast group, and didn't > reform it fast enough to prevent the cluster from partitioning? Our network guy says that the switches do not look at multicast traffic, they merel

Re: [Openais] FAILED TO RECEIVE followed by cluster failure

2011-07-18 Thread Digimer
On 07/18/2011 12:17 PM, Jed Smith wrote: > Good morning, > > I am not subscribed to the list (yet, waiting on confirmation) so > please CC me on all replies. > > My employer has several deployments of Pacemaker on top of Corosync > and we have recently been hitting this: > > Jul 18 12:01:05

Re: [Openais] FAILED TO RECEIVE followed by cluster failure

2011-07-18 Thread Jed Smith
I apologize for omitting this information: * Corosync 1.2.1-4ubuntu1 * Pacemaker 1.0.10-4ubuntu3 Obviously, stock straight from the repos. -- Jed Smith j...@jedsmith.org

[Openais] FAILED TO RECEIVE followed by cluster failure

2011-07-18 Thread Jed Smith
Good morning, I am not subscribed to the list (yet, waiting on confirmation) so please CC me on all replies. My employer has several deployments of Pacemaker on top of Corosync and we have recently been hitting this: Jul 18 12:01:05 corosync[6065]: [TOTEM ] FAILED TO RECEIVE Jul 18 12:01: