[Linux-HA] Re: MD sets failing under heavy load in a DRBD/Pacemaker Cluster

2011-10-04 Thread Ulrich Windl
Hi! From what I read, I'd check that you are running the latest BIOS, MPT firmware and maybe also drive firmware. Maybe also check for proper cooling of the hardware, especially the disks. The root cause seems hardware-related to me. Regards, Ulrich >>> Caspar Smit wrote on 04.10.2011 at 14:01 in Nach

Re: [Linux-HA] MD sets failing under heavy load in a DRBD/Pacemaker Cluster

2011-10-04 Thread Dimitri Maziuk
On 10/04/2011 07:01 AM, Caspar Smit wrote: > Hi all, > > We are having a major problem with one of our clusters. ... FWIW, I see recent kernels dropping bits under high load often enough, and on Supermicro hardware, too. Does your (controller) BIOS have any AHCI-related settings and if so did yo

Re: [Linux-HA] MD sets failing under heavy load in a DRBD/Pacemaker Cluster

2011-10-04 Thread Thakkar, Vishal
Hi Caspar, What is the version of the FW on the LSI 3081E-R HBA? This is printed as a debug message when the driver loads. If the version is old, you may want to try updating it from http://www.lsi.com/channel/products/storagecomponents/Pages/HBAs.aspx On your question - "Is there a known iss

Re: [Linux-HA] MD sets failing under heavy load in a DRBD/Pacemaker Cluster

2011-10-04 Thread James Bottomley
On Tue, 2011-10-04 at 14:01 +0200, Caspar Smit wrote: > Oct 2 11:01:59 node03 kernel: [7370143.421999] mptbase: ioc0: > LogInfo(0x31110b00): Originator={PL}, Code={Reset}, SubCode(0x0b00) > cb_idx mptbase_reply > Oct 2 11:01:59 node03 kernel: [7370143.435220] mptbase: ioc0: > LogInfo(0x31181000):

Re: [Linux-HA] MD sets failing under heavy load in a DRBD/Pacemaker Cluster

2011-10-04 Thread CoolCold
From my point of view it looks like driver/hardware errors, since you have records like: Oct 2 11:01:59 node03 kernel: [7370143.442783] end_request: I/O error, dev sdf, sector 3907028992 On Tue, Oct 4, 2011 at 4:01 PM, Caspar Smit wrote: > Hi all, > > We are having a major problem with one of
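A minimal sketch, not taken from the thread, of how one might tally the kind of record CoolCold points to across a saved kernel log, assuming lines in the `end_request: I/O error, dev ..., sector ...` format quoted above. Other kernel versions may word this message differently, and the function name `count_io_errors` is hypothetical.

```python
import re
import sys
from collections import Counter

# Matches the kernel message format quoted in the thread, e.g.
# "end_request: I/O error, dev sdf, sector 3907028992".
# Assumption: other kernel versions may phrase this message differently.
IO_ERROR_RE = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")


def count_io_errors(lines):
    """Hypothetical helper: map block device name -> number of I/O error lines."""
    counts = Counter()
    for line in lines:
        match = IO_ERROR_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical usage: pass a saved kernel log file as the first argument.
    if len(sys.argv) > 1:
        with open(sys.argv[1], errors="replace") as log:
            print(count_io_errors(log))
    else:
        # Fall back to the single record quoted in the thread.
        sample = [
            "Oct 2 11:01:59 node03 kernel: [7370143.442783] end_request: "
            "I/O error, dev sdf, sector 3907028992",
        ]
        print(count_io_errors(sample))
```

A device that keeps accumulating such records under load points at the disk, cable, or HBA path rather than at MD or DRBD, which is the distinction the replies above are drawing.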

[Linux-HA] Some trouble with IPaddr2 resource.

2011-10-04 Thread Алексей Кашин
Hello. I am seeing an error in the crm status output: # crm status Last updated: Tue Oct 4 15:55:13 2011 Stack: openais Current DC: radius1 - partition with quorum Version: 1.0.7-54d7869bfe3691eb723b1d47810e5585d8246b58 2 Nodes configured, 2 expected votes 1 Resources configured. ===

Re: [Linux-HA] Two different Pacemaker/CoroSync on the same network

2011-10-04 Thread Lorenzo Milesi
> Yes, they can coexist as long as they use unique network sockets. You > control this by pairing up servers that are supposed to be part of > the > same cluster by specifying the mcastaddr and mcastport. Usually you > just alter mcastaddr. Great, thank you very much!

Re: [Linux-HA] Two different Pacemaker/CoroSync on the same network

2011-10-04 Thread Dan Frincu
Hi, On Tue, Oct 4, 2011 at 6:12 PM, Lorenzo Milesi wrote: > Hi. > > Simple question: can two separate CoroSync installations coexist on the same > network? Yes, they can coexist as long as they use unique network sockets. You control this by pairing up servers that are supposed to be part of th
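A minimal sketch of the pairing Dan describes, assuming the Corosync 1.x `totem` section of `/etc/corosync/corosync.conf`. The `bindnetaddr` and `mcastaddr` values below are hypothetical placeholders; the point is only that each cluster's nodes share one multicast group that the other cluster does not use.

```
# Cluster A: identical totem section on every node of cluster A
totem {
        version: 2
        interface {
                ringnumber: 0
                # hypothetical network shared by both clusters
                bindnetaddr: 192.168.1.0
                # multicast group used only by cluster A
                mcastaddr: 226.94.1.1
                mcastport: 5405
        }
}

# Cluster B: same network, different multicast group
totem {
        version: 2
        interface {
                ringnumber: 0
                bindnetaddr: 192.168.1.0
                # multicast group used only by cluster B
                mcastaddr: 226.94.1.2
                mcastport: 5405
        }
}
```

If both clusters must share one mcastaddr and be separated by port instead, the corosync.conf manual page notes that Corosync uses two UDP ports (mcastport for receives and mcastport - 1 for sends), so the two mcastport values should be configured with a gap rather than as adjacent numbers.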

[Linux-HA] Two different Pacemaker/CoroSync on the same network

2011-10-04 Thread Lorenzo Milesi
Hi. Simple question: can two separate CoroSync installations coexist on the same network? Given that the corosync keys are different for the two, I suppose they can, but I'd like to know for sure. Thanks ___ Linux-HA mailing list Linux-HA@lists.linux