Re: [Lustre-discuss] lustre 1.6.5.1 panic on failover

2008-08-01 Thread Andreas Dilger
On Aug 01, 2008 11:39 -0400, Brock Palen wrote:
> That will work right, the machine cycles the second takes over and
> all is well.
>
> If instead of crashing the node I run 'killall -9 heartbeat'
> I can get the panic every time. I even edited the external/ipmi
> script from 'power reset' t

Re: [Lustre-discuss] lustre 1.6.5.1 panic on failover

2008-08-01 Thread Brian J. Murrell
On Fri, 2008-08-01 at 11:39 -0400, Brock Palen wrote:
>
> I am looking at grabbing a crash dump. I think it's a race, heartbeat
> is mounting the filesystems before the first node is totally dead.

Just to be clear, heartbeat should _always_ STONITH the peer node before doing any mount of a Lust
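A minimal sketch of the "fence first, then mount" ordering described above, assuming the peer's power state is read with `ipmitool ... chassis power status`, which prints `Chassis Power is off` once the node is down. The function names and the `PEER_BMC`, `IPMI_USER`, `MDT_DEV`, and `MDT_MNT` variables are hypothetical placeholders, not anything from the thread, and this is not how heartbeat itself implements STONITH.

```shell
# Hypothetical helpers: refuse to mount unless the peer is confirmed off.

peer_is_off() {
    # $1 is the text printed by "ipmitool ... chassis power status"
    [ "$1" = "Chassis Power is off" ]
}

mount_if_fenced() {
    status="$1"
    if peer_is_off "$status"; then
        echo "peer is off, safe to mount"
        # On a real node the mount would go here, for example:
        # mount -t lustre "$MDT_DEV" "$MDT_MNT"
    else
        echo "peer not confirmed off, refusing to mount"
        return 1
    fi
}
```

On a real failover node the status string would come from the peer's BMC, for example `mount_if_fenced "$(ipmitool -I lan -H "$PEER_BMC" -U "$IPMI_USER" chassis power status)"`.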

Re: [Lustre-discuss] lustre 1.6.5.1 panic on failover

2008-08-01 Thread Brock Palen
Yes, it is consistent. I looked up how to induce a panic using sysrq:
    echo c > /proc/sysrq-trigger
That will work right: the machine cycles, the second takes over, and all is well. If instead of crashing the node I run 'killall -9 heartbeat' I can get the panic every time. I even edited the ext
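For anyone repeating the crash test: the trigger file is `/proc/sysrq-trigger`, and writing `c` to it as root crashes the kernel immediately. Below is a small guarded sketch. The `crash_node` function and the `DRY_RUN` variable are hypothetical, added only so the command cannot fire by accident.

```shell
# Hypothetical helper, not from the thread. DRY_RUN defaults to 1 so that
# calling this by accident does not crash the machine.
crash_node() {
    trigger="/proc/sysrq-trigger"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "dry run: would write 'c' to $trigger"
        return 0
    fi
    # 'c' asks the kernel to crash immediately; requires root.
    echo c > "$trigger"
}
```

Run `DRY_RUN=0 crash_node` as root on the active node only when the crash is actually wanted.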

Re: [Lustre-discuss] lustre 1.6.5.1 panic on failover

2008-07-31 Thread Klaus Steden
netdump is indeed good for this, but you may have to take two or three cracks at it ... it doesn't always dump the complete core image, and you can't really do a whole lot with the incomplete version.

Klaus

On 7/31/08 5:50 PM, "Kilian CAVALOTTI" <[EMAIL PROTECTED]> did etch on stone tablets:
> O

Re: [Lustre-discuss] lustre 1.6.5.1 panic on failover

2008-07-31 Thread Kilian CAVALOTTI
On Thursday 31 July 2008 17:22:28 Brock Palen wrote:
> What's a good tool to grab this? It's more than one page long, and the
> machine does not have serial ports.

If your servers do IPMI, you can probably configure Serial-over-LAN to get a console and capture the logs. But a way more convenient s

Re: [Lustre-discuss] lustre 1.6.5.1 panic on failover

2008-07-31 Thread Brock Palen
What's a good tool to grab this? It's more than one page long, and the machine does not have serial ports. Links are ok.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
[EMAIL PROTECTED]
(734)936-1985

On Jul 31, 2008, at 5:14 PM, Br

Re: [Lustre-discuss] lustre 1.6.5.1 panic on failover

2008-07-31 Thread Klaus Steden
Hi Brock,

I've been using Sun X2200s with Lustre in a similar configuration (IPMI, STONITH, Linux-HA, FC storage) and haven't had any issues like this (although I would typically panic the primary node during testing using Sysrq) ... is the behaviour consistent?

Klaus

On 7/31/08 1:57 PM, "Brock

Re: [Lustre-discuss] lustre 1.6.5.1 panic on failover

2008-07-31 Thread Brian J. Murrell
On Thu, 2008-07-31 at 16:57 -0400, Brock Palen wrote:
>
> Problem is when I run a test on the host that currently has the mds/
> mgs mounted 'killall -9 heartbeat' I see the IPMI shutdown and when
> the second 4100 tries to mount the filesystem it does a kernel panic.

We'd need to see the *f

[Lustre-discuss] lustre 1.6.5.1 panic on failover

2008-07-31 Thread Brock Palen
I have two machines I am setting up as my first MDS failover pair. The two Sun X4100s are connected to an FC disk array. I have set up heartbeat with IPMI for STONITH. Problem is when I run a test on the host that currently has the mds/mgs mounted ('killall -9 heartbeat'), I see the IPMI shu