HGST SAS link loss (was: mpt(4) timeout recovery improvements)

2014-01-15 Thread Edgar Fuß
> I still don't know what the root cause for the ,,Phy x: Link Status Unknown'' events is Just for the record: The root cause is probably a ``self initiated reset'' of the Hitachi (HGST) disk due to a bug in the A100 firmware (HGST calls that ``microcode''). Fortunately, the update process is do

Re: mpt(4) timeout recovery improvements

2013-12-13 Thread Edgar Fuß
> To test the impact on other concurrent SCSI commands It looks like the patch cures my original, real, problem, too. It probably saved my day (or those of the 250 people with their home dirs on my file server) or my entire weekend. Without it, I would have lost one RAID component at 02:05 and a

Re: mpt(4) timeout recovery improvements

2013-11-27 Thread Brian Buhrow
Nov 27, 5:49pm, Edgar Fuß wrote: } Subject: Re: mpt(4) timeout recovery improvements } > Could somebody please review it, especially whether I got the locking } > etc. right? } I'm still eager for someone with an understanding of NetBSD device drivers } to comment o

Re: mpt(4) timeout recovery improvements

2013-11-27 Thread Edgar Fuß
> Could somebody please review it, especially whether I got the locking etc. right? I'm still eager for someone with an understanding of NetBSD device drivers to comment on what I've done. From my experiments, I know that the patch does ~TRT for my MPT hardware, but I'm pretty afraid I could

Re: mpt(4) timeout recovery improvements

2013-11-25 Thread Edgar Fuß
> On the whole, I like this patch Thanks. > 1. The original patch I created tries to recover from a number of conditions other than simple timeouts based on messages that come back from the IOC. In such cases, mpt_restart() is called. I think those events can still be handled gracefully b

Re: mpt(4) timeout recovery improvements

2013-11-25 Thread Brian Buhrow
Hello Edgar. On the whole, I like this patch and I'll give it a whirl. Some comments: 1. The original patch I created tries to recover from a number of conditions other than simple timeouts based on messages that come back from the IOC. In such cases, mpt_restart() is called. I think

Re: mpt(4) timeout recovery improvements

2013-11-25 Thread Edgar Fuß
OK, I have something working now (based on Brian Buhrow's patch). Could somebody please review it, especially whether I got the locking etc. right? My basic test is to run dd if=/dev/rsd2d of=/dev/null bs=1m (where sd2 is an unused disc), then pull out and re-insert the disc. With a stoc

Re: mpt(4) timeout recovery improvements

2013-11-24 Thread Manuel Bouyer
On Sun, Nov 24, 2013 at 12:41:12PM +0100, Edgar Fuß wrote: > So, to partially answer myself: > > > scsipi/driver interaction (is there documentation on this?) > I had missed the comprehensive scsipi(9) man page. I guess I mis-typed > "scsipi" when looking for it. > > > First question: what's the

Re: mpt(4) timeout recovery improvements

2013-11-24 Thread Edgar Fuß
So, to partially answer myself: > scsipi/driver interaction (is there documentation on this?) I had missed the comprehensive scsipi(9) man page. I guess I mis-typed "scsipi" when looking for it. > First question: what's the appropriate xs->error? So this has been answered both by me finding that

Re: mpt(4) timeout recovery improvements

2013-11-24 Thread Edgar Fuß
BB> I know we've been corresponding on this for a while Yes, thanks again for that. EF> Now, what's the correct way of reset/init the IOC and returning everything EF> to scsipi? I guess the correct order is to reset (which leaves the IOC in EF> the stopped state), then to set xs->error and call sc

Re: mpt(4) timeout recovery improvements

2013-11-23 Thread Brian Buhrow
Hello Edgar. I know we've been corresponding on this for a while and you've been trying my patches and making your own, but I know the answers to some of the questions you asked since I had to answer them when I wrote the original patches. See below for what I know in the context of you

mpt(4) timeout recovery improvements

2013-11-23 Thread Edgar Fuß
A while ago I asked > Since my mpt(4) controller loses one of its attached discs every few weeks, needing a reboot and a twenty-hour RAID reconstruction, I'm thinking about switching to some mpii(4)-based SAS controller. > Does someone use mpii(4) in production? Is this ready to put 250