Re: Software RAID when it works and when it doesn't

2007-11-02 Thread Alberto Alonso
On Sat, 2007-10-27 at 11:26 -0400, Bill Davidsen wrote: > Alberto Alonso wrote: > > On Fri, 2007-10-26 at 18:12 +0200, Goswin von Brederlow wrote: > >> Depending on the hardware you can still access a different disk while > >> another one is resetting. But since there is no timeout in md

Re: Software RAID when it works and when it doesn't

2007-10-27 Thread Bill Davidsen
Alberto Alonso wrote: On Fri, 2007-10-26 at 18:12 +0200, Goswin von Brederlow wrote: Depending on the hardware you can still access a different disk while another one is resetting. But since there is no timeout in md it won't try to use any other disk while one is stuck. That is exactly what

Re: Software RAID when it works and when it doesn't

2007-10-26 Thread Alberto Alonso
On Fri, 2007-10-26 at 18:12 +0200, Goswin von Brederlow wrote: > Depending on the hardware you can still access a different disk while > another one is resetting. But since there is no timeout in md it won't > try to use any other disk while one is stuck. > > That is exactly what I miss. > > MfG

Re: Software RAID when it works and when it doesn't

2007-10-26 Thread Goswin von Brederlow
Bill Davidsen <[EMAIL PROTECTED]> writes: > Alberto Alonso wrote: >> On Tue, 2007-10-23 at 18:45 -0400, Bill Davidsen wrote: >> >> >>> I'm not sure the timeouts are the problem, even if md did its own >>> timeout, it then needs a way to tell the driver (or device) to stop >>> retrying. I don't bel

Re: Software RAID when it works and when it doesn't

2007-10-26 Thread Justin Piszcz
On Fri, 26 Oct 2007, Goswin von Brederlow wrote: Justin Piszcz <[EMAIL PROTECTED]> writes: On Fri, 19 Oct 2007, Alberto Alonso wrote: On Thu, 2007-10-18 at 17:26 +0200, Goswin von Brederlow wrote: Mike Accetta <[EMAIL PROTECTED]> writes: What I would like to see is a timeout driven fal

Re: Software RAID when it works and when it doesn't

2007-10-26 Thread Goswin von Brederlow
Justin Piszcz <[EMAIL PROTECTED]> writes: > On Fri, 19 Oct 2007, Alberto Alonso wrote: > >> On Thu, 2007-10-18 at 17:26 +0200, Goswin von Brederlow wrote: >>> Mike Accetta <[EMAIL PROTECTED]> writes: >> >>> What I would like to see is a timeout driven fallback mechanism. If >>> one mirror does not

Re: Software RAID when it works and when it doesn't

2007-10-24 Thread Alberto Alonso
On Wed, 2007-10-24 at 16:04 -0400, Bill Davidsen wrote: > I think what you really want is to notice how long the drive and driver > took to recover or fail, and take action based on that. In general "kick > the drive" is not optimal for a few bad spots, even if the drive > recovery sucks. The

Re: Software RAID when it works and when it doesn't

2007-10-24 Thread Bill Davidsen
Alberto Alonso wrote: On Tue, 2007-10-23 at 18:45 -0400, Bill Davidsen wrote: I'm not sure the timeouts are the problem, even if md did its own timeout, it then needs a way to tell the driver (or device) to stop retrying. I don't believe that's available, certainly not everywhere, and anyt

Re: Software RAID when it works and when it doesn't

2007-10-23 Thread Alberto Alonso
On Tue, 2007-10-23 at 18:45 -0400, Bill Davidsen wrote: > I'm not sure the timeouts are the problem, even if md did its own > timeout, it then needs a way to tell the driver (or device) to stop > retrying. I don't believe that's available, certainly not everywhere, > and anything other than eve

Re: Software RAID when it works and when it doesn't

2007-10-23 Thread Bill Davidsen
Alberto Alonso wrote: On Thu, 2007-10-18 at 17:26 +0200, Goswin von Brederlow wrote: Mike Accetta <[EMAIL PROTECTED]> writes: What I would like to see is a timeout driven fallback mechanism. If one mirror does not return the requested data within a certain time (say 1 second) then

Re: Software RAID when it works and when it doesn't

2007-10-20 Thread Justin Piszcz
On Sat, 20 Oct 2007, Michael Tokarev wrote: There was an idea some years ago about having an additional layer between a block device and whatever else is above it (filesystem or something else), that will just do bad block remapping. Maybe it was even implemented in LVM or IBM-proposed EVM
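
The remapping layer mentioned in the message above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the class name RemappingDevice, the spare-block pool, and the bad_physical set used to simulate failing sectors are all hypothetical and do not correspond to any LVM, EVMS, or md interface.

```python
class RemappingDevice:
    """Toy block device that redirects writes away from bad blocks.

    Hypothetical model: logical blocks 0..blocks-1 map to themselves
    until a write hits a bad physical block, at which point the logical
    block is remapped to a block from a reserved spare pool.
    """

    def __init__(self, blocks, spare_count):
        self.blocks = blocks
        # Spares live after the normal blocks in the same backing store.
        self.store = [b""] * (blocks + spare_count)
        self.free_spares = list(range(blocks, blocks + spare_count))
        self.remap = {}            # logical block -> spare physical block
        self.bad_physical = set()  # simulated failing physical blocks

    def _physical(self, logical):
        if not 0 <= logical < self.blocks:
            raise IndexError("logical block out of range")
        return self.remap.get(logical, logical)

    def write(self, logical, data):
        phys = self._physical(logical)
        # Keep taking spares until one is usable or the pool is empty.
        while phys in self.bad_physical:
            if not self.free_spares:
                raise OSError("no spare blocks left")
            phys = self.free_spares.pop(0)
            self.remap[logical] = phys
        self.store[phys] = data

    def read(self, logical):
        phys = self._physical(logical)
        if phys in self.bad_physical:
            # Data on a bad block cannot be recovered by this layer alone.
            raise OSError("unreadable block %d" % logical)
        return self.store[phys]
```

In this model a write to a bad block succeeds transparently after remapping, while a read of a block that went bad after it was written still fails. That second case is where redundancy from a layer such as RAID would have to supply the data.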

Re: Software RAID when it works and when it doesn't

2007-10-20 Thread Michael Tokarev
Justin Piszcz wrote: [] >> - >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in >> the body of a message to [EMAIL PROTECTED] >> More majordomo info at http://vger.kernel.org/majordomo-info.html Justin, forgive me please, but can you learn to trim the original messages wh

Re: Software RAID when it works and when it doesn't

2007-10-19 Thread Justin Piszcz
On Fri, 19 Oct 2007, Alberto Alonso wrote: On Thu, 2007-10-18 at 17:26 +0200, Goswin von Brederlow wrote: Mike Accetta <[EMAIL PROTECTED]> writes: What I would like to see is a timeout driven fallback mechanism. If one mirror does not return the requested data within a certain time (say 1

Re: Software RAID when it works and when it doesn't

2007-10-19 Thread Alberto Alonso
On Thu, 2007-10-18 at 17:26 +0200, Goswin von Brederlow wrote: > Mike Accetta <[EMAIL PROTECTED]> writes: > What I would like to see is a timeout driven fallback mechanism. If > one mirror does not return the requested data within a certain time > (say 1 second) then the request should be duplicat
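
The timeout-driven fallback proposed in the quoted text can be sketched in user space. This is not md code and not the patch discussed in the thread. It assumes the truncated sentence means the read is reissued to another mirror once the timeout expires, and the names read_with_fallback, slow_mirror, fast_mirror and failing_mirror are hypothetical stand-ins for real devices.

```python
import concurrent.futures as cf
import time


def read_with_fallback(mirrors, block, timeout=1.0):
    """Return (mirror_index, data) from the first mirror that answers.

    mirrors is a list of callables taking a block number. If the mirrors
    asked so far stay silent for `timeout` seconds, or fail outright,
    the same request is duplicated to the next mirror.
    """
    pool = cf.ThreadPoolExecutor(max_workers=len(mirrors))
    pending = {}  # future -> index of the mirror it was sent to
    try:
        for index, read in enumerate(mirrors):
            pending[pool.submit(read, block)] = index
            is_last = index == len(mirrors) - 1
            # Once every mirror has been asked there is nothing left to
            # fall back to, so wait without a deadline.
            wait_for = None if is_last else timeout
            while pending:
                done, _ = cf.wait(list(pending), timeout=wait_for,
                                  return_when=cf.FIRST_COMPLETED)
                if not done:
                    break  # timed out: duplicate the request
                for future in done:
                    mirror = pending.pop(future)
                    if future.exception() is None:
                        return mirror, future.result()
                if not is_last:
                    break  # a mirror failed: try the next one right away
        raise OSError("all mirrors failed")
    finally:
        # A stuck read cannot be cancelled here; it keeps running in its
        # worker thread until the underlying call returns.
        pool.shutdown(wait=False)


def slow_mirror(block):
    time.sleep(0.3)  # stands in for a drive stuck in retries
    return b"slow:%d" % block


def fast_mirror(block):
    return b"fast:%d" % block


def failing_mirror(block):
    raise OSError("simulated read error")
```

Calling read_with_fallback([slow_mirror, fast_mirror], 7, timeout=0.05) returns the data from the second mirror without waiting for the first. The stuck request is only abandoned, not stopped, which is the limitation raised elsewhere in this thread: a timeout in md would still need a way to tell the driver to stop retrying.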

Re: Software RAID when it works and when it doesn't

2007-10-18 Thread Goswin von Brederlow
Mike Accetta <[EMAIL PROTECTED]> writes: > Also, read errors don't tend to fail the array so when the bad disk is > again accessed for some subsequent read the whole hopeless retry process > begins anew. > > I posted a patch about 6 weeks ago which attempts to improve this situation > for RAID1 by

Re: Software RAID when it works and when it doesn't

2007-10-17 Thread Support
On Tue, 2007-10-16 at 17:57 -0400, Mike Accetta wrote: > Was the disk driver generating any low level errors or otherwise > indicating that it might be retrying operations on the bad drive at > the time (i.e. console diagnostics)? As Neil mentioned later, the md layer > is at the mercy of the low

Re: Software RAID when it works and when it doesn't

2007-10-16 Thread Richard Scobie
Mike Accetta wrote: is at the mercy of the low level disk driver. We've observed abysmal RAID1 recovery times on failing SATA disks because all the time is being spent in the driver retrying operations which will never succeed. Also, read errors don't tend to fail the array so when the bad disk

Re: Software RAID when it works and when it doesn't

2007-10-16 Thread Mike Accetta
Alberto Alonso writes: > On Sun, 2007-10-14 at 08:50 +1000, Neil Brown wrote: > > On Saturday October 13, [EMAIL PROTECTED] wrote: > > > Over the past several months I have encountered 3 > > > cases where the software RAID didn't work in keeping > > > the servers up and running. > > > > > > In al

Re: Software RAID when it works and when it doesn't

2007-10-14 Thread Alberto Alonso
On Sun, 2007-10-14 at 10:21 -0600, Maurice Hilarius wrote: > Alberto Alonso wrote: > > > PATA (IDE) with > Master and Slave drives is a "bad idea" as, when one drive fails, the > other of the Master & Slave pair often is no longer usable. > On discrete interfaces, with all drives configured as

Re: Software RAID when it works and when it doesn't

2007-10-13 Thread Alberto Alonso
On Sun, 2007-10-14 at 08:50 +1000, Neil Brown wrote: > On Saturday October 13, [EMAIL PROTECTED] wrote: > > Over the past several months I have encountered 3 > > cases where the software RAID didn't work in keeping > > the servers up and running. > > > > In all cases, the failure has been on a sin

Re: Software RAID when it works and when it doesn't

2007-10-13 Thread Neil Brown
On Saturday October 13, [EMAIL PROTECTED] wrote: > Over the past several months I have encountered 3 > cases where the software RAID didn't work in keeping > the servers up and running. > > In all cases, the failure has been on a single drive, > yet the whole md device and server become unresponsi

Re: Software RAID when it works and when it doesn't

2007-10-13 Thread Eyal Lebedinsky
RAID0 is non-redundant, so a disk failure will correctly fail the array. Alberto Alonso wrote: Over the past several months I have encountered 3 cases where the software RAID didn't work in keeping the servers up and running. In all cases, the failure has been on a single drive, yet the whole md

Software RAID when it works and when it doesn't

2007-10-13 Thread Alberto Alonso
Over the past several months I have encountered 3 cases where the software RAID didn't work in keeping the servers up and running. In all cases, the failure has been on a single drive, yet the whole md device and server become unresponsive. (usb-storage) In one situation a RAID 0 across 2 USB dri