Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-02-02, Matt Mackall
On Fri, Feb 02, 2007 at 05:58:04PM -0500, Mark Lord wrote: > Matt Mackall wrote: > >.. > >Also worth considering is that spending minutes trying to reread > >damaged sectors is likely to accelerate your death spiral. More data > >may be recoverable if you give up quickly in a first pass, then go…

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-02-02, Mark Lord
Matt Mackall wrote: .. Also worth considering is that spending minutes trying to reread damaged sectors is likely to accelerate your death spiral. More data may be recoverable if you give up quickly in a first pass, then go back and manually retry damaged bits with smaller I/Os. All good input.
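
Matt's two-pass idea (give up quickly on a first pass, then go back over only the damaged regions with smaller I/Os) can be sketched as below. This is a simulation, not a recovery tool: the disk is modelled as a set of bad LBAs, no retry timing is modelled, and `read_range`, `two_pass_recover`, and the chunk size are hypothetical.

```python
from typing import List, Set, Tuple

def read_range(bad: Set[int], start: int, count: int) -> bool:
    """Simulated single-shot read of [start, start + count): fails if any
    sector in the range is bad. No retries, so a failure is reported at once."""
    return not any(lba in bad for lba in range(start, start + count))

def two_pass_recover(total: int, bad: Set[int], chunk: int) -> Tuple[Set[int], Set[int]]:
    """Return (recovered, unreadable) sets of LBAs for a disk of `total` sectors."""
    recovered: Set[int] = set()
    damaged: List[Tuple[int, int]] = []
    # Pass 1: large reads; give up on a chunk at its first error and move on.
    for start in range(0, total, chunk):
        count = min(chunk, total - start)
        if read_range(bad, start, count):
            recovered.update(range(start, start + count))
        else:
            damaged.append((start, count))
    # Pass 2: go back over the damaged chunks only, one sector per read.
    unreadable: Set[int] = set()
    for start, count in damaged:
        for lba in range(start, start + count):
            if read_range(bad, lba, 1):
                recovered.add(lba)
            else:
                unreadable.add(lba)
    return recovered, unreadable
```

For a 64-sector disk with bad sectors {10, 11, 40} and 16-sector chunks, pass 1 reads four chunks and flags two; pass 2 then issues 32 single-sector reads and recovers every sector except the three bad ones.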

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-02-02, Douglas Gilbert
Alan wrote: >> The interesting point of this question is about the typical pattern of >> IO errors. On a read, it is safe to assume that you will have issues >> with some bounded number of adjacent sectors. > > Which in theory you can get by asking the drive for the real sector size > from th…

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-02-02, Matt Mackall
On Fri, Feb 02, 2007 at 11:06:19AM -0500, Mark Lord wrote: > Alan wrote: > > > >If this is the right strategy for disk recovery for a given type of > >device then this ought to be an automatic strategy. Most end users will > >not have the knowledge to frob about in sysfs, and if the bad sector hits…

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-02-02, Ric Wheeler
James Bottomley wrote: On Fri, 2007-02-02 at 14:42 +, Alan wrote: The interesting point of this question is about the typical pattern of IO errors. On a read, it is safe to assume that you will have issues with some bounded number of adjacent sectors. Which in theory you can…

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-02-02, Mark Lord
Alan wrote: If this is the right strategy for disk recovery for a given type of device then this ought to be an automatic strategy. Most end users will not have the knowledge to frob about in sysfs, and if the bad sector hits at the wrong moment a sensible automatic recovery strategy is going to…

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-02-02, James Bottomley
On Fri, 2007-02-02 at 14:42 +, Alan wrote: > > The interesting point of this question is about the typical pattern of > > IO errors. On a read, it is safe to assume that you will have issues > > with some bounded number of adjacent sectors. > > Which in theory you can get by asking the dr…

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-02-02, Alan
> your system requirements are, what the system is trying to do (i.e., > when trying to recover a failing but not dead yet disk, IO errors should > be as quick as possible and we should choose an IO scheduler that does > not combine IO's). If this is the right strategy for disk recovery for a g…

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-02-02, Alan
> The interesting point of this question is about the typical pattern of > IO errors. On a read, it is safe to assume that you will have issues > with some bounded number of adjacent sectors. Which in theory you can get by asking the drive for the real sector size from the ATA7 info. (We ough…
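
Alan's suggestion can be sketched as follows. As we read the ATA spec, IDENTIFY DEVICE word 106 reports how many logical sectors share one physical sector: bits 15:14 = 01b mark the word valid, bit 13 flags "multiple logical sectors per physical sector", and bits 3:0 then give log2 of that count. A medium error at one logical LBA therefore makes the whole surrounding physical sector suspect. The sketch assumes LBA 0 sits on a physical-sector boundary (it ignores the separate alignment information), and both function names are hypothetical.

```python
from typing import Tuple

def log2_logical_per_physical(word106: int) -> int:
    """Decode IDENTIFY DEVICE word 106 (assumed layout, see above)."""
    if (word106 & 0xC000) == 0x4000 and (word106 & (1 << 13)):
        return word106 & 0xF
    return 0  # word invalid or 1:1 mapping: one logical sector per physical

def suspect_range(bad_lba: int, word106: int) -> Tuple[int, int]:
    """Inclusive (first, last) logical LBAs sharing a physical sector with
    bad_lba, assuming LBA 0 is physically aligned."""
    per_phys = 1 << log2_logical_per_physical(word106)
    first = (bad_lba // per_phys) * per_phys
    return first, first + per_phys - 1
```

For example, a drive reporting word 106 = 0x6003 has 8 logical sectors per physical sector, so an error at LBA 1001 implicates LBAs 1000 through 1007.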

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-02-02, Ric Wheeler
James Bottomley wrote: On Thu, 2007-02-01 at 15:02 -0500, Mark Lord wrote: I believe you made the first change in response to my prodding at the time, when libata was not returning valid sense data (no LBA) for media errors. The SCSI EH handling of that was rather poor at the time, and so…

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-02-01, Mark Lord
James Bottomley wrote: On Thu, 2007-02-01 at 15:02 -0500, Mark Lord wrote: .. One thing that could be even better than the patch below, would be to have it perhaps skip the entire bio that includes the failed sector, rather than only the bad sector itself. Er ... define "skip over the bio". A…
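
One possible reading of "skip the entire bio that includes the failed sector" is sketched here. It assumes a request is simply an ordered list of contiguous sector runs (one per bio) and that the drive reports the LBA of the bad sector; `Bio` and `fail_containing_bio` are hypothetical stand-ins, not the kernel's struct bio API.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Bio:
    """Hypothetical stand-in for a bio: one contiguous run of sectors."""
    start: int  # first LBA
    nsect: int  # number of sectors

    def contains(self, lba: int) -> bool:
        return self.start <= lba < self.start + self.nsect

def fail_containing_bio(bios: List[Bio], bad_lba: int) -> List[bool]:
    """Per-bio outcome (True = completed OK) for a medium error at bad_lba:
    only the bio holding that sector is failed; every other bio, including
    those after it, is assumed to complete normally."""
    return [not bio.contains(bad_lba) for bio in bios]
```

With three 8-sector bios starting at LBAs 0, 8, and 16, an error at LBA 10 fails only the middle bio. Compared with failing just the bad sector, this reports 8 sectors lost instead of 1, which is the cost of the coarser granularity.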

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-02-01, James Bottomley
On Thu, 2007-02-01 at 15:02 -0500, Mark Lord wrote: > I believe you made the first change in response to my prodding at the time, > when libata was not returning valid sense data (no LBA) for media errors. > The SCSI EH handling of that was rather poor at the time, > and so having it not retry the…

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-02-01, Mark Lord
James Bottomley wrote: On Tue, 2007-01-30 at 19:47 -0500, Mark Lord wrote: Kernels since about 2.6.16 or so have been broken in this regard. They "complete" the good sectors before the error, and then fail the entire remaining portions of the request. What was the commit that introduced the ch…

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-01-31, Mark Lord
James Bottomley wrote: On Wed, 2007-01-31 at 12:57 -0500, Mark Lord wrote: Alan wrote: When libata reports a MEDIUM_ERROR to us, we *know* it's non-recoverable, as the drive itself has already done internal retries (libata uses the "with retry" ATA opcodes for this). This depends on the firmwa…

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-01-31, James Bottomley
On Wed, 2007-01-31 at 12:57 -0500, Mark Lord wrote: > Alan wrote: > >> When libata reports a MEDIUM_ERROR to us, we *know* it's non-recoverable, > >> as the drive itself has already done internal retries (libata uses the > >> "with retry" ATA opcodes for this). > > > > This depends on the firmware…

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-01-31, Mark Lord
Alan wrote: When libata reports a MEDIUM_ERROR to us, we *know* it's non-recoverable, as the drive itself has already done internal retries (libata uses the "with retry" ATA opcodes for this). This depends on the firmware. Some of the "raid firmware" drives don't appear to do retries in firmwar…

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-01-31, Ric Wheeler
Alan wrote: When libata reports a MEDIUM_ERROR to us, we *know* it's non-recoverable, as the drive itself has already done internal retries (libata uses the "with retry" ATA opcodes for this). This depends on the firmware. Some of the "raid firmware" drives don't appear to do retries in firmw…

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-01-31, Mark Lord
Douglas Gilbert wrote: Ric, Both ATA (ATA8-ACS) and SCSI (SBC-3) have recently added command support to flag a block as "uncorrectable". There is no need to send bad "long" data to it and suppress the disk's automatic re-allocation logic. That'll be useful in a couple of years, once drives tha…
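
On the SCSI side, the mechanism Douglas refers to is, as we understand it, the WR_UNCOR bit of SBC-3's WRITE LONG command (the ATA8-ACS counterpart is WRITE UNCORRECTABLE EXT). The sketch below only builds a WRITE LONG (10) CDB with that bit set. The byte-1 bit positions (COR_DIS 0x80, WR_UNCOR 0x40, PBLOCK 0x20) are our reading of SBC-3 and should be checked against the standard before use; sending the CDB (for example via SG_IO) is not shown, and `write_long10_uncor` is a hypothetical helper.

```python
import struct

WRITE_LONG_10 = 0x3F  # WRITE LONG (10) operation code

def write_long10_uncor(lba: int, pblock: bool = False) -> bytes:
    """Build a 10-byte WRITE LONG (10) CDB with WR_UNCOR set, asking the
    device to mark the block at `lba` uncorrectable (no data-out phase)."""
    if not 0 <= lba <= 0xFFFFFFFF:
        raise ValueError("WRITE LONG (10) only carries a 32-bit LBA")
    flags = 0x40  # WR_UNCOR (assumed bit position, see lead-in)
    if pblock:
        flags |= 0x20  # PBLOCK: apply to the whole physical block
    # opcode, flags, 4-byte big-endian LBA, reserved byte,
    # 2-byte byte transfer length (0: no data), control byte
    return struct.pack(">BBIBHB", WRITE_LONG_10, flags, lba, 0, 0, 0)
```

The point of WR_UNCOR, as Douglas notes, is that no deliberately bad "long" data has to be crafted and sent to the drive.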

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-01-31, Douglas Gilbert
Ric Wheeler wrote: > > > Jeff Garzik wrote: >> Mark Lord wrote: >>> Eric D. Mudama wrote: Actually, it's possibly worse, since each failure in libata will generate 3-4 retries. With existing ATA error recovery in the drives, that's about 3 seconds per retry on average, or 12…

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-01-31, James Bottomley
On Wed, 2007-01-31 at 10:13 -0500, Mark Lord wrote: > James Bottomley wrote: > > > > For the MD case, this is what REQ_FAILFAST is for. > I cannot find where SCSI honours that flag. James? Er, it's in scsi_error.c:scsi_decide_disposition(): maybe_retry: /* we requeue for retry be…
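
As we understand the maybe_retry path James points to, a retryable failure is requeued only while the command's retry count stays within its allowed limit and the request is not marked fail-fast; otherwise the command is completed, error and all, to the upper layer. The Python below is a simplified model of that decision, not kernel code: `Disposition`, `maybe_retry`, and `attempts_until_give_up` are hypothetical names, and the real function handles many other cases that this model omits.

```python
from enum import Enum
from typing import Tuple

class Disposition(Enum):
    NEEDS_RETRY = "needs_retry"  # requeue the command
    SUCCESS = "success"          # stop retrying; complete it (with its error) upward

def maybe_retry(retries: int, allowed: int, failfast: bool) -> Tuple[Disposition, int]:
    """Bump the retry count, then retry only while the count is within
    `allowed` and the request is not fail-fast."""
    retries += 1
    if retries <= allowed and not failfast:
        return Disposition.NEEDS_RETRY, retries
    return Disposition.SUCCESS, retries

def attempts_until_give_up(allowed: int, failfast: bool) -> int:
    """How many times a command that always fails gets issued."""
    attempts, retries = 1, 0  # the initial issue counts as one attempt
    while True:
        disposition, retries = maybe_retry(retries, allowed, failfast)
        if disposition is Disposition.SUCCESS:
            return attempts
        attempts += 1
```

In this model, with allowed = 5 an always-failing command is issued six times; with the fail-fast flag set it is issued once.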

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-01-31, Mark Lord
Mark Lord wrote: James Bottomley wrote: For the MD case, this is what REQ_FAILFAST is for. I cannot find where SCSI honours that flag. James? Scratch that thought.. SCSI honours it in scsi_end_request(). But I'm not certain that the block layer handles it correctly, at least not in the 2.…

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-01-31, Mark Lord
James Bottomley wrote: For the MD case, this is what REQ_FAILFAST is for. I cannot find where SCSI honours that flag. James? And for that matter, even when I patch SCSI so that it *does* honour it, I don't actually see the flag making it into the SCSI layer from above. And I don't see where…

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-01-31, Alan
> When libata reports a MEDIUM_ERROR to us, we *know* it's non-recoverable, > as the drive itself has already done internal retries (libata uses the > "with retry" ATA opcodes for this). This depends on the firmware. Some of the "raid firmware" drives don't appear to do retries in firmware. > But…

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-01-31, Mark Lord
Ric Wheeler wrote: Mark Lord wrote: Eric D. Mudama wrote: Actually, it's possibly worse, since each failure in libata will generate 3-4 retries. (note: libata does *not* generate retries for medium errors; the looping is driven by the SCSI mid-layer code). It really beats the alternative o…

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-01-31, Ric Wheeler
Jeff Garzik wrote: Mark Lord wrote: Eric D. Mudama wrote: Actually, it's possibly worse, since each failure in libata will generate 3-4 retries. With existing ATA error recovery in the drives, that's about 3 seconds per retry on average, or 12 seconds per failure. Multiply that by the n…

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-01-31, Jeff Garzik
Mark Lord wrote: Eric D. Mudama wrote: Actually, it's possibly worse, since each failure in libata will generate 3-4 retries. With existing ATA error recovery in the drives, that's about 3 seconds per retry on average, or 12 seconds per failure. Multiply that by the number of blocks past t…

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-01-30, Douglas Gilbert
Ric Wheeler wrote: > > > Mark Lord wrote: > >> Eric D. Mudama wrote: >> >>> >>> Actually, it's possibly worse, since each failure in libata will >>> generate 3-4 retries. With existing ATA error recovery in the >>> drives, that's about 3 seconds per retry on average, or 12 seconds >>> per failu…

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-01-30, James Bottomley
On Tue, 2007-01-30 at 22:20 -0500, Ric Wheeler wrote: > Mark Lord wrote: > > The number of retries is an entirely separate issue. > > If we really care about it, then we should fix SD_MAX_RETRIES. > > > > The current value of 5 is *way* too high. It should be zero or one. > > > > Cheers > > > I th…
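
Mark's point about SD_MAX_RETRIES can be put into rough numbers. Assuming the mid-layer reissues a failing command up to R times after the first failure, and borrowing Eric Mudama's average of about 3 seconds of drive-internal recovery per attempt (an average from the thread, not a measured constant), the worst-case stall for one unreadable sector is:

$$T_{\text{sector}} = (1 + R)\,t_{\text{attempt}}, \qquad t_{\text{attempt}} \approx 3\ \text{s}$$

$$R = 5:\ T \approx 6 \times 3\ \text{s} = 18\ \text{s}; \qquad R = 1:\ T \approx 6\ \text{s}; \qquad R = 0:\ T \approx 3\ \text{s}$$

Under these assumptions, lowering R from 5 to one or zero cuts the stall per bad sector by a factor of three to six.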

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-01-30, Ric Wheeler
Mark Lord wrote: Eric D. Mudama wrote: Actually, it's possibly worse, since each failure in libata will generate 3-4 retries. With existing ATA error recovery in the drives, that's about 3 seconds per retry on average, or 12 seconds per failure. Multiply that by the number of blocks pa…

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-01-30, James Bottomley
First off, please send SCSI patches to the SCSI list: On Tue, 2007-01-30 at 19:47 -0500, Mark Lord wrote: > In ancient kernels, the SCSI disk code used to continue after > encountering a MEDIUM_ERROR. It would "complete" the good > sectors before the error, fail the bad sector/block, and then…

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-01-30, Mark Lord
Eric D. Mudama wrote: Actually, it's possibly worse, since each failure in libata will generate 3-4 retries. With existing ATA error recovery in the drives, that's about 3 seconds per retry on average, or 12 seconds per failure. Multiply that by the number of blocks past the error to comple…
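
Eric's estimate works out as follows, taking his figures of 4 attempts at roughly 3 seconds each and letting N be the number of blocks past the error, in the worst case where every one of them fails in turn. N = 256 (a 128 KiB request of 512-byte sectors) is only an illustrative value:

$$T(N) = N \cdot r \cdot t = N \cdot 4 \cdot 3\ \text{s} = 12N\ \text{s}$$

$$T(256) = 3072\ \text{s} \approx 51\ \text{min}$$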

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-01-30, Mark Lord
James Bottomley wrote: First off, please send SCSI patches to the SCSI list: Fixed already, thanks! This patch fixes the behaviour to be similar to what we had originally. When a bad sector is encountered, SCSI will now work around it again, failing *only* the bad sector itself. Erm, but th…

[PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-01-30, Mark Lord
In ancient kernels, the SCSI disk code used to continue after encountering a MEDIUM_ERROR. It would "complete" the good sectors before the error, fail the bad sector/block, and then continue with the rest of the request. Kernels since about 2.6.16 or so have been broken in this regard. They "comp…
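
The two behaviours the patch contrasts can be modelled per sector. A minimal sketch, assuming a request is one contiguous LBA range and the set of unreadable sectors is known up front (a real drive reports them one error at a time); both function names are hypothetical.

```python
from typing import List, Set

def outcome_fail_remainder(start: int, nsect: int, bad: Set[int]) -> List[bool]:
    """Behaviour described for 2.6.16-era kernels: complete the sectors
    before the first bad one, then fail everything that remains."""
    result: List[bool] = []
    failed = False
    for lba in range(start, start + nsect):
        failed = failed or lba in bad  # once an error is hit, stay failed
        result.append(not failed)
    return result

def outcome_continue(start: int, nsect: int, bad: Set[int]) -> List[bool]:
    """Behaviour the patch restores: fail only the bad sector(s) and
    continue with the rest of the request."""
    return [lba not in bad for lba in range(start, start + nsect)]
```

For an 8-sector request at LBA 100 with sector 103 bad, the first model loses five sectors (103 to 107) while the second loses only sector 103.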