Re: [RFC] A SCSI fault injection framework using SystemTap.
Matthew Wilcox wrote:
> On Tue, Jan 15, 2008 at 12:04:09PM +0900, K.Tanaka wrote:
>> I would like to introduce a SCSI fault injection framework using SystemTap. Currently, the kernel has the fault-injection framework, and md has its Faulty mode, both of which can be used for testing error handling. But they can only produce fixed types of errors, stochastically. In order to simulate more realistic SCSI disk faults, I have created a new, flexible fault injection framework using SystemTap.
>
> How does it compare to using scsi_debug, which I believe can do all of the above and more?

Sorry for the lack of explanation. The new framework is meant to be driven by a userspace testing tool (such as a shell script). For flexibility, the framework lets the user designate the inode number of a target file on the device as the fault-injection target; a fault is injected when that file is accessed through the page cache. Alternatively, the user can designate a logical block address as the target position of a fault injection.

--
Kenichi TANAKA | Open Source Software Platform Development Division | Computers Software Operations Unit, NEC Corporation | [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
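The inode-number targeting described above could be driven from a test script roughly as follows. This is only an illustrative sketch: the framework's actual invocation is not shown in the mail, and `TARGET` here is a hypothetical stand-in for a file on the device under test.

```shell
# Look up the inode number of the target file; per the mail above, the
# framework takes this inode number as the fault-injection target.
# TARGET is a hypothetical stand-in for a file on the device under test.
TARGET=$(mktemp)
INODE=$(stat -c %i "$TARGET")   # %i prints the file's inode number
echo "inode of $TARGET is $INODE"
rm -f "$TARGET"
```

A driver script would then pass `$INODE` to the SystemTap script so the probe fires only for I/O against that file.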
Re: [dm-devel] [RFC] A SCSI fault injection framework using SystemTap.
On Tue, Jan 15, 2008 at 12:04:09PM +0900, K.Tanaka wrote:
> - dm-mirror's redundancy doesn't work: a read error from one of the disks making up the array is passed directly up to userspace, without reading from the other mirror. (This turns out to be a known issue, but the patch is not merged: http://www.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/dm-raid1-handle-read-failures.patch)

It's in the queue for 2.6.25.

Alasdair
--
[EMAIL PROTECTED]
Re: [PATCH 001 of 6] md: Fix an occasional deadlock in raid5
On Mon, 14 Jan 2008, NeilBrown wrote:
> raid5's 'make_request' function calls generic_make_request on underlying devices and, if we run out of stripe heads, it could end up waiting for one of those requests to complete. This is bad, as recursive calls to generic_make_request go on a queue and are not even attempted until make_request completes.
>
> So: don't make any generic_make_request calls in raid5 make_request until all waiting has been done. We do this by simply setting STRIPE_HANDLE instead of calling handle_stripe(). If we need more stripe_heads, raid5d will get called to process the pending stripe_heads, which will call generic_make_request from a different thread where no deadlock will happen.
>
> This change by itself causes a performance hit. So add a change so that raid5_activate_delayed is only called at unplug time, never in raid5. This seems to bring back the performance numbers. Calling it in raid5d was sometimes too soon...
>
> Cc: Dan Williams [EMAIL PROTECTED]
> Signed-off-by: Neil Brown [EMAIL PROTECTED]

probably doesn't matter, but for the record:

Tested-by: dean gaudet [EMAIL PROTECTED]

this time i tested with internal and external bitmaps and it survived 8h and 14h respectively under the parallel tar workload i used to reproduce the hang.

btw this should probably be a candidate for 2.6.22 and .23 stable.

thanks
-dean
Re: [PATCH 001 of 6] md: Fix an occasional deadlock in raid5
On Tue, 15 Jan 2008 21:01:17 -0800 (PST) dean gaudet [EMAIL PROTECTED] wrote:
> On Mon, 14 Jan 2008, NeilBrown wrote:
>> raid5's 'make_request' function calls generic_make_request on underlying devices and if we run out of stripe heads, it could end up waiting for one of those requests to complete. [...]
>
> probably doesn't matter, but for the record:
>
> Tested-by: dean gaudet [EMAIL PROTECTED]
>
> this time i tested with internal and external bitmaps and it survived 8h and 14h resp. under the parallel tar workload i used to reproduce the hang.
>
> btw this should probably be a candidate for 2.6.22 and .23 stable.

hm, Neil said:

"The first fixes a bug which could make it a candidate for 24-final. However it is a deadlock that seems to occur very rarely, and has been in mainline since 2.6.22. So letting it into one more release shouldn't be a big problem. While the fix is fairly simple, it could have some unexpected consequences, so I'd rather go for the next cycle."

food fight!
Re: [PATCH 001 of 6] md: Fix an occasional deadlock in raid5
On Tue, 15 Jan 2008, Andrew Morton wrote:
> On Tue, 15 Jan 2008 21:01:17 -0800 (PST) dean gaudet [EMAIL PROTECTED] wrote:
>> On Mon, 14 Jan 2008, NeilBrown wrote:
>>> raid5's 'make_request' function calls generic_make_request on underlying devices and if we run out of stripe heads, it could end up waiting for one of those requests to complete. [...]
>>
>> btw this should probably be a candidate for 2.6.22 and .23 stable.
>
> hm, Neil said:
>
> "The first fixes a bug which could make it a candidate for 24-final. However it is a deadlock that seems to occur very rarely, and has been in mainline since 2.6.22. So letting it into one more release shouldn't be a big problem. While the fix is fairly simple, it could have some unexpected consequences, so I'd rather go for the next cycle."
>
> food fight!

heheh. it's really easy to reproduce the hang without the patch -- i could hang the box in under 20 min on 2.6.22+ w/ XFS and raid5 on 7x750GB. i'll try with ext3...

Dan's experiences suggest it won't happen with ext3 (or is even rarer), which would explain why this is, overall, a rare problem. but it doesn't result in data loss or a permanent system hang as long as you can become root and raise the size of the stripe cache... so OK, i agree with Neil: let's test more. food fight over! :)

-dean
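The workaround dean mentions, raising the stripe cache size, goes through md's sysfs interface. A minimal sketch, assuming an array named `md0` (adjust to your device) and root privileges; the cache is counted in entries, each caching a page per member device, so raising it costs memory:

```shell
# Raise the raid5/raid6 stripe cache via sysfs. "md0" is an assumed
# device name; the attribute only exists for raid5/raid6 arrays.
MD=md0
SYSFS=/sys/block/$MD/md/stripe_cache_size
if [ -w "$SYSFS" ]; then
    cat "$SYSFS"            # current number of stripe cache entries
    echo 4096 > "$SYSFS"    # raise it (more entries, more memory used)
else
    echo "cannot write $SYSFS (need root, and $MD must be a raid5/6 array)"
fi
```

This is exactly the "become root and raise the size of the stripe cache" escape hatch: once enough stripe heads are available, the stalled writers can make progress again.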
Re: [PATCH 001 of 6] md: Fix an occasional deadlock in raid5
> heheh. it's really easy to reproduce the hang without the patch -- i could hang the box in under 20 min on 2.6.22+ w/ XFS and raid5 on 7x750GB. i'll try with ext3...
>
> Dan's experiences suggest it won't happen with ext3 (or is even rarer), which would explain why this is overall a rare problem.

Hmmm... how rare? http://marc.info/?l=linux-kernel&m=119461747005776&w=2

There is nothing specific that prevents other filesystems from hitting it; perhaps XFS is just better at submitting large i/o's. -stable should get some kind of treatment. I'll take altered performance over a hung system.

--
Dan
Re: [PATCH 001 of 6] md: Fix an occasional deadlock in raid5
On Wed, 16 Jan 2008 00:09:31 -0700 Dan Williams [EMAIL PROTECTED] wrote:
>> heheh. it's really easy to reproduce the hang without the patch -- i could hang the box in under 20 min on 2.6.22+ w/ XFS and raid5 on 7x750GB. i'll try with ext3...
>>
>> Dan's experiences suggest it won't happen with ext3 (or is even rarer), which would explain why this is overall a rare problem.
>
> Hmmm... how rare? http://marc.info/?l=linux-kernel&m=119461747005776&w=2
>
> There is nothing specific that prevents other filesystems from hitting it; perhaps XFS is just better at submitting large i/o's. -stable should get some kind of treatment. I'll take altered performance over a hung system.

We can always target 2.6.25-rc1 and then 2.6.24.1 if Neil is still feeling wimpy.