Re: mismatch_cnt != 0

2008-02-23 Thread Carlos Carvalho
Justin Piszcz ([EMAIL PROTECTED]) wrote on 23 February 2008 10:44: > > >On Sat, 23 Feb 2008, Justin Piszcz wrote: > >> >> >> On Sat, 23 Feb 2008, Michael Tokarev wrote: >> >>> Justin Piszcz wrote: Should I be worried? Fri Feb 22 20:00:05 EST 2008: Executing RAID health c
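For reference, the mismatch count and a scrub can be driven through sysfs on these kernels; a minimal sketch, assuming the array is /dev/md0:
  cat /sys/block/md0/md/mismatch_cnt           # sectors found inconsistent by the last check
  echo check > /sys/block/md0/md/sync_action   # read-only scrub, updates mismatch_cnt
  echo repair > /sys/block/md0/md/sync_action  # rewrites parity/copies for mismatched stripes
On raid1/raid10 a small non-zero count can be harmless (swap or in-flight writes at the time of the check); on raid5/6 it is usually worth a repair pass followed by another check.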

Re: 2.6.24-rc6 reproducible raid5 hang

2008-01-29 Thread Carlos Carvalho
Tim Southerwood ([EMAIL PROTECTED]) wrote on 28 January 2008 17:29: >Subtitle: Patch to mainline yet? > >Hi > >I don't see evidence of Neil's patch in 2.6.24, so I applied it by hand >on my server. I applied all 4 pending patches to .24. It's been better than .22 and .23... Unfortunately the

Re: 2.6.24-rc6 reproducible raid5 hang

2008-01-23 Thread Carlos Carvalho
Tim Southerwood ([EMAIL PROTECTED]) wrote on 23 January 2008 13:37: >Sorry if this breaks threaded mail readers, I only just subscribed to >the list so don't have the original post to reply to. > >I believe I'm having the same problem. > >Regarding XFS on a raid5 md array: > >Kernels 2.6.2

Re: idle array consuming cpu ??!!

2008-01-22 Thread Carlos Carvalho
Bill Davidsen ([EMAIL PROTECTED]) wrote on 22 January 2008 17:53: >Carlos Carvalho wrote: >> Neil Brown ([EMAIL PROTECTED]) wrote on 21 January 2008 12:15: >> >On Sunday January 20, [EMAIL PROTECTED] wrote: >> >> A raid6 array with a spare and bitmap is

Re: One Large md or Many Smaller md for Better Peformance?

2008-01-21 Thread Carlos Carvalho
Moshe Yudkowsky ([EMAIL PROTECTED]) wrote on 20 January 2008 21:19: >Thanks for the tips, and in particular: > >Iustin Pop wrote: > >> - if you download torrents, fragmentation is a real problem, so use a >> filesystem that knows how to preallocate space (XFS and maybe ext4; >> for
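On the preallocation point, it can be done by hand too; a hedged sketch, assuming an XFS filesystem and paths invented for the example (fallocate(1) only exists on newer systems, xfs_io has been around longer):
  xfs_io -f -c "resvsp 0 1g" /data/torrents/file.bin   # reserve 1GB of space without writing it
  fallocate -l 1G /data/torrents/file.bin              # newer, filesystem-independent equivalent
  xfs_bmap -v /data/torrents/file.bin                  # check how many extents/fragments resulted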

Re: idle array consuming cpu ??!!

2008-01-21 Thread Carlos Carvalho
Neil Brown ([EMAIL PROTECTED]) wrote on 21 January 2008 12:15: >On Sunday January 20, [EMAIL PROTECTED] wrote: >> A raid6 array with a spare and bitmap is idle: not mounted and with no >> IO to it or any of its disks (obviously), as shown by iostat. However >> it's consuming cpu: since reboot i

Re: array doesn't run even with --force

2008-01-20 Thread Carlos Carvalho
Neil Brown ([EMAIL PROTECTED]) wrote on 21 January 2008 14:09: >As you note, sda4 says that it thinks slot 1 is still active/sync, but >it doesn't seem to know which device should go there either. >However that does indicate that slot 3 failed first and slot 1 failed >later. So if we have cand

Re: array doesn't run even with --force

2008-01-20 Thread Carlos Carvalho
Neil Brown ([EMAIL PROTECTED]) wrote on 21 January 2008 12:13: >On Sunday January 20, [EMAIL PROTECTED] wrote: >> I've got a raid5 array with 5 disks where 2 failed. The failures are >> occasional and only on a few sectors so I tried to assemble it with 4 >> disks anyway: >> >> # mdadm -A -f

Re: idle array consuming cpu ??!!

2008-01-20 Thread Carlos Carvalho
Neil Brown ([EMAIL PROTECTED]) wrote on 21 January 2008 12:15: >On Sunday January 20, [EMAIL PROTECTED] wrote: >> A raid6 array with a spare and bitmap is idle: not mounted and with no >> IO to it or any of its disks (obviously), as shown by iostat. However >> it's consuming cpu: since reboot i

array doesn't run even with --force

2008-01-20 Thread Carlos Carvalho
I've got a raid5 array with 5 disks where 2 failed. The failures are occasional and only on a few sectors so I tried to assemble it with 4 disks anyway: # mdadm -A -f -R /dev/md /dev/disk1 /dev/disk2 /dev/disk3 /dev/disk4 However mdadm complains that one of the disks has an out-of-date superblock
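Before forcing, it helps to compare the superblocks so you know which members are stale and by how much; a sketch reusing the member names above, with the array node written as /dev/mdX:
  mdadm --examine /dev/disk[1-4] | egrep 'Event|State'   # event count and clean/active state per member
  mdadm -A -f -R /dev/mdX /dev/disk[1-4]                 # then force-assemble with the freshest superblocks
Keep the --examine output: --force rewrites the event counts on the members it accepts.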

idle array consuming cpu ??!!

2008-01-20 Thread Carlos Carvalho
A raid6 array with a spare and bitmap is idle: not mounted and with no IO to it or any of its disks (obviously), as shown by iostat. However it's consuming cpu: since reboot it used about 11min in 24h, which is quite a lot even for a busy array (the cpus are fast). The array was cleanly shut down, so
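For pinning down where the time goes, the in-kernel threads are visible to ps; a sketch, member device name hypothetical:
  cat /proc/mdstat                         # shows the bitmap line and any resync activity
  ps -eo comm,time | egrep 'raid|resync'   # accumulated CPU time of the md kernel threads
  mdadm --examine-bitmap /dev/sda2         # how many bitmap chunks are currently dirty
An internal bitmap does make the raid thread wake up periodically even with no writes, but that should cost very little; the commands above at least show which thread is burning the time.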

Re: 2.6.23.1: mdadm/raid5 hung/d-state

2007-11-08 Thread Carlos Carvalho
Jeff Lessem ([EMAIL PROTECTED]) wrote on 6 November 2007 22:00: >Dan Williams wrote: > > The following patch, also attached, cleans up cases where the code looks > > at sh->ops.pending when it should be looking at the consistent > > stack-based snapshot of the operations flags. > >I tried thi
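When it hangs like this, the blocked-task stacks are the most useful thing to capture; a sketch, assuming sysrq is available (recent 2.6 kernels have the 'w' filter, older ones only 't') and an array name made up for the example:
  echo 1 > /proc/sys/kernel/sysrq             # enable magic sysrq if it isn't already
  echo w > /proc/sysrq-trigger                # dump uninterruptible (D state) task stacks to dmesg
  cat /sys/block/md3/md/stripe_cache_active   # a value that never drains suggests stripes stuck in the cache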

minimum speed for raid5 check

2007-06-04 Thread Carlos Carvalho
I'd like the check of a raid5 array to run at a minimum speed. I set the value in sync_speed_min, but it only takes effect when the array is idle. Is there a way to set a minimum check speed independent of the array load? Otherwise it'll never finish, because the machine is always reading from the disks
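For the record, the knobs involved, with a 30 MB/s floor as an example:
  echo 30000 > /sys/block/md0/md/sync_speed_min   # per-array minimum, in KB/s
  echo 30000 > /proc/sys/dev/raid/speed_limit_min # system-wide minimum, in KB/s
  cat /sys/block/md0/md/sync_speed                # what the check is actually doing right now
Whether the minimum is honoured under competing IO is exactly the question above; the sketch only shows where the values live.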

Re: data recovery on raid5

2006-04-22 Thread Carlos Carvalho
Molle Bestefich ([EMAIL PROTECTED]) wrote on 22 April 2006 23:17: >Jonathan wrote: >> thank you thank you thank you thank you thank you thank you Nice job Molle! >np. >(wait, does that mean I won't get my money? ;-)) At least you showed someone a new meaning of "support"... Is it my turn no

Re: data recovery on raid5

2006-04-22 Thread Carlos Carvalho
Jonathan ([EMAIL PROTECTED]) wrote on 22 April 2006 13:07: >I was already terrified of screwing things up -- now I'm afraid of >making things worse > >based on what was posted before is this a sensible thing to try? > >mdadm -C /dev/md0 -c 32 -n 4 -l 5 missing /dev/etherd/e0.[023] > >Is wh
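For the archives: the reason "missing" is in that command is that a degraded re-create cannot start a resync, so nothing beyond the superblocks gets overwritten; chunk size, level, device count and device order must all match the original array. A sketch of the careful sequence, reusing the command quoted above:
  mdadm -C /dev/md0 -c 32 -n 4 -l 5 missing /dev/etherd/e0.[023]
  fsck -n /dev/md0             # read-only filesystem check, makes no changes
  mount -o ro /dev/md0 /mnt    # eyeball the data before trusting the layout
If the data looks wrong, stop the array and try the re-create again with a different device order.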

disks becoming slow but not explicitly failing anyone?

2006-04-22 Thread Carlos Carvalho
We've been hit by a strange problem for about 9 months already. Our main server suddenly becomes very unresponsive, the load skyrockets and if demand is high enough it collapses. top shows many processes stuck in D state. There are no raid or disk error messages, either in the console or logs. The
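One thing that has helped others in this situation is comparing per-disk latency rather than error counts, since a disk that is retrying internally never reports a failure; a sketch, device name hypothetical:
  iostat -x 5                 # a member with await/%util far above its peers is the suspect
  smartctl -a /dev/sdc        # reallocated/pending sector counts and the drive's error log
  smartctl -t long /dev/sdc   # long self-test, forces the drive over its weak sectors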

Re: Problem with 5disk RAID5 array - two drives lost

2006-04-22 Thread Carlos Carvalho
Molle Bestefich ([EMAIL PROTECTED]) wrote on 22 April 2006 05:54: >Tim Bostrom wrote: >> raid5: Disk failure on hdf1, disabling device. > >MD doesn't like to find errors when it's rebuilding. >It will kick that disk off the array, which will cause MD to return >crap (instead of stopping the a

Re: replace disk in raid5 without linux noticing?

2006-04-22 Thread Carlos Carvalho
Martin Cracauer (cracauer@cons.org) wrote on 22 April 2006 11:08: >> stop the array >> dd warning disk => new one >> remove warning disk >> assemble the array again with the new disk >> >> The inconvenience is that you don't have the array during the copy. > >Stopping the array and restart
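Spelled out, with placeholder device names (ddrescue is a gentler choice than dd if the warning disk already has unreadable sectors):
  mdadm --stop /dev/md0
  dd if=/dev/sdc of=/dev/sdd bs=1M conv=noerror,sync   # noerror,sync zero-fills blocks it cannot read
  mdadm --assemble /dev/md0 /dev/sda /dev/sdb /dev/sdd /dev/sde
Because the md superblock is copied along with the data, the new disk is indistinguishable from the old one as far as md is concerned, which is the whole point of doing it this way.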

Re: help wanted - 6-disk raid5 borked: _ _ U U U U

2006-04-21 Thread Carlos Carvalho
Gabor Gombas ([EMAIL PROTECTED]) wrote on 20 April 2006 16:35: >On Mon, Apr 17, 2006 at 09:30:32AM +1000, Neil Brown wrote: > >> It is arguable that for a read error on a degraded raid5, that may not >> be the best thing to do, but I'm not completely convinced. > >My opinion would be that in

Re: replace disk in raid5 without linux noticing?

2006-04-21 Thread Carlos Carvalho
Dexter Filmore ([EMAIL PROTECTED]) wrote on 21 April 2006 20:23: >On Wednesday, 19 April 2006 18:31, Shai wrote: >> On 4/19/06, Dexter Filmore <[EMAIL PROTECTED]> wrote: >> > Let's say a disk in an array starts yielding smart errors but is still >> > functional. >> > So instead of waiting for
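The conventional alternative, at the price of running degraded while the rebuild runs, is to let md do the copy; a sketch with placeholder partitions:
  mdadm /dev/md0 --add /dev/sdd1                       # add the replacement as a spare first
  mdadm /dev/md0 --fail /dev/sdc1 --remove /dev/sdc1   # then fail out the ailing member; resync starts onto the spare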

Re: help wanted - 6-disk raid5 borked: _ _ U U U U

2006-04-16 Thread Carlos Carvalho
CaT ([EMAIL PROTECTED]) wrote on 17 April 2006 10:25: >On Sun, Apr 16, 2006 at 08:46:52PM -0300, Carlos Carvalho wrote: >> Neil Brown ([EMAIL PROTECTED]) wrote on 17 April 2006 09:30: >> >The easiest thing to do when you get an error on a drive is to kick >> >t

Re: help wanted - 6-disk raid5 borked: _ _ U U U U

2006-04-16 Thread Carlos Carvalho
Molle Bestefich ([EMAIL PROTECTED]) wrote on 17 April 2006 02:21: >Neil Brown wrote: >> use --assemble --force > ># mdadm --assemble --force /dev/md1 >mdadm: forcing event count in /dev/sda1(0) from 163362 upto 163368 >mdadm: /dev/md1 has been started with 5 drives (out of 6). > >Oops, only
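From that state the sixth member can simply be added back and rebuilt; placeholder device name for the kicked disk:
  mdadm /dev/md1 --add /dev/sdf1   # re-attach the stale member, recovery starts
  cat /proc/mdstat                 # watch the rebuild progress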

Re: help wanted - 6-disk raid5 borked: _ _ U U U U

2006-04-16 Thread Carlos Carvalho
Neil Brown ([EMAIL PROTECTED]) wrote on 17 April 2006 09:30: >The easiest thing to do when you get an error on a drive is to kick >the drive from the array, so that is what the code always did, and >still does in many cases. >It is arguable that for a read error on a degraded raid5, that may no

Re: excessive IO reported by kernel 2.6

2006-04-15 Thread Carlos Carvalho
Carlos Carvalho ([EMAIL PROTECTED]) wrote on 15 April 2006 15:44: >Sorry, forgot to say this... > >Mark Hahn ([EMAIL PROTECTED]) wrote on 15 April 2006 13:06: > >> Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn > >> md0 14,4

Re: excessive IO reported by kernel 2.6

2006-04-15 Thread Carlos Carvalho
Sorry, forgot to say this... Mark Hahn ([EMAIL PROTECTED]) wrote on 15 April 2006 13:06: >> Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn >> md0 14,46 57,17 104,96 18050544 33136016 >> md2 71,53 145,46 519,37 4592

excessive IO reported by kernel 2.6

2006-04-15 Thread Carlos Carvalho
I've just switched our main server from kernel 2.4 to 2.6. Now I have more info displayed by iostat, and the numbers look strange: Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn md0 14,46 57,17 104,96 18050544 33136016 md2 71,53

Re: blog entry on RAID limitation

2006-01-21 Thread Carlos Carvalho
Neil Brown ([EMAIL PROTECTED]) wrote on 18 January 2006 09:47: >On Tuesday January 17, [EMAIL PROTECTED] wrote: >> Neil Brown wrote: >> > In general, I think increasing the connection between the filesystem >> > and the volume manager/virtual storage is a good idea. Well, I agree in principle

Re: blog entry on RAID limitation

2006-01-21 Thread Carlos Carvalho
Jeff Breidenbach (jeff@jab.org) wrote on 17 January 2006 00:45: >Is this a real issue or ignorable Sun propaganda? > >-Original Message- >From: I-Gene Leong >Subject: RE: [colo] OT: Server Hardware Recommendations >Date: Mon, 16 Jan 2006 14:10:33 -0800 > >There was an interesting bl

Re: [PATCH md 013 of 18] Improve handling of read errors with raid6

2005-11-30 Thread Carlos Carvalho
NeilBrown ([EMAIL PROTECTED]) wrote on 28 November 2005 10:40: >This is a simple port of matching functionality across from raid5. >If we get a read error, we don't kick the drive straight away, but >try to over-write with good data first. Does it really mean that this functionality is already avai

no reconstruction in 2.6.14.2?

2005-11-19 Thread Carlos Carvalho
I've just installed a new server with a 5-disk raid5 and kernel 2.6.14.2. To check something I did a hard reset without shutdown and on reboot the machine didn't do any automatic resync; all arrays are shown clean. Is this some automatic intent-logging or another 2.6 feature (the machine had been
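Two things worth checking before assuming anything is wrong, sketched for a hypothetical md0 with sda1 as one member:
  cat /proc/mdstat                              # a "bitmap: ..." line means a write-intent bitmap is in use
  mdadm --examine /dev/sda1 | grep -i -A1 state # whether the superblock was marked clean at reset time
  echo check > /sys/block/md0/md/sync_action    # if this sysfs file exists on your kernel, force a scrub anyway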

Re: raid5 write performance

2005-11-19 Thread Carlos Carvalho
Neil Brown ([EMAIL PROTECTED]) wrote on 19 November 2005 16:54: >There are two solutions to this silent corruption problem (other than >'ignore it and hope it doesn't bite' which is a fair widely used >solution, and I haven't seen any bite marks myself). It happened to me several years ago when

raid5 reliability (was raid5 write performance)

2005-11-19 Thread Carlos Carvalho
Guy ([EMAIL PROTECTED]) wrote on 19 November 2005 00:56: >Assume a single stripe has data for 2 different files (A and B). A disk has >failed. The file system writes a 4K chunk of data to file A. The parity >gets updated, but not the data. Or the data gets updated but not the >parity. The

Re: help!

2005-11-14 Thread Carlos Carvalho
Shane Bishop ([EMAIL PROTECTED]) wrote on 14 November 2005 11:20: >I had an mdadm device running fine, and had created my own scripts for >shutting it down and such. I upgraded my distro, and all of a sudden it >decided to start initializing md devices on its own, which include one >that I
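Whether this happens is usually decided in two places, so these are the first things to look at (paths vary by distro, hence the hedging):
  grep ^ARRAY /etc/mdadm.conf /etc/mdadm/mdadm.conf 2>/dev/null   # arrays the distro init scripts will assemble
  fdisk -l /dev/sdc   # hypothetical disk; partition type fd ("Linux raid autodetect") is assembled by the kernel itself
Removing the ARRAY line, or changing the partition type away from fd, stops the respective mechanism from starting the array at boot.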

doubts about intent-logging

2005-11-11 Thread Carlos Carvalho
If I understand the Compendium (also known as mdadm manpage :-) ) the intent-logging bitmap is stored near the superblock when the name "internal" is specified. Does this mean that it'll be in the 128KB area of the superblock? If so, what happens if there isn't enough space? This is likely for medi
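For experimenting, a recent mdadm can add the internal bitmap to a live array, and its own tools show where it landed and how big it is; member name hypothetical:
  mdadm --grow --bitmap=internal /dev/md0
  mdadm --examine-bitmap /dev/sda1   # location, chunk size and dirty-bit count of the bitmap
If space next to the superblock is tight, a larger --bitmap-chunk (KB of array per bit) shrinks the bitmap accordingly.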

Re: intent-logging, not synchronizing written blocks?

2005-11-09 Thread Carlos Carvalho
Neil Brown ([EMAIL PROTECTED]) wrote on 10 November 2005 09:50: >> Is intent-logging for raid5 already in mainline or only in -mm? I >> looked for it in 2.6.14 and found nothing... > >raid5 is in 2.6.14. raid6 and raid10 should be in 2.6.16. >What did you look for? Some doc, or compilation o

intent-logging, not synchronizing written blocks?

2005-11-09 Thread Carlos Carvalho
Hi, Is intent-logging for raid5 already in mainline or only in -mm? I looked for it in 2.6.14 and found nothing... It seems that intent-logging accelerates synchronization by skipping the blocks that have not changed since the array went out of sync. I'm not sure but if this is correct I thought

Re: what happens when all but one disk fail in raid5?

2001-03-01 Thread Carlos Carvalho
Sorry for the delay in replying :-( Corin Hartland-Swann ([EMAIL PROTECTED]) wrote on 22 February 2001 16:36: >On Thu, 22 Feb 2001, Carlos Carvalho wrote: >> What happens if I have a raid5 array and, instead of the usual >> situation of one disk failing, I have the opposite,

what happens when all but one disk fail in raid5?

2001-02-22 Thread Carlos Carvalho
What happens if I have a raid5 array and, instead of the usual situation of one disk failing, I have the opposite, all disks stop except one? This may seem strange, but may happen when you have one power supply feeding all disks but one, and another power supply feeding the last disk. If the powe

what's the most efficient chunk size in 2.4?

2001-01-15 Thread Carlos Carvalho
I'm preparing to move to 2.4, and I use raid5. What's the most efficient chunk size? Neil's graphs show that 128k or 256k are necessary, even with his patches, to get high performance out of 2.4. Is this still true? He's made several changes since the measurements posted on the web.
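In mdadm terms (raidtab's chunk-size line is the same parameter) the chunk is fixed when the array is created, so it is worth benchmarking a couple of values before committing; a sketch with placeholder devices:
  mdadm --create /dev/md0 --level=5 --raid-devices=4 --chunk=128 /dev/sd[bcde]1
  mdadm --detail /dev/md0 | grep -i chunk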

Re: PATCH - raid5 in 2.4.0-test13 - substantial rewrite with substantial performance increase

2000-12-22 Thread Carlos Carvalho
Neil Brown ([EMAIL PROTECTED]) wrote on 22 December 2000 09:42: >> Interesting. What's the performance increase? The previous graphs on >> your site show that 2.4 was already better than 2.2, and the only point >> to be improved was reducing the difference in reading between raid 0 >> and raid

Re: PATCH - raid5 in 2.4.0-test13 - substantial rewrite with substantial performance increase

2000-12-21 Thread Carlos Carvalho
Neil Brown ([EMAIL PROTECTED]) wrote on 21 December 2000 15:35: > There is more work to be done to bring raid5 performance up to the > level of 2.2+mingos-patches, but this is a first, large, step on the > way. Interesting. What's the performance increase? The previous graphs on your site sh

Re: www.linuxraid.org

2000-12-19 Thread Carlos Carvalho
You're not an "off-the shelf-lawyer" :-) If you need witnesses for the salary negotiation suit I volunteer :-) However, I think the increase of kernel parameters is much more easily done via settings in /proc/sys at run time than by changing the source code. - To unsubscribe from this list: send

Re: we are finding that parity writes are half of all writes when writing 50mb files

2000-11-27 Thread Carlos Carvalho
Jakob Østergaard ([EMAIL PROTECTED]) wrote on 27 November 2000 18:18: >On Mon, Nov 27, 2000 at 02:24:03PM -0700, Hans Reiser wrote: >> using reiserfs over raid5 with 5 disks. This is unnecessarily >> suboptimal, it should be that parity writes are 20% of the disk >> bandwidth. Comments? > >I
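The arithmetic behind the 20% expectation, for what it is worth: a full-stripe write on a 5-disk raid5 writes 4 data chunks plus 1 parity chunk, so parity is 1/5 = 20% of the blocks written; a sub-stripe (read-modify-write) update writes 1 data chunk plus 1 parity chunk, so parity is 1/2 = 50%. Seeing parity at half of all writes therefore suggests md is handling mostly sub-stripe updates rather than full-stripe writes for these 50MB files, not that the parity scheme itself is misbehaving.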

Re: Broken superblocks and --force

2000-10-08 Thread Carlos Carvalho
Corin Hartland-Swann ([EMAIL PROTECTED]) wrote on 6 October 2000 19:01: >Yes - but I was curious as to whether this kernel memory bug had been >fixed yet, or whether it was still happening. I'd like to upgrade from >2.2.14 (because of the security hole) but the do_try_to_free_pages() bug >is I