[zfs-discuss] RAID Failure Calculator (for 8x 2TB RAIDZ)
I require a new high-capacity 8-disk zpool. The disks I will be purchasing (Samsung or Hitachi) are 2TB with a non-recoverable read error rate of 1 in 10^14 bits read. I'm staying clear of WD because their new 4KB-sector drives don't play nicely with ZFS at the moment. My question is: how do I determine which of the following zpool and vdev configurations I should run to maximize space whilst mitigating rebuild failure risk?

1. 2x RAIDZ(3+1) vdevs
2. 1x RAIDZ(7+1) vdev
3. 1x RAIDZ2(7+1) vdev

I just want to prove I shouldn't run a plain old RAID5 (RAIDZ) with 8x 2TB disks. Cheers
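For a rough sense of the numbers being asked about, here is a minimal back-of-the-envelope sketch. It assumes the quoted 1-in-10^14 error rate, that a resilver must read every surviving disk in the degraded vdev in full, and that bit errors are independent; the geometry figures are illustrative, not a definitive model:

  # P(>=1 unrecoverable read error while resilvering) ~= 1 - exp(-bits_read * BER)
  nawk 'BEGIN {
    ber = 1e-14;                  # vendor-quoted non-recoverable error rate, per bit read
    tb  = 2;                      # capacity per disk in TB
    b71 = 7 * tb * 1e12 * 8;      # 7+1 RAIDZ1: 7 surviving disks must be read in full
    b31 = 3 * tb * 1e12 * 8;      # 3+1 RAIDZ1: only the 3 disks in the degraded vdev are read
    printf("7+1 RAIDZ1: %4.1f%% chance of hitting a URE during rebuild\n", (1 - exp(-b71 * ber)) * 100);
    printf("3+1 RAIDZ1: %4.1f%% chance of hitting a URE during rebuild\n", (1 - exp(-b31 * ber)) * 100);
  }'

By the same arithmetic, a RAIDZ2 (6+2) layout can absorb one unrecoverable read during the rebuild, which is the usual argument for double parity once individual disks reach this size.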
Re: [zfs-discuss] RAID Failure Calculator (for 8x 2TB RAIDZ)
Yes, I did mean 6+2, thank you for fixing the typo. I'm actually leaning more towards running a simple 7+1 RAIDZ1. Running this with 1TB disks is not a problem; I just wanted to investigate at what disk size the scales would tip. I understand RAIDZ2 protects against failures during a rebuild. Currently my RAIDZ1 takes 24 hours to rebuild a failed disk, so with 2TB disks, assuming a worst case of 2 days, that is my 'exposure' time. For example, I would hazard a confident guess that 7+1 RAIDZ1 with 6TB drives wouldn't be a smart idea; I'm just trying to extrapolate down. I will be running a hot (or maybe cold) spare, so I don't need to factor in the time it takes for a manufacturer to replace the drive.

On Mon, Feb 7, 2011 at 2:48 PM, Edward Ned Harvey opensolarisisdeadlongliveopensola...@nedharvey.com wrote:

From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Matthew Angelo
My question is, how do I determine which of the following zpool and vdev configuration I should run to maximize space whilst mitigating rebuild failure risk? 1. 2x RAIDZ(3+1) vdev 2. 1x RAIDZ(7+1) vdev 3. 1x RAIDZ2(6+2) vdev I just want to prove I shouldn't run a plain old RAID5 (RAIDZ) with 8x 2TB disks.

(Corrected typo: 6+2 for you.) Sounds like you have made up your mind already. Nothing wrong with that. You are apparently uncomfortable running with only one disk's worth of redundancy. There is nothing fundamentally wrong with the raidz1 configuration, but the probability of failure is obviously higher. The question is how you calculate that probability, because if we're talking about 5e-21 versus 3e-19 then you probably don't care about the difference... they're both essentially zero probability... Well... there's no good answer to that.

The cited bit error rate only represents the probability of a bit error. It does not represent the probability of a failed drive, nor the probability of a drive failure within a specified time window. What you really care about is the probability of two (or three) drives failing concurrently, and for that you need to model the probability of any one drive failing within a specified window. Even then, in reality the model is not linear: the probability of a drive failing between 1yr and 1yr+3hrs is smaller than the probability of it failing between 3yr and 3yr+3hrs, because after 3 years the failure rate is higher, and so the probability of multiple simultaneous failures is higher. I recently saw some Seagate data sheets which specified the annual disk failure rate as 0.3%. Again, that is a linear model representing a nonlinear reality. Suppose one disk fails... how many weeks does it take to get a replacement onsite under the 3-year limited mail-in warranty? Then again, after 3 years you're probably considering this antique hardware, and all the stuff you care about is on a newer server. Etc.

There's no good answer to your question. You are obviously uncomfortable with a single disk's worth of redundancy. Go with your gut. Sleep well at night. It only costs you $100, and you probably have a cell phone with no backups worth more than that in your pocket right now.
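To put a number on the 'exposure time' idea, here is a minimal sketch of the whole-drive side of the model, using the 0.3% annualized failure rate cited above, a 2-day resilver window, and the 7 surviving drives of a degraded 7+1 RAIDZ1; the flat rate and independence assumptions are exactly the simplifications the reply warns about:

  # P(any survivor dies before the resilver finishes), flat-AFR approximation
  nawk 'BEGIN {
    afr = 0.003;                      # annualized failure rate from the data sheet
    days = 2;                         # assumed resilver window for a 2TB disk
    survivors = 7;                    # remaining drives in a degraded 7+1 RAIDZ1
    p1 = afr * days / 365;            # one specific drive failing in the window
    p  = 1 - (1 - p1) ^ survivors;    # at least one of the survivors failing
    printf("P(second whole-drive failure during rebuild): %.4f%%\n", p * 100);
  }'

Compared with the unrecoverable-read-error figure in the earlier sketch, the whole-drive number is tiny, which is why the URE term usually dominates the RAIDZ1-versus-RAIDZ2 argument at this capacity.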
Re: [zfs-discuss] Migrating zpool to new drives with 4K Sectors
Hi Benji, I did read your blog before posting this, but it didn't have the exact answer I was looking for. The OS version is Solaris 10 U9 x86. Your blog was highly informative, but it didn't say whether you can zpool replace onto a 4KB drive; it was more about getting the WD drives detected as 4KB during a zpool create. I was hoping to do this without creating a new pool, although I can understand the technical reason why that isn't possible. Thanks

On Sat, Jan 8, 2011 at 12:33 AM, Benji b.robich...@gmail.com wrote:

I have recently done this. See here for more details: http://www.solarismen.de/archives/5-Solaris-and-the-new-4K-Sector-Disks-e.g.-WDxxEARS-Part-2.html

What version are you running? There's a compiled version of the modified zpool command that will create pools that are 4K aligned, for OpenIndiana snv_147 zpool 28 I think. I have also compiled an OpenSolaris snv_134 zpool 22 version. If you want it I can send it to you.
[zfs-discuss] Migrating zpool to new drives with 4K Sectors
Hi ZFS Discuss, I have an 8x 1TB RAIDZ running on Samsung 1TB 5400rpm drives with 512-byte sectors. I will be replacing all of these with 8x Western Digital 2TB drives that use 4K sectors. The replacement plan is to swap out each of the 8 drives in turn until all are replaced and the new size (~16TB) is available, finishing with a `zpool scrub`. My question is, how do I do this and also factor in the new 4K sector size? Or should I find a 2TB drive that still uses 512-byte sectors? Thanks
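For what it's worth, a minimal sketch of the swap-one-disk-at-a-time approach is below. The pool and device names are hypothetical, and the autoexpand property is assumed to exist in this zpool version. The catch is that a vdev's sector alignment (ashift) is fixed when the vdev is created, so replacing drives into an existing 512-byte-aligned RAIDZ leaves it 512-byte aligned regardless of what the new disks report, which is why the reply above points at a modified zpool binary for creating 4K-aligned pools:

  # repeat for each of the eight drives, one at a time (hypothetical names)
  zpool set autoexpand=on tank        # if supported; otherwise export/import after the last swap
  zpool replace tank c1t1d0 c2t1d0    # old 1TB out, new 2TB in
  zpool status tank                   # wait for the resilver to finish before touching the next disk
  # ...after the eighth replace:
  zpool scrub tank                    # final verification pass over the expanded pool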
[zfs-discuss] Attaching a mirror to a mirror
Assuming I have a zpool which consists of a simple 2-disk mirror, how do I attach a third disk (disk3) to this zpool to mirror the existing data, then split the mirror and remove disk0 and disk1, leaving a single-disk zpool consisting of the new disk3? AKA online data migration.

[root]# zpool status -v
  pool: apps
 state: ONLINE
 scrub: none requested
config:

        NAME                 STATE     READ WRITE CKSUM
        apps                 ONLINE       0     0     0
          mirror             ONLINE       0     0     0
            /root/zfs/disk0  ONLINE       0     0     0
            /root/zfs/disk1  ONLINE       0     0     0

errors: No known data errors

The use case here is that we've implemented new storage. The new (third) LUN is RAID10 on a Hitachi SAN, with the existing mirror being local SAS disks. Back in the VxVM world, this would be done by mirroring the dg and then splitting the mirror. I understand we are moving away from a ZFS mirror to a single stripe. Thanks
Re: [zfs-discuss] Attaching a mirror to a mirror
Hi Francois, thanks for confirming. That did the trick. I kept thinking I had to mirror at the highest level (the zpool), then split. I actually did it in one less step than you mention by using replace instead of attach then detach, but what you said is 100% correct.

zpool replace apps /root/zfs/disk0 /root/zfs/disk3
zpool detach apps /root/zfs/disk1

Thanks again!

On Thu, Mar 26, 2009 at 7:00 PM, Francois Napoleoni francois.napole...@sun.com wrote:

Hi Matthew,

Just attach disk3 to the existing mirrored top-level vdev, wait for resilvering to complete, then detach disk0 and disk1. This will leave you with only disk3 in your pool. You will lose ZFS's fancy redundancy features (self healing, ...).

# zpool create test mirror /export/disk0 /export/disk1
# zpool status
  pool: test
 state: ONLINE
 scrub: none requested
config:

        NAME               STATE     READ WRITE CKSUM
        test               ONLINE       0     0     0
          mirror           ONLINE       0     0     0
            /export/disk0  ONLINE       0     0     0
            /export/disk1  ONLINE       0     0     0

errors: No known data errors

# zpool attach test /export/disk1 /export/disk3
# zpool status
  pool: test
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Thu Mar 26 19:55:24 2009
config:

        NAME               STATE     READ WRITE CKSUM
        test               ONLINE       0     0     0
          mirror           ONLINE       0     0     0
            /export/disk0  ONLINE       0     0     0
            /export/disk1  ONLINE       0     0     0
            /export/disk3  ONLINE       0     0     0  71.5K resilvered

# zpool detach test /export/disk0
# zpool detach test /export/disk1
# zpool status
  pool: test
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Thu Mar 26 19:55:24 2009
config:

        NAME             STATE     READ WRITE CKSUM
        test             ONLINE       0     0     0
          /export/disk3  ONLINE       0     0     0  71.5K resilvered

errors: No known data errors

F.
[zfs-discuss] Recovering data from a corrupted zpool
Hi there, Is there a way to get as much data as possible off an existing, slightly corrupted zpool? I have a 2-disk stripe which I'm moving to new storage. I will be moving it to a ZFS mirror, but at the moment I'm having problems with ZFS panicking the system during a send | recv. I don't know exactly how much data is valid. Everything appears to run as expected and applications aren't crashing. Doing an ls -lR | grep -i 'I/O error' returns roughly 10-15 files which are affected. Luckily, the files ls is returning aren't super critical. Is it possible to tell ZFS to do an emergency copy of as much valid data as possible off this file system? I've tried disabling checksums on the corrupted source zpool, but even so, once ZFS runs into an error the zpool is FAULTED, the kernel panics and the system crashes. Is it possible to tell the zpool to ignore any errors and continue without faulting the pool? We have a backup of the data, which is 2 months old. Is it possible to bring this backup online and 'sync as much as it can' between the two volumes? Could this just be an rsync job? Thanks

[root]# zpool status -v apps
  pool: apps
 state: ONLINE
status: One or more devices has experienced an error resulting in data corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        apps        ONLINE       0     0   120
          c1t1d0    ONLINE       0     0    60
          c1t2d0    ONLINE       0     0     0
          c1t3d0    ONLINE       0     0    60

errors: Permanent errors have been detected in the following files:

        apps:<0x0>
        <0x1d2>:<0x0>
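A hedged sketch of one salvage approach is below. The failmode property only exists on newer pool versions and governs how the pool reacts to catastrophic I/O failure, so it may or may not stop the panics described above; the destination and backup paths are made up for illustration:

  zpool set failmode=continue apps                            # if the pool version supports it: return EIO rather than panicking
  rsync -av /apps/ /newpool/apps/                             # rsync keeps going past unreadable files and reports them at the end
  rsync -av --ignore-existing /backup/apps/ /newpool/apps/    # backfill anything unreadable from the 2-month-old backup

The second rsync only copies files the first pass could not bring across, so recent good data is not overwritten by the older backup.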
[zfs-discuss] Diagnosing CKSUM Errors
Hello, I've found myself in a curious situation regarding the state of a zpool inside a VMware guest. I've run into CKSUM errors on the infrastructure stack below:

Hitachi (HDS) 9570V SAN, FC disks
Sun X4600 M2 (16 cores, 32GB memory)
VMware ESXi 3.5 U3
Single extended datastore, 4x 35GB FC LUNs
Solaris 10 U6 x86 guest OS

A striped zpool on the Solaris guest is starting to show some CKSUM errors. This is very surprising in itself given the enterprise hardware we're dealing with, but assuming we can ignore why these errors are happening for the time being: how do I diagnose the state of the 'apps' zpool?

1. Why is ZFS showing 0x0 instead of an actual file (or files)?
2. How do I see where/which files these CKSUM errors are affecting?

I'm not seeing *any* errors or warnings in messages. Any thoughts?

# zpool status -v
  pool: apps
 state: ONLINE
status: One or more devices has experienced an error resulting in data corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        apps        ONLINE       0     0    28
          c1t1d0    ONLINE       0     0    14
          c1t2d0    ONLINE       0     0     0
          c1t3d0    ONLINE       0     0    14

errors: Permanent errors have been detected in the following files:

        apps:<0x0>

  pool: rpool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          c1t0d0s0  ONLINE       0     0     0

errors: No known data errors
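As a hedged first pass at the two questions above (assuming the usual Solaris FMA tooling is present): an entry like apps:<0x0> generally means the damage is in a pool- or dataset-level metadata object rather than in a plain file, which is why no path is printed. A scrub plus the FMA error log will usually narrow down what is actually affected:

  zpool scrub apps                        # walk every allocated block; errors that resolve to files will show paths
  zpool status -v apps                    # re-check the permanent error list once the scrub completes
  fmdump -eV | grep -i zfs | tail -50     # low-level ereports: which vdev, which object, what class of error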
[zfs-discuss] ZFS Recovery after SAN Corruption
Hello, We recently had SAN corruption (a hard power outage), and we lost a few transactions that were waiting to be written to real disk. The end result, as we all know, is CKSUM errors on the zpool from a scrub, and we also had a few corrupted files reported by ZFS. My question is, what is the proper way to recover from this? Create a new
Re: [zfs-discuss] zpool CKSUM errors since drive replace
Another update: last night, having already read many blogs about si3124 chipset problems with Solaris 10, I applied patch 138053-02, which updates the si3124 driver from 1.2 to 1.4 and fixes numerous performance and interrupt related bugs. It appears to have helped. Below is the zpool scrub after the new driver, but I'm still not confident about the exact problem.

# zpool status -v
  pool: rzdata
 state: ONLINE
status: One or more devices has experienced an error resulting in data corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub completed with 1 errors on Wed Oct 29 05:32:16 2008
config:

        NAME        STATE     READ WRITE CKSUM
        rzdata      ONLINE       0     0     2
          raidz1    ONLINE       0     0     2
            c3t0d0  ONLINE       0     0     0
            c3t1d0  ONLINE       0     0     0
            c3t2d0  ONLINE       0     0     0
            c3t3d0  ONLINE       0     0     0
            c4t0d0  ONLINE       0     0     0
            c4t1d0  ONLINE       0     0     0
            c4t2d0  ONLINE       0     0     3
            c4t3d0  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        /rzdata/downloads/linux/ubuntu-8.04.1-desktop-i386.iso

It still didn't clear the errored file I have, which I'm curious about considering it's a RAIDZ.

On Mon, Oct 27, 2008 at 2:57 PM, Matthew Angelo [EMAIL PROTECTED] wrote:

Another update. The weekly cron kicked in again this week, but this time it failed with a lot of CKSUM errors and now also complained about corrupted files. The single file it complained about is a new one I recently copied onto it. I'm stumped by this. How do I verify the x86 hardware under the OS? I've run Memtest86 and it ran overnight without a problem. Tonight I will be moving back to my old motherboard/CPU/memory. Hopefully this is a simple hardware problem. But the question I'd like to pose to everyone is: how can we validate our x86 hardware?

On Tue, Oct 21, 2008 at 8:23 AM, David Turnbull [EMAIL PROTECTED] wrote:

I don't think it's normal, no... it seems to occur when the resilver is interrupted and gets marked as done prematurely?

On 20/10/2008, at 12:28 PM, Matthew Angelo wrote:

Hi David, thanks for the additional input. This is the reason why I thought I'd start a thread about it. To continue my original topic, I have additional information to add. After last week's initial replace/resilver/scrub, my weekly cron scrub (runs Sunday morning) kicked off and all CKSUM errors have now cleared:

  pool: rzdata
 state: ONLINE
 scrub: scrub completed with 0 errors on Mon Oct 20 09:41:31 2008
config:

        NAME        STATE     READ WRITE CKSUM
        rzdata      ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c3t0d0  ONLINE       0     0     0
            c3t1d0  ONLINE       0     0     0
            c3t2d0  ONLINE       0     0     0
            c3t3d0  ONLINE       0     0     0
            c4t0d0  ONLINE       0     0     0
            c4t1d0  ONLINE       0     0     0
            c4t2d0  ONLINE       0     0     0
            c4t3d0  ONLINE       0     0     0

errors: No known data errors

Which requires me to ask: is it standard to see high checksum (CKSUM) errors on a zpool when you replace a failed disk, after it has resilvered? Is there anything I can feed back to the zfs community on this matter?

Matt

On Sun, Oct 19, 2008 at 9:26 AM, David Turnbull [EMAIL PROTECTED] wrote:

Hi Matthew. I had a similar problem occur last week. One disk in the raidz had the first 4GB zeroed out (manually) before we then offlined it and replaced it with a new disk. High checksum errors were occurring on the partially-zeroed disk, as you'd expect, but when the new disk was inserted, checksum errors occurred on all disks. Not sure how relevant this is to your particular situation, but unexpected checksum errors on known-good hardware have definitely happened to me as well.
-- Dave

On 15/10/2008, at 10:50 PM, Matthew Angelo wrote:

The original disk failure was very explicit: high read errors and errors inside /var/adm/messages. When I replaced the disk, however, these all went away and the resilver was okay. I am not seeing any read/write or /var/adm/messages errors, but for some reason I am seeing errors in the CKSUM column, which I've never seen before. I hope you're right and it's a simple memory corruption problem. I will be running memtest86 overnight and hopefully it fails, so we can rule out zfs.

On Wed, Oct 15, 2008 at 11:48 AM, Mark J Musante [EMAIL PROTECTED] wrote:

So this is where I stand. I'd like to ask zfs-discuss if they've seen any ZIL/Replay style bugs associated with u3/u5 x86? Again, I'm confident in my hardware
Re: [zfs-discuss] zpool CKSUM errors since drive replace
The original disk failure was very explicit: high read errors and errors inside /var/adm/messages. When I replaced the disk, however, these all went away and the resilver was okay. I am not seeing any read/write or /var/adm/messages errors, but for some reason I am seeing errors in the CKSUM column, which I've never seen before. I hope you're right and it's a simple memory corruption problem. I will be running memtest86 overnight and hopefully it fails, so we can rule out zfs.

On Wed, Oct 15, 2008 at 11:48 AM, Mark J Musante [EMAIL PROTECTED] wrote:

So this is where I stand. I'd like to ask zfs-discuss if they've seen any ZIL/Replay style bugs associated with u3/u5 x86? Again, I'm confident in my hardware, and /var/adm/messages is showing no warnings/errors.

Are you absolutely sure the hardware is OK? Is there another disk you can test in its place? If I read your post correctly, your first disk was having errors logged against it, and now the second disk, plugged into the same port, is also logging errors. This seems to me more like the port being bad. Is there a third disk you can try in that same port? I have a hard time seeing that this could be a zfs bug: I've been doing lots of testing on u5 and the only time I see checksum errors is when I deliberately induce them.
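For the "is it the port, the disk, or ZFS" question, a hedged sketch of the usual isolation steps on Solaris is below; the pool name matches the thread, and the final swap step is the same third-disk test suggested above:

  iostat -En                    # per-device soft/hard/transport error counters from the disk driver
  zpool clear rzdata            # zero ZFS's error counters for a clean baseline
  zpool scrub rzdata            # re-run the scrub and note which physical port the CKSUM errors land on
  # then move the suspect disk to a different controller port and scrub again:
  # if the errors follow the port rather than the disk, the port (or cable) is the likely culprit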
[zfs-discuss] zpool CKSUM errors since drive replace
After performing the following steps, in this exact order, I am now seeing CKSUM errors in my zpool. I've never seen any checksum errors in this zpool before.

1. Previously running setup: RAIDZ (7D+1P), 8x 1TB, Solaris 10 Update 3 x86.
2. Disk 6 (c6t2d0) was dying: read errors in zpool status and device errors in /var/adm/messages.
3. In addition to replacing this disk, I thought I would give myself a challenge and upgrade Solaris 10 and change my CPU/motherboard.
3.1 CPU went to an AthlonXP 3500+ from an Athlon FX-51.
3.2 Motherboard went to an Asus A8N-SLI Premium from an Asus SK8N.
3.3 Memory stayed the same at 2GB ECC DDR (all other components identical).
3.4 And finally, I replaced the failed disk 6.
4. The Solaris 10 U5 x86 install was fine, without a problem. The zpool imported fine (obviously DEGRADED).
5. zpool replace worked without a problem and it resilvered with 0 read, write or cksum errors.
6. After the zpool replace, ZFS recommended I run zpool upgrade to take the pool from version 3 to version 4, which I have done.

This is where the problem starts to appear. The upgrade was fine, however immediately after the upgrade I ran a scrub and noticed a very high number of cksum errors on the newly replaced disk 6 (now c4t2d0, previously c6t2d0 before the reinstall). Here is the progress of the scrub, and you can see how the cksum count is quickly and constantly increasing:

[/root][root]# date
Fri Oct 10 00:19:16 EST 2008
[root][root]# zpool status -v
  pool: rzdata
 state: ONLINE
status: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub in progress, 7.34% done, 6h10m to go
config:

        NAME        STATE     READ WRITE CKSUM
        rzdata      ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c3t0d0  ONLINE       0     0     0
            c3t1d0  ONLINE       0     0     0
            c3t2d0  ONLINE       0     0     0
            c3t3d0  ONLINE       0     0     0
            c4t0d0  ONLINE       0     0     0
            c4t1d0  ONLINE       0     0     0
            c4t2d0  ONLINE       0     0   390
            c4t3d0  ONLINE       0     0     0

errors: No known data errors

[/root][root]# date
Fri Oct 10 00:23:12 EST 2008
[root][root]# zpool status -v
  pool: rzdata
 state: ONLINE
status: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub in progress, 8.01% done, 6h6m to go
config:

        NAME        STATE     READ WRITE CKSUM
        rzdata      ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c3t0d0  ONLINE       0     0     0
            c3t1d0  ONLINE       0     0     0
            c3t2d0  ONLINE       0     0     1
            c3t3d0  ONLINE       0     0     0
            c4t0d0  ONLINE       0     0     0
            c4t1d0  ONLINE       0     0     2
            c4t2d0  ONLINE       0     0   768
            c4t3d0  ONLINE       0     0     0

[/root][root]# date
Fri Oct 10 00:29:44 EST 2008
[/root][root]# zpool status -v
  pool: rzdata
 state: ONLINE
status: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub in progress, 9.88% done, 5h57m to go
config:

        NAME        STATE     READ WRITE CKSUM
        rzdata      ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c3t0d0  ONLINE       0     0     0
            c3t1d0  ONLINE       0     0     0
            c3t2d0  ONLINE       0     0     2
            c3t3d0  ONLINE       0     0     0
            c4t0d0  ONLINE       0     0     0
            c4t1d0  ONLINE       0     0     2
            c4t2d0  ONLINE       0     0   931
            c4t3d0  ONLINE       0     0     1

It eventually finished with 6.4K CKSUM errors against c4t2d0 and an average of under 5 errors on the remaining disks. I was not (and am still not) convinced it's a physical hardware problem, and my initial thought was that there is (or was?) a bug with running zpool upgrade on a mounted and running zpool. So, to be pedantic, I rebooted the server and initiated another scrub. This is the outcome of that scrub:

[/root][root]# zpool status -v
  pool: rzdata
 state: ONLINE
status: One or more devices has experienced an