Re: [zfs-discuss] Zpool import not working - I broke my pool...
Hmm... got a bit more information for you to add to that bug I think. Zpool import also doesn't work if you have mirrored log devices and either one of them is offline.

I created two ramdisks with:

# ramdiskadm -a rc-pool-zil-1 256m
# ramdiskadm -a rc-pool-zil-2 256m

And added them to the pool with:

# zpool add rc-pool log mirror /dev/ramdisk/rc-pool-zil-1 /dev/ramdisk/rc-pool-zil-2

I can reboot fine, the pool imports ok without the ZIL, and I have a script that recreates the ramdisks and adds them back to the pool:

#!/sbin/sh
state=$1
case $state in
'start')
        echo 'Starting Ramdisks'
        /usr/sbin/ramdiskadm -a rc-pool-zil-1 256m
        /usr/sbin/ramdiskadm -a rc-pool-zil-2 256m
        echo 'Attaching to ZFS ZIL'
        /usr/sbin/zpool replace test /dev/ramdisk/rc-pool-zil-1
        /usr/sbin/zpool replace test /dev/ramdisk/rc-pool-zil-2
        ;;
'stop')
        ;;
esac

However, if I export the pool, and delete one ramdisk to check that the mirroring works fine, the import fails:

# zpool export rc-pool
# ramdiskadm -d rc-pool-zil-1
# zpool import rc-pool
cannot import 'rc-pool': one or more devices is currently unavailable

Ross

Date: Mon, 4 Aug 2008 10:42:43 -0600
From: [EMAIL PROTECTED]
Subject: Re: [zfs-discuss] Zpool import not working - I broke my pool...
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
CC: zfs-discuss@opensolaris.org

Richard Elling wrote:
  Ross wrote:
    I'm trying to import a pool I just exported but I can't, even -f doesn't help.
    Every time I try I'm getting an error:
    cannot import 'rc-pool': one or more devices is currently unavailable
    Now I suspect the reason it's not happy is that the pool used to have a ZIL :)
  Correct. What you want is CR 6707530, "log device failure needs some work"
  http://bugs.opensolaris.org/view_bug.do?bug_id=6707530
  which Neil has been working on, scheduled for b96.

Actually no. That CR mentioned the problem and talks about splitting out the bug, as it's really a separate problem. I've just done that and here's the new CR, which probably won't be visible immediately to you:

6733267 Allow a pool to be imported with a missing slog

Here's the Description:
---
This CR is being broken out from 6707530 "log device failure needs some work".

When Separate Intent Logs (slogs) were designed they were given equal status in the pool device tree. This was because they can contain committed changes to the pool. So if one is missing it is assumed to be important to the integrity of the application(s) that wanted the data committed synchronously, and thus a pool cannot be imported with a missing slog. However, we do allow a pool to be missing a slog on boot up if it's in the /etc/zfs/zpool.cache file. So this sends a mixed message.

We should allow a pool to be imported without a slog if -f is used, and not import without -f but perhaps with a better error message. It's the guidsum check that actually rejects imports with missing devices. We could have a separate guidsum for the main pool devices (non slog/cache).
---
[zfs-discuss] more ZFS recovery
Hi,

Have a problem with a ZFS on a single device. This device is 48 1T SATA drives presented as a 42T LUN via hardware RAID 6 on a SAS bus, which had a ZFS on it as a single device.

There was a problem with the SAS bus which caused various errors including the inevitable kernel panic; the thing came back up with 3 out of 4 zfs mounted. I've tried reading the partition table with format, works fine; also can dd the first 100G from the device quite happily, so the communication issue appears resolved, however the device just won't mount.

Googling around I see that ZFS does have features designed to reduce the impact of corruption at a particular point, multiple metadata copies and so on, however commands to help me tidy up a zfs will only run once the thing has been mounted.

Would be grateful for any ideas, relevant output here:

[EMAIL PROTECTED]:~# zpool import
  pool: content
    id: 14205780542041739352
 state: FAULTED
status: The pool metadata is corrupted.
action: The pool cannot be imported due to damaged devices or data.
        The pool may be active on another system, but can be imported using the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-72
config:

        content     FAULTED  corrupted data
          c2t9d0    ONLINE

[EMAIL PROTECTED]:~# zpool import content
cannot import 'content': pool may be in use from other system
use '-f' to import anyway
[EMAIL PROTECTED]:~# zpool import -f content
cannot import 'content': I/O error
[EMAIL PROTECTED]:~# uname -a
SunOS cs3.kw 5.10 Generic_127127-11 sun4v sparc SUNW,Sun-Fire-T200

Thanks

-- 
Tom        // www.portfast.co.uk -- internet services and consultancy
           // hosting from 1.65 per domain
Re: [zfs-discuss] OpenSolaris+ZFS+RAIDZ+VirtualBox - ready for production systems?
I use an Intel Q9450 + P45 mobo + ATI 4850 + ZFS + VirtualBox. I have installed WinXP; it works well and is stable. There are features not implemented yet, though, for instance USB support.

I suggest you try VB yourself. It is a ~20MB download and installs quickly. I used it on a 1GB RAM P4 machine and it worked fine. If you have the memory, you can copy the install CD to /tmp; installing from RAM is quite quick.
[zfs-discuss] zfs status -v tries too hard?
After some errors were logged as to a problem with a ZFS file system, I ran zpool status followed by zpool status -v...

# zpool status
  pool: ehome
 state: ONLINE
status: One or more devices has experienced an error resulting in data corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE    READ  WRITE  CKSUM
        ehome       ONLINE  6.28K  2.84M      0
          c2t0d0p0  ONLINE  6.28K  2.84M      0

errors: 796332 data errors, use '-v' for a list

[ elided ]

# zpool status -v
  pool: ehome
 state: ONLINE
status: One or more devices has experienced an error resulting in data corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE    READ  WRITE  CKSUM
        ehome       ONLINE  3.03K  2.09M      0
          c2t0d0p0  ONLINE  3.03K  2.09M      0

HANGS HERE

From another window, do a truss of the hung zpool status...

ioctl(3, ZFS_IOC_ERROR_LOG, 0x08041DE0)    Err#12 ENOMEM
ioctl(3, ZFS_IOC_ERROR_LOG, 0x08041DE0)    Err#12 ENOMEM
ioctl(3, ZFS_IOC_ERROR_LOG, 0x08041DE0)    Err#12 ENOMEM
ioctl(3, ZFS_IOC_ERROR_LOG, 0x08041DE0)    Err#12 ENOMEM
ioctl(3, ZFS_IOC_ERROR_LOG, 0x08041DE0)    Err#12 ENOMEM
ioctl(3, ZFS_IOC_ERROR_LOG, 0x08041DE0)    Err#12 ENOMEM

One would think it would get the message.

After a reboot, a move of the drive to another USB port on the laptop, a zpool export of ehome and a zpool import of ehome, it is back online with zero errors.
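(One way to capture output like the above, assuming the hung zpool status process is the only zpool process running, is to attach truss to it from another shell:

# truss -p `pgrep -x zpool`

This shows the ioctl loop without having to restart the command under truss.)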
Re: [zfs-discuss] zpool upgrade wrecked GRUB
Almost. I did exactly the same thing to my system -- upgrading ZFS. The 2008.11 development snapshot CD I found is based on snv_93 and doesn't yet support ZFS v.11, so it refuses to import the pool. My system doesn't have a DVD drive, so I cannot boot the SXCE snv_94 DVD. I guess I have to track down or wait for a >= snv_94 based development snapshot live CD. Should be any day now, right?
Re: [zfs-discuss] Zpool import not working - I broke my pool...
Ross,

Thanks, I have updated the bug with this info.

Neil.

Ross Smith wrote:
  [full message quoted above]
Re: [zfs-discuss] more ZFS recovery
Tom Bird wrote:
  Have a problem with a ZFS on a single device, this device is 48 1T SATA
  drives presented as a 42T LUN via hardware RAID 6 on a SAS bus which had
  a ZFS on it as a single device. There was a problem with the SAS bus
  which caused various errors including the inevitable kernel panic, the
  thing came back up with 3 out of 4 zfs mounted.
  [rest of the original report quoted above]

You should also check the end of the LUN. ZFS stores its configuration data at the beginning and end of the LUN.

An I/O error is a fairly generic error, but it can also be an indicator of a catastrophic condition. You should also check the system log in /var/adm/messages as well as any faults reported by fmdump.

In general, ZFS can only repair conditions for which it owns data redundancy. In this case, ZFS does not own the redundancy function, so you are susceptible to faults of this sort.
 -- richard
[zfs-discuss] ZFS on 32bit.
Good afternoon,

I have a ~600GB zpool living on older Xeons. The system has 8GB of RAM. The pool is hanging off two LSI Logic SAS3041X-Rs (no RAID configured).

When I put a moderate amount of load on the zpool (like, say, copying many files locally, or deleting a large number of ZFS fs), the system hangs and becomes completely unresponsive, requiring a reboot. The ARC never gets over ~40MB. The system is running Sol10u4.

Are there any suggested tunables for running big zpools on 32bit?

Cheers.
-- 
bda
Cyberpunk is dead. Long live cyberpunk.
http://mirrorshades.org
[zfs-discuss] Strange burstiness in write speed with a mirror
I've got a pool which I'm currently syncing a few hundred gigabytes to using rsync. The source machine is pretty slow, so it only goes at about 20 MB/s. Watching "zpool iostat -v local-space 10", I see a pattern like this (trimmed to take up less space):

                 capacity     operations    bandwidth
pool           used  avail   read  write   read  write
------------  -----  -----  -----  -----  -----  -----
local-space    251G   405G      0    143     51  17.1M
  mirror       251G   405G      0    143     51  17.1M
    c1d0s6        -      -      0      0      0      0
    c0d0s6        -      -      0    137      0  17.1M

local-space    252G   404G      1    163  2.55K  17.6M
  mirror       252G   404G      1    163  2.55K  17.6M
    c1d0s6        -      -      0    145  6.39K  16.7M
    c0d0s6        -      -      0    150  38.4K  17.6M

local-space    253G   403G      0    159    511  16.9M
  mirror       253G   403G      0    159    511  16.9M
    c1d0s6        -      -      0    340      0  41.0M
    c0d0s6        -      -      0    145  12.8K  16.9M

local-space    253G   403G      0    135    511  16.2M
  mirror       253G   403G      0    135    511  16.2M
    c1d0s6        -      -      0    484      0  60.4M
    c0d0s6        -      -      0    130      0  16.2M

local-space    253G   403G      0    125      0  15.4M
  mirror       253G   403G      0    125      0  15.4M
    c1d0s6        -      -      0    471      0  59.0M
    c0d0s6        -      -      0    123      0  15.4M

local-space    253G   403G      0    139      0  16.2M
  mirror       253G   403G      0    139      0  16.2M
    c1d0s6        -      -      0    474      0  59.3M
    c0d0s6        -      -      0    129      0  16.2M

local-space    253G   403G      0    139     51  17.1M
  mirror       253G   403G      0    139     51  17.1M
    c1d0s6        -      -      0      3  6.39K   476K
    c0d0s6        -      -      0    137      0  17.1M

local-space    253G   403G      0    144      0  18.1M
  mirror       253G   403G      0    144      0  18.1M
    c1d0s6        -      -      0      0      0      0
    c0d0s6        -      -      0    144      0  18.1M

local-space    253G   403G      0    146      0  18.1M
  mirror       253G   403G      0    146      0  18.1M
    c1d0s6        -      -      0      0      0      0
    c0d0s6        -      -      0    144      0  18.1M

local-space    253G   403G      0    156      0  19.3M
  mirror       253G   403G      0    156      0  19.3M
    c1d0s6        -      -      0      0      0      0
    c0d0s6        -      -      0    154      0  19.3M

local-space    253G   403G      0    152      0  19.1M
  mirror       253G   403G      0    152      0  19.1M
    c1d0s6        -      -      0      0      0      0
    c0d0s6        -      -      0    152      0  19.1M

local-space    253G   403G      0    158      0  19.1M
  mirror       253G   403G      0    158      0  19.1M
    c1d0s6        -      -      0      0      0      0
    c0d0s6        -      -      0    152      0  19.1M

local-space    253G   403G      0    150      0  18.5M
  mirror       253G   403G      0    150      0  18.5M
    c1d0s6        -      -      0      0      0      0
    c0d0s6        -      -      0    147      0  18.5M

local-space    253G   403G      0    155      0  19.4M
  mirror       253G   403G      0    155      0  19.4M
    c1d0s6        -      -      0      0      0      0
    c0d0s6        -      -      0    155      0  19.4M

The interesting part of this (as far as I can tell) is the rightmost column: the write speeds of the second disk stay constant at about 20 MB/s, and the first disk fluctuates between zero and 60 MB/s. Is this normal behavior? Could it indicate a failing disk? There's nothing in `fmadm faulty', `dmesg', or `/var/adm/messages' that would indicate an impending disk failure, but this behavior is strange.

I'm using rsync-3.0.1 (yes, security fix is on the way) on both ends, and no NFS involved in this. rsync is writing 256k blocks. `iostat -xl 2' shows a similar kind of fluctuation going on.

Any suggestions what's going on? Any other diagnostics you'd like to see? I'd be happy to provide them. Thanks!

Will
Re: [zfs-discuss] ZFS on 32bit.
On Wed, Aug 6, 2008 at 13:31, Bryan Allen [EMAIL PROTECTED] wrote:
  I have a ~600GB zpool living on older Xeons. The system has 8GB of RAM.
  The pool is hanging off two LSI Logic SAS3041X-Rs (no RAID configured).

You might try taking out 4GB of the RAM (!). Some 32-bit drivers have problems doing DMA above 4GB, so limiting yourself to that much might at least eliminate that source of problems.

Will
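(If physically pulling DIMMs is awkward, a rough software-only approximation is to cap the physical memory Solaris will use via /etc/system and reboot:

set physmem=0x100000

0x100000 pages at 4K per page is roughly 4GB. This is only an approximation of removing the DIMMs, since it does not control which physical addresses end up in use, but it can serve as a quick first test before opening the box.)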
Re: [zfs-discuss] more ZFS recovery
re == Richard Elling [EMAIL PROTECTED] writes:
tb == Tom Bird [EMAIL PROTECTED] writes:

    tb> There was a problem with the SAS bus which caused various
    tb> errors including the inevitable kernel panic, the thing came
    tb> back up with 3 out of 4 zfs mounted.

    re> In general, ZFS can only repair conditions for which it owns
    re> data redundancy.

If that's really the excuse for this situation, then ZFS is not ``always consistent on the disk'' for single-VDEV pools. There was no loss of data here, just an interruption in the connection to the target, like power loss or any other unplanned shutdown. Corruption in this scenario is a significant regression w.r.t. UFS:

http://mail.opensolaris.org/pipermail/zfs-discuss/2008-June/048375.html

How about the scenario where you lose power suddenly, but only half of a mirrored VDEV is available when power is restored? Is ZFS vulnerable to this type of unfixable corruption in that scenario, too?
Re: [zfs-discuss] more ZFS recovery
On Wed, Aug 6, 2008 at 13:57, Miles Nordin [EMAIL PROTECTED] wrote:
  If that's really the excuse for this situation, then ZFS is not
  ``always consistent on the disk'' for single-VDEV pools.

Well, yes. If data is sent, but corruption somewhere (the SAS bus, apparently, here) causes bad data to be written, ZFS can generally detect but not fix that. It might be nice to have a verifywrites mode or something similar to make sure that good data has ended up on disk (at least at the time it checks), but failing that there's not much ZFS (or any filesystem) can do. Using a pool with some level of redundancy (mirroring, raidz) at least gives zfs a chance to read the missing pieces from the redundancy that it's kept.

  How about the scenario where you lose power suddenly, but only half of
  a mirrored VDEV is available when power is restored? Is ZFS vulnerable
  to this type of unfixable corruption in that scenario, too?

Every filesystem is vulnerable to corruption, all the time. I'm willing to dispute any claims otherwise. Some are just more likely than others to hit their error conditions. I've personally run into UFS' problems more often than ZFS'... but that doesn't mean I think I'm safe.

Will
Re: [zfs-discuss] more ZFS recovery
Miles Nordin wrote:
  If that's really the excuse for this situation, then ZFS is not
  ``always consistent on the disk'' for single-VDEV pools.

I disagree with your assessment. The on-disk format (any on-disk format) necessarily assumes no faults on the media. The difference between the ZFS on-disk format and most other file systems is that the metadata will be consistent to some point in time because it is COW. With UFS, for instance, the metadata is overwritten, which is why it cannot be considered always consistent (and why fsck exists).

  There was no loss of data here, just an interruption in the connection
  to the target, like power loss or any other unplanned shutdown.
  Corruption in this scenario is a significant regression w.r.t. UFS:

I see no evidence that the data is or is not correct. What we know is that ZFS is attempting to read something and the device driver is returning EIO. Unfortunately, EIO is a catch-all error code, so more digging to find the root cause is needed. However, I will bet a steak dinner that if this device was mirrored to another, the pool will import just fine, with the affected device in a faulted or degraded state.

  http://mail.opensolaris.org/pipermail/zfs-discuss/2008-June/048375.html

I have no idea what Eric is referring to, and it does not match my experience. Unfortunately, he didn't reference any CRs either :-(. "Your baby is ugly" posts aren't very useful.

That said, we are constantly improving the resiliency of ZFS (more good stuff coming in b96), so it might be worth trying to recover with a later version. For example, boot SXCE b94 and try to import the pool.

  How about the scenario where you lose power suddenly, but only half of
  a mirrored VDEV is available when power is restored? Is ZFS vulnerable
  to this type of unfixable corruption in that scenario, too?

No, this works just fine as long as one side works. But that is a very different case.
 -- richard
Re: [zfs-discuss] more ZFS recovery
re == Richard Elling [EMAIL PROTECTED] writes:

    c> If that's really the excuse for this situation, then ZFS is
    c> not ``always consistent on the disk'' for single-VDEV pools.

    re> I disagree with your assessment. The on-disk format (any
    re> on-disk format) necessarily assumes no faults on the media.

The media never failed, only the connection to the media. We've every good reason to believe that every CDB that the storage controller acknowledged as complete, was completed and is still there---and that is the only statement which must be true of unfaulty media. We've no strong reason to doubt it.

    re> I see no evidence that the data is or is not correct.

The ``evidence'' is that it was on a SAN, and the storage itself never failed, only the connection between ZFS and the storage. Remember: this device is 48 1T SATA drives presented as a 42T LUN via hardware RAID 6 on a SAS bus which had a ZFS on it as a single device. This sort of SAN outage happens all the time, so it's not straining my belief to suggest that probably nothing else happened other than disruption of the connection between ZFS and the storage. It's not like a controller randomly ``acted up'' or something, so that I would suspect a bad disk.

    c> http://mail.opensolaris.org/pipermail/zfs-discuss/2008-June/048375.html

    re> I have no idea what Eric is referring to, and it does not
    re> match my experience.

Unfortunately it's very easy to match the experience of ``nothing happened'' and hard to match the experience ``exactly the same thing happened to me.'' Have you been provoking ZFS in exactly the way Eric described, a single-vdev pool on FC where the FC SAN often has outages or where the storage is rebooted while ZFS is still running? If not, obviously it doesn't match your experience because you have none with this situation. OTOH if you've been doing that a lot, your not running into this problem means something.

Otherwise, it's another case of the home-user defense: ``I can't tell you how close to zero the number of problems I've had with it is. It's so close to zero, it is zero, so there's virtually 0% chance what you're saying happened to you really did happen to you. and also to this other guy.'' When I say ``doesn't match my experience'' I mean I _do_ see Mac OS X pinwheels, and for me it's ``usually'' traceable back to VM pressure or a dead NFS server, not some random application-level user-interface modal-wait as others claimed: I'm selecting for the same situation you are, and getting a different result.

That said, yeah, a CR would be nice. For such a serious problem, I'd like to think someone's collected an image of the corrupt filesystem and is trying to figure out wtf happened. I care about how safe is my data, not how pretty is your baby. I want its relative safety accurately represented based on the experience available to us.

    c> How about the scenario where you lose power suddenly, but only
    c> half of a mirrored VDEV is available when power is restored?
    c> Is ZFS vulnerable to this type of unfixable corruption in that
    c> scenario, too?

    re> No, this works just fine as long as one side works. But that
    re> is a very different case. -- richard

Why do you regard this case as very different from a single vdev? I don't have confidence that it's clearly different w.r.t. whatever hypothetical bug Eric and Tom have run into.

    wm> If data is sent, but corruption somewhere (the SAS bus,
    wm> apparently, here) causes bad data to be written, ZFS can
    wm> generally detect but not fix that.

Why would there be bad data written? The SAS bus has checksums. The problem AIUI was that the bus went away, not that it started scribbling random data all over the place. Am I wrong? Remember what Tom's SAS bus is connected to.

    wm> verifywrites

The verification is the storage array returning success to the command it was issued. ZFS is supposed to, for example, delay returning from fsync() until this has happened. The same mechanism is used to write batches of things in a well-defined order to supposedly achieve the ``always-consistent''. It depends on the drive/array's ability to accurately report when data is committed to stable storage, not on rereading what was written, and this is the correct dependency because ZFS leaves write caches on, so the drive could satisfy a read from the small on-disk cache RAM even though that data would be lost if you pulled the disk's power cord. The system contains all the tools needed to keep the consistency promises even if you go around yanking SAS cables. And this is a data-loss issue, not just an availability issue like we were discussing before w.r.t. pulling drives.

    wm> Every filesystem is vulnerable to corruption, all the time.

Every filesystem in recent history makes rigorous guarantees about what will survive if you pull the connection to the disk array, or the host's power, at any time you wish. The
[zfs-discuss] zfs crash CR6727355 marked incomplete
A bug report I've submitted for a zfs-related kernel crash has been marked incomplete and I've been asked to provide more information:

  This CR has been marked as incomplete by User 1-5Q-2508 for the reason Need More Info.
  Please update the CR providing the information requested in the Evaluation and/or Comments field.

However, when I pull up 6727355 in bugs.opensolaris.org, it doesn't allow me to make any edits, nor do I see an evaluation or comments field - am I doing something wrong?

--
Michael Hale                 [EMAIL PROTECTED]
Manager of Engineering Support, Enterprise Engineering Group
Transcom Enhanced Services   http://www.transcomus.com
Re: [zfs-discuss] zfs crash CR6727355 marked incomplete
Michael Hale wrote:
  However, when I pull up 6727355 in bugs.opensolaris.org, it doesn't allow
  me to make any edits, nor do I see an evaluation or comments field - am I
  doing something wrong?

1. The Comments field asks that the core dump be made readable by our zfs group, and the CR was made incomplete until the person who saved the core does this.
2. You do not see this because the Comments field is not readable outside of Sun, as it is used to contain customer information.
3. Finally, there is no Evaluation yet.

Bottom line is that you can ignore the "Need more info" - it wasn't directed at you. Sorry about the confusion. I guess the kinks in the system aren't ironed out yet. Usually if we need more info we will email you directly.

Neil.
Re: [zfs-discuss] Strange burstiness in write speed with a mirror
On Wed, 6 Aug 2008, Will Murnane wrote:
  I've got a pool which I'm currently syncing a few hundred gigabytes to
  using rsync. The source machine is pretty slow, so it only goes at about
  20 MB/s. Watching zpool iostat -v local-space 10, I see a pattern like
  this (trimmed to take up less space):

The pattern is indeed strange. You have a fairly low data rate load shared across a large number of mirrors. Maybe the I/Os complete very quickly and are often split within the 10 second duration, with no I/Os at all sometimes. If you change to 60 seconds or 120 seconds, does the reported data rate even out?

It looks like the mirrors are split across two controllers, which is very good, but could expose a performance issue with one of the controllers.

  Any suggestions what's going on? Any other diagnostics you'd like to
  see? I'd be happy to provide them.

Search for Jeff Bonwick's diskqual.sh script posted earlier to this forum. It was posted on Mon, 14 Apr 2008 15:49:41 -0700. It is quite handy for seeing if your disks are performing at a reasonably uniform rate.

Bob
======================================
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
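(If that post proves hard to find, a crude stand-in - not Jeff's script, just a sequential-read sanity check using the device names from the iostat output above - looks something like:

#!/bin/sh
# read ~1GB sequentially from each side of the mirror and compare timings
for d in c1d0s6 c0d0s6; do
        echo "=== $d ==="
        ptime dd if=/dev/rdsk/$d of=/dev/null bs=1024k count=1024
done

Roughly equal elapsed times suggest both disks are healthy; a large difference points at the slow one.)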
Re: [zfs-discuss] ZFS on 32bit.
For what it's worth, I see this as well on 32-bit Xeons, 1GB RAM, and dual AOC-SAT2-MV8 (large amounts of I/O sometimes resulting in lockup requiring a reboot --- though my setup is Nexenta b85). Nothing in the logging, nor loadavg increasing significantly. It could be the regular Marvell driver issues, but it is definitely not cool when it happens.

Thomas

On Wed, Aug 6, 2008 at 1:31 PM, Bryan Allen [EMAIL PROTECTED] wrote:
  [original message quoted above]
Re: [zfs-discuss] ZFS on 32bit.
In the most recent code base (both OpenSolaris/Nevada and S10Ux with patches) all the known marvell88sx problems have long ago been dealt with.

However, I've said this before: Solaris on 32-bit platforms has problems and is not to be trusted. There are far, far too many places in the source code where a 64-bit object is either loaded or stored without any atomic locking occurring, which could result in any number of wrong and bad behaviors. ZFS has some problems of this sort, but so does some of the low-level 32-bit x86 code. The problem was reported long ago, but to the best of my knowledge the issues have not been addressed. Looking below, it appears that nothing has been done for about 9 months. Here is the top of the bug report:

Bug ID                6634371
Synopsis              Solaris ON is broken w.r.t. 64-bit operations on 32-bit processors
State                 1-Dispatched (Default State)
Category:Subcategory  kernel:other
Keywords              32-bit | 64-bit | atomic
Reported Against / Duplicate Of / Introduced In / Commit to Fix / Fixed In Release / Related Bugs:  (empty)
Submit Date           27-NOV-2007
Last Update Date      28-NOV-2007
Re: [zfs-discuss] ZFS on 32bit.
Brian D. Horn wrote:
  Here is the top of the bug report:
  Bug ID                6634371
  Synopsis              Solaris ON is broken w.r.t. 64-bit operations on 32-bit processors
  State                 1-Dispatched (Default State)
  Category:Subcategory  kernel:other

I believe you misfiled that bug. I've redirected it to solaris / kernel / arch-x86, which appears to me to be more appropriate.

James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp   http://www.jmcp.homeunix.com/blog
Re: [zfs-discuss] OpenSolaris+ZFS+RAIDZ+VirtualBox - ready for production systems?
Oh, I have 'played' with them all: VirtualBox, VMware, KVM... But now I need to set up a production system for various Linux and Windows guests, and none of the three mentioned are 100% perfect, so the choice is difficult... My first choice would be KVM+RAIDZ, but since KVM only works on Linux, and RAIDZ doesn't work all that well yet on Linux, this is not an option...
Re: [zfs-discuss] zfs-auto-snapshot 0.11 work (was Re: zfs-auto-snapshot with at scheduling )
  The other changes that will appear in 0.11 (which is nearly done) are:

Still looking forward to seeing 0.11 :) Think we can expect a release soon? (or at least svn access so that others can check out the trunk?)
Re: [zfs-discuss] more ZFS recovery
On Wed, Aug 6, 2008 at 8:20 AM, Tom Bird [EMAIL PROTECTED] wrote:
  Have a problem with a ZFS on a single device, this device is 48 1T SATA
  drives presented as a 42T LUN via hardware RAID 6 on a SAS bus which had
  a ZFS on it as a single device. There was a problem with the SAS bus
  which caused various errors including the inevitable kernel panic, the
  thing came back up with 3 out of 4 zfs mounted.

Hi Tom,

After reading this and the followups to date, this could be due to anything ... and we (on the list) don't know the history of the system or the RAID device. You could have a bad SAS controller, bad system memory, a bad cable or a RAID controller with a firmware bug.

The first step would be to form a ZFS pool with 2 mirrors, beat up on it and gain some confidence in the overall system components. Write lots of data to it, run zpool scrub etc. and verify that it's 100% rock solid before you then zpool destroy it and then test with a larger pool (a rough example follows at the end of this message).

In every case where someone has initially posted an opening story like yours, the problem has almost always turned out to be outside of ZFS. As others have explained, if ZFS does not have a config with data redundancy, there is not much that can be learned - except that it just broke.

Keep testing and report back. Also, any additional data on the hardware and software config would be useful, and let us know if this is a new system or if the hardware has already been in service, and its reliability track record.

  [rest of Tom's original report quoted above]

Regards,

--
Al Hopper  Logical Approach Inc, Plano, TX.  [EMAIL PROTECTED]
           Voice: 972.379.2133  Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
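(To make the suggested burn-in concrete, a rough, illustrative sequence - the device names are placeholders and zpool destroy is, of course, destructive:

# zpool create testpool mirror c2t1d0 c2t2d0 mirror c2t3d0 c2t4d0
# dd if=/dev/urandom of=/testpool/bigfile bs=1024k count=10240
# zpool scrub testpool
# zpool status -v testpool
# zpool destroy testpool

Repeat the write/scrub cycle for a while; any READ/WRITE/CKSUM errors or hangs point at the hardware rather than at ZFS.)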
Re: [zfs-discuss] ZFS on 32bit.
Brian D. Horn wrote:
  In the most recent code base (both OpenSolaris/Nevada and S10Ux with
  patches) all the known marvell88sx problems have long ago been dealt with.

Not true. The working marvell patches still have not been released for Solaris. They're still just IDRs. Unless you know something I (and my Sun support reps) don't, in which case please provide patch numbers.

--
Carson
Re: [zfs-discuss] ZFS on 32bit.
As far as I can tell from the patch web pages: for Solaris 10 x86, 138053-01 should have the fixes (it does depend on other earlier patches, though). I find it very difficult to tell what the story is with patches, as the patch numbers seem to have very little in them to correlate them to code changes.

For Solaris Nevada/OpenSolaris it would seem that the fixes went back on Feb 11, 2008 (though there have been additional changes to the sata module since then).

Pretty much, if you have a version of the driver that still spews informational messages with marvell88sx in them, you are running old stuff. If those messages have been suppressed, odds are that you have new stuff.
Re: [zfs-discuss] ZFS on 32bit.
On Wed, Aug 6, 2008 at 6:22 PM, Carson Gaspar [EMAIL PROTECTED] wrote:
  Not true. The working marvell patches still have not been released for
  Solaris. They're still just IDRs. Unless you know something I (and my
  Sun support reps) don't, in which case please provide patch numbers.

I was able to get a Tpatch this week with encouraging words about a likely release of 138053-02 this week. In a separate thread last week (?) Enda said that it should be out within a couple weeks.

Mike

--
Mike Gerdts
http://mgerdts.blogspot.com/
Re: [zfs-discuss] ZFS on 32bit.
On Thu, Aug 7, 2008 at 5:32 AM, Peter Bortas [EMAIL PROTECTED] wrote:
  On Wed, Aug 6, 2008 at 7:31 PM, Bryan Allen [EMAIL PROTECTED] wrote:
    When I put a moderate amount of load on the zpool (like, say, copying
    many files locally, or deleting a large number of ZFS fs), the system
    hangs and becomes completely unresponsive, requiring a reboot.

  I have the same problem with 32-bit, 2GiB RAM and 6 disks in a 2.7T
  raidz on snv_81. Slightly unbalanced one might say, but it shouldn't
  lock up regardless.

Forgot to mention I run with different controllers: 2 x Sil3114 PCI cards.

--
Peter Bortas
Re: [zfs-discuss] ZFS on 32bit.
Bryan, Thomas: these hangs of 32-bit Solaris under heavy (fs, I/O) loads are a well-known problem. They are caused by memory contention in the kernel heap. Check 'kstat vmem::heap'. The usual recommendation is to change the kernelbase. It worked for me. See:

http://mail.opensolaris.org/pipermail/zfs-discuss/2008-March/046710.html
http://mail.opensolaris.org/pipermail/zfs-discuss/2008-March/046715.html

-marc
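(For reference, a minimal sketch of that workaround - the kernelbase value below is just the commonly suggested one, not tuned for any particular box:

# kstat -p vmem::heap:mem_inuse vmem::heap:mem_total
# eeprom kernelbase=0x80000000
# reboot

Lowering kernelbase gives the kernel heap more virtual address space on 32-bit x86, at the cost of address space available to 32-bit user processes.)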
Re: [zfs-discuss] more ZFS recovery
From the ZFS Administration Guide, Chapter 11, "Data Repair" section:

  Given that the fsck utility is designed to repair known pathologies specific
  to individual file systems, writing such a utility for a file system with no
  known pathologies is impossible.

That's a fallacy (and is incorrect even for the UFS fsck; refer to the McKusick/Kowalski paper and the distinction they make between 'expected' corruptions and other inconsistencies).

First, there are two types of utilities which might be useful in the situation where a ZFS pool has become corrupted. The first is a file system checking utility (call it zfsck); the second is a data recovery utility. The difference between those is that the first tries to bring the pool (or file system) back to a usable state, while the second simply tries to recover the files to a new location.

What does a file system check do? It verifies that a file system is internally consistent, and makes it consistent if it is not. If ZFS were always consistent on disk, then only a verification would be needed. Since we have evidence that it is not always consistent in the face of hardware failures, at least, repair may also be needed. This doesn't need to be that hard. For instance, the space maps can be reconstructed by walking the various block trees; the uberblock effectively has several backups (though it might be better in some cases if an older backup were retained); and the ZFS checksums make it easy to identify block types and detect bad pointers. Files can be marked as damaged if they contain pointers to bad data; directories can be repaired if their hash structures are damaged (as long as the names and pointers can be salvaged); etc. Much more complex file systems than ZFS have file system checking utilities, because journaling, COW, etc. don't help you in the face of software bugs or certain classes of hardware failures.

A recovery tool is even simpler, because all it needs to do is find a tree root and then walk the file system, discovering directories and files, verifying that each of them is readable by using the checksums to check intermediate and leaf blocks, and extracting the data. The tricky bit with ZFS is simply identifying a relatively new root, so that the newest copy of the data can be identified.

Almost every file system starts out without an fsck utility, and implements one once it becomes obvious that "sorry, you have to reinitialize the file system" -- or worse, "sorry, we lost all of your data" -- is unacceptable to a certain proportion of customers.
Re: [zfs-discuss] more ZFS recovery
  As others have explained, if ZFS does not have a config with data
  redundancy - there is not much that can be learned - except that it
  just broke.

Plenty can be learned by just looking at the pool. Unfortunately ZFS currently doesn't have tools which make that easy; as I understand it, zdb doesn't work (in a useful way) on a pool which won't import, so dumping out the raw data structures and looking at them by hand is the only way to determine what ZFS doesn't like and deduce what went wrong (and how to fix it).
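(A couple of starting points for that kind of by-hand inspection, hedged in that zdb options vary between builds: zdb -l /dev/rdsk/c2t9d0s0 dumps the vdev labels straight off the device regardless of whether the pool imports (adjust the slice to wherever the vdev actually lives), and builds that support the -e flag can examine a pool that isn't in zpool.cache, e.g. zdb -e -uuu content to look at the uberblocks. Neither repairs anything, but they can show what the import code is tripping over.)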
Re: [zfs-discuss] more ZFS recovery
On Wed, Aug 06, 2008 at 02:23:44PM -0400, Will Murnane wrote:
  Well, yes. If data is sent, but corruption somewhere (the SAS bus,
  apparently, here) causes bad data to be written, ZFS can generally
  detect but not fix that. [...] Using a pool with some level of
  redundancy (mirroring, raidz) at least gives zfs a chance to read the
  missing pieces from the redundancy that it's kept.

There's also ditto blocks. So even on a one-vdev pool ZFS can recover from random corruption unless you're really unlucky. Of course, this is a feature. Without ZFS the OP would have had silent, undetected (by the OS, that is) data corruption.

Basically you don't want to have one-vdev pools. If you'll use HW RAID then you should also do mirroring at the ZFS layer.

Nico
--
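(For the archives: the ditto-block count for user data is controlled by the copies property, e.g. zfs set copies=2 tank/fs, where the dataset name is only illustrative. Two caveats: it only affects data written after the property is set, and extra copies on the same LUN still don't protect against losing that LUN outright.)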
Re: [zfs-discuss] more ZFS recovery
On Wed, Aug 06, 2008 at 03:44:08PM -0400, Miles Nordin wrote:
  The media never failed, only the connection to the media. We've every
  good reason to believe that every CDB that the storage controller
  acknowledged as complete, was completed and is still there---and that
  is the only statement which must be true of unfaulty media. We've no
  strong reason to doubt it.

zdb should be able to pinpoint the problem, no?
Re: [zfs-discuss] ZFS on 32bit.
Yes, there have been bugs with heavy I/O and ZFS running the system out of memory. However, there was a contention in the thread about it possibly being due to marvell88sx driver bugs (most likely not). Further, my mention of 32-bit Solaris being unsafe at any speed is still true.

Without analysis of a specific hang it is very hard to say what caused it. It could be a driver, memory exhaustion, a file system error, a VM error, broken hardware, or any number of other things.

My points were:

1) The marvell88sx driver should be pretty solid at this point in time (yes, earlier releases had problems, most of which were related to bad block handling), and
2) There are systemic issues in Solaris on 32-bit architectures (of which only x86 is supported).
Re: [zfs-discuss] zpool upgrade wrecked GRUB
  so finally, I gathered up some courage and
  installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c2d0s0
  seemed to write out what I assume is a new MBR.

Not the MBR - the stage1 and stage2 files are written to the boot area of the Solaris FDISK partition.

  tried to also installgrub on the other disk in the mirror c3d0 and failed
  over several permutations; "cannot open/stat /dev/rdsk/c3d0s2" was the error msg.

This is because installgrub needs the overlap slice to be present as slice 2 for some reason. The overlap slice, also called the backup slice, covers the whole of the Solaris FDISK partition. If you don't have one on your second disk, just create one.

  however a reboot from dsk/c2d0s0 gave me a healthy and unchanged grub stage2
  menu and functioning system again. whew
  Although I cannot prove causality here, I still think that the zpool upgrade
  ver.10 -> ver.11 borked the MBR. indeed, probably the stage2 sectors, i guess.

No - upgrading a ZFS pool doesn't touch the MBR or the stage2. The problem is that the grub ZFS filesystem reader needs to be updated to understand the version 11 pool. This doesn't (yet) happen automatically.

  I also seem to only have a single MBR between the two disks in the mirror.
  is this normal?

Not really normal, but at present manually creating a ZFS boot mirror in this way does not set the 2nd disk up correctly, as you've discovered. To write a new Solaris grub MBR to the second disk, do this:

installgrub -m /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c3d0s0

The -m flag tells installgrub to put the grub stage1 into the MBR.

Cheers
Andrew.
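(One shortcut for creating the missing slice 2 on the second disk, assuming both disks have identical geometry, is to copy the VTOC from the first disk rather than laying out slices by hand in format:

# prtvtoc /dev/rdsk/c2d0s2 | fmthard -s - /dev/rdsk/c3d0s2

If the disks differ in size, create the slices manually in format instead.)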
Re: [zfs-discuss] more ZFS recovery
re == Richard Elling [EMAIL PROTECTED] writes:

    re> If your pool is not redundant, the chance that data corruption
    re> can render some or all of your data inaccessible is always present.

1. data corruption != unclean shutdown

2. other filesystems do not need a mirror to recover from unclean shutdown. They only need it for when disks fail, or for when disks misremember their contents (silent corruption, as in the NetApp paper). I would call data corruption and silent corruption the same thing: what the CKSUM column was _supposed_ to count, though not in fact the only thing it counts.

3. saying ZFS needs a mirror to recover from unclean shutdown does not agree with the claim ``always consistent on the disk''

4. I'm not sure exactly what your position is. Before you were saying what Erik warned about doesn't happen, because there's no CR, and Tom must be confused too. Now you're saying of course it happens, ZFS's claims of ``always consistent on disk'' count for nothing unless you have pool redundancy. And that is exactly what I said to start with:

    re> In general, ZFS can only repair conditions for which it owns
    re> data redundancy.

    c> If that's really the excuse for this situation, then ZFS is
    c> not ``always consistent on the disk'' for single-VDEV pools.

Is that the take-home message? If so, it still leaves me with the concern: what if the breaking of one component in a mirrored vdev takes my system down uncleanly? This seems like a really plausible failure mode (as Tom said, ``the inevitable kernel panic''). In that case, I no longer have any redundancy when the system boots back up.

If ZFS calls the inconsistent states through which it apparently sometimes transitions pools ``data corruption'' and depends on redundancy to recover from them, then isn't it extremely dangerous to remove power or SAN connectivity from any DEGRADED pool? The pool should be rebuilt onto a hot spare IMMEDIATELY so that it's ONLINE as soon as possible, because if ZFS loses power with a DEGRADED pool, all bets are off.

If this DEGRADED-pool unclean shutdown is, as you say, a completely different scenario from single-vdev pools that isn't dangerous and has no trouble with ZFS corruption, then no one should ever run a single-vdev pool. We should instead run mirrored vdevs that are always DEGRADED, since this configuration looks identical to everything outside ZFS but supposedly magically avoids the issue. If only we had some way to attach to vdevs fake mirror components that immediately get marked FAULTED, then we could avoid the corruption risk. But, that's clearly absurd!

So, let's say ZFS's requirement is, as we seem to be describing it: you might lose the whole pool if your kernel panics or you pull the power cord in a situation without redundancy. Then I think this is an extremely serious issue, even for redundant pools. It is very plausible that a machine will panic or lose power during a resilver.

And if, on the other hand, ZFS doesn't transition disks through inconsistent states and then excuse itself by calling what it did ``data corruption'' when it bites you after an unclean shutdown, then what happened to Erik and Tom? It seems to me it is ZFS's fault and can't be punted off to the administrator's ``asking for it.''
Re: [zfs-discuss] more ZFS recovery
nw == Nicolas Williams [EMAIL PROTECTED] writes:

    nw> Without ZFS the OP would have had silent, undetected (by the
    nw> OS that is) data corruption.

It sounds to me more like the system would have paniced as soon as he pulled the cord, and when it rebooted, it would have rolled the UFS log and mounted, without even an fsck, with no corruption at all, silent or otherwise.

Note that the storage controller never even lost power, and does not appear to be faulty.