[zfs-discuss] Re: Proposal: multiple copies of user data
Is this true for single-sector, vs. single-ZFS-block, errors? (Yes, it's pathological and probably nobody really cares.) I didn't see anything in the code which falls back on single-sector reads. (It's slightly annoying that the interface to the block device drivers loses the SCSI error status, which tells you the first sector which was bad.)
[zfs-discuss] Re: Proposal: multiple copies of user data
> Matthew Ahrens wrote:
> > Here is a proposal for a new 'copies' property which would allow
> > different levels of replication for different filesystems.
>
> Thanks everyone for your input.
>
> The problem that this feature attempts to address is when you have some
> data that is more important (and thus needs a higher level of
> redundancy) than other data. Of course in some situations you can use
> multiple pools, but that is antithetical to ZFS's pooled storage model.
> (You have to divide up your storage, you'll end up with stranded storage
> and bandwidth, etc.)
>
> Given the overwhelming criticism of this feature, I'm going to shelve it
> for now.

Damn! That's a real shame! I was really starting to look forward to that. Please reconsider??!

> --matt

Celso
Re: [zfs-discuss] Re: Proposal: multiple copies of user data
On 12/09/06, Celso <[EMAIL PROTECTED]> wrote:

> One of the great things about zfs is that it protects not just against
> mechanical failure, but against silent data corruption. Having this
> available to laptop owners seems to me to be important to making zfs
> even more attractive.

I'm not arguing against that. I was just saying that *if* this was useful to you (and you were happy with the dubious resilience/performance benefits) you can already create mirrors/raidz on a single disk by using partitions as building blocks. There's no need to implement the proposal to gain that.

> Am I correct in assuming that having say 2 copies of your "documents"
> filesystem means should silent data corruption occur, your data can be
> reconstructed? So that you can leave your os and base applications with
> 1 copy, but your important data can be protected.

Yes.

--
Rasputin :: Jack of All Trades - Master of Nuns
http://number9.hellooperator.net/
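For anyone wanting to try the partitions-as-building-blocks approach described above, here is a minimal sketch, assuming two equally sized slices have already been laid out on the single disk with format(1M). The pool name and slice names are placeholders; a mirror built from two slices of one spindle gives checksum-driven self-healing for bad blocks, but no protection against the whole drive dying.

  # Sketch only: slice names are hypothetical and must already exist.
  zpool create tank mirror c0t0d0s4 c0t0d0s5   # two-way mirror on a single spindle
  zpool status tank                            # both halves should show ONLINE
  zfs create tank/documents                    # filesystems inherit the pool's redundancy

The obvious drawback is that every write is paid for twice on the same disk, which is part of the "dubious resilience/performance benefits" mentioned above.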
[zfs-discuss] Re: Proposal: multiple copies of user data
Take this for what it is: the opinion of someone who knows less about zfs than probably anyone else on this thread, but... I would like to add my support for this proposal.

As I understand it, the reason for using ditto blocks on metadata is that maintaining their integrity is vital for the health of the filesystem, even if the zpool isn't mirrored or redundant in any way (i.e. laptops, or people who just don't or can't add another drive).

One of the great things about zfs is that it protects not just against mechanical failure, but against silent data corruption. Having this available to laptop owners seems to me to be important to making zfs even more attractive. Granted, if you are running an enterprise fileserver, this probably isn't going to be your first choice for data protection. You will probably be using the other features of zfs like mirroring, raidz, raidz2, etc.

Am I correct in assuming that having say 2 copies of your "documents" filesystem means should silent data corruption occur, your data can be reconstructed? So that you can leave your os and base applications with 1 copy, but your important data can be protected.

In a way, this reminds me of intel's "matrix raid" but much cooler (it doesn't rely on a specific motherboard for one thing).

I would also agree that utilities like 'ls' should report the space used by both copies, and that it should count against people's quotas. It just doesn't seem too hard to me to understand that because you have 2 copies, you halve the amount of available space.

Just to reiterate, I think this would be an awesome feature!

Celso.

PS. Please feel free to correct me on any technical inaccuracies. I am trying to learn about zfs and Solaris 10 in general.
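To make the usage Celso describes concrete, here is a sketch of how the proposed 'copies' property might be applied per filesystem. The dataset names are made up, and since the feature is only a proposal in this thread, the syntax follows Matt's proposal rather than any shipped behaviour.

  # Hypothetical layout; 'copies' is the property proposed in this thread.
  zfs create tank/os                    # OS and applications: default single copy
  zfs create tank/documents
  zfs set copies=2 tank/documents       # important data: every block written twice,
                                        # spread across the device
  zfs get copies tank/documents         # confirm the setting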
Re: [zfs-discuss] Re: Proposal: multiple copies of user data
On Tue, 12 Sep 2006, Anton B. Rang wrote: [reformatted]

> > True - I'm a laptop user myself. But as I said, I'd assume the whole
> > disk would fail (it does in my experience).

Usually a laptop disk suffers a mechanical failure - and the failure rate is a lot higher than disks in a fixed location environment.

> That's usually the case, but single-block failures can occur as well.
> They're rare (check the "uncorrectable bit error rate" specifications)
> but if they happen to hit a critical file, they're painful.
>
> On the other hand, multiple copies seems (to me) like a really expensive
> way to deal with this. ZFS is already using relatively large blocks, so
> it could add an erasure code on top of them and have far less storage
> overhead. If the assumed problem is multi-block failures in one area of
> the disk, I'd wonder how common this failure mode is; in my experience,
> multi-block failures are generally due to the head having touched the
> platter, in which case the whole drive will shortly fail. (In any case,

The following is based on dated knowledge from personal experience and I can't say if it's (still) accurate information today.

Drive failures in a localized area are generally caused by the heads being positioned in the same (general) cylinder position for long periods of time. The heads ride on an air bearing - but there is still a lot of friction caused by the movement of air under the heads. This in turn generates heat. Localized heat buildup can cause some of the material coated on the disk to break free. The drive is designed for this eventuality - since it is equipped with a very fine filter which will catch and trap anything that breaks free, and the airflow is designed to constantly circulate the air through the filter. However, some of the material might get trapped between the head and the disk and possibly stick to the disk. In this case, the neighbouring disk cylinders in this general area will probably be damaged and, if enough material accumulates, so might the head(s).

In the old days people wrote their own head "floater" programs - to ensure that the head was moved randomly across the disk surface from time to time (a minimal sketch of such a program appears after this message). I don't know if this is still relevant today - since the amount of firmware a disk drive executes continues to increase every day. But in a typical usage scenario, where a user does, for example, a find operation in a home directory - and the directory caches are not sized large enough - there is a good probability that the heads will end up in the same general area of the disk after the find op completes. Assuming that the box has enough memory, the disk may not be accessed again for a long time - and possibly only during another find op (wash, rinse, repeat).

Continuing: a buildup of heat in a localized cylinder area will cause the disk platter to expand and shift relative to the heads. The disk platter has one surface dedicated to storing servo information - and from this the disk can "decide" that it is on the wrong cylinder after a head movement. In which case the drive will recalibrate itself (thermal recalibration) and store a table of offsets for different cylinder ranges. So when the head is told, for example, to move to cylinder 1000, the correction table will tell it to move to where physical cylinder 1000 should be and then add the correction delta (plus or minus) for that cylinder range to figure out where to actually move the heads. Now the heads are positioned on the correct cylinder and should be centered on it.
If the drive gets a bad CRC after reading a cylinder, it can use the CRC to correct the data or it can command that the data be re-read, until a correctable read is obtained. Last I heard, the number of retries is of the order of 100 to 200 or more (??). So this will be noticeable - since 100 reads will require 100 revolutions of the disk. Retries like this will probably continue to provide correctable data to the user, and the disk drive will ignore the fact that there is an area of disk where retries are constantly required.

This is what Steve Gibson picked up on for his SpinRite product. If he runs code that can determine that CRC corrections or re-reads are required to retrieve good data, then he "knows" this is a likely area of the disk to fail in the (possibly near) future. So he relocates the data in this area, marks the area "bad", and the drive avoids it. Given what I wrote earlier, that there could be some physical damage in this general area - having the heads avoid it is a Good Thing.

So the question is, how relevant is storing multiple copies of data on a disk in terms of the mechanics of modern disk drive failure modes. Without some "SpinRite"-like functionality in the code, the drive will continue to access the deteriorating disk cylinders, now a localized failure, and eventually it will deteriorate further and cause enough material to break free to take out the head(s). At which time the
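The head "floater" programs mentioned in the message above are easy to sketch; purely for illustration, a minimal version in shell might look like the following. The device path and block count are placeholders, dd's iseek= operand is the Solaris spelling, and whether this helps at all on modern, firmware-managed drives is exactly the open question raised above.

  #!/bin/sh
  # Hypothetical head "floater": periodically read one 512-byte block from a
  # random spot on the raw disk so the heads do not sit over the same
  # cylinders for hours on end.
  DISK=/dev/rdsk/c0t0d0s2     # placeholder: whole-disk slice
  NBLOCKS=78125000            # placeholder: total 512-byte blocks on the disk
  while true; do
      OFFSET=`perl -e "print int(rand($NBLOCKS))"`
      dd if=$DISK of=/dev/null bs=512 iseek=$OFFSET count=1 2>/dev/null
      sleep 600               # wander once every ten minutes
  done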
[zfs-discuss] Re: Proposal: multiple copies of user data
> True - I'm a laptop user myself. But as I said, I'd assume the whole disk
> would fail (it does in my experience).

That's usually the case, but single-block failures can occur as well. They're rare (check the "uncorrectable bit error rate" specifications) but if they happen to hit a critical file, they're painful.

On the other hand, multiple copies seems (to me) like a really expensive way to deal with this. ZFS is already using relatively large blocks, so it could add an erasure code on top of them and have far less storage overhead. If the assumed problem is multi-block failures in one area of the disk, I'd wonder how common this failure mode is; in my experience, multi-block failures are generally due to the head having touched the platter, in which case the whole drive will shortly fail. (In any case, multi-block failures could be addressed by spreading the data from a large block and using an erasure code.)
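A rough back-of-the-envelope comparison of the space overheads being weighed here, using illustrative numbers rather than anything measured (the block and sector sizes are assumptions, and the erasure-code geometry is just one possible choice):

  # Assume a 128K ZFS block = 256 x 512-byte sectors.
  # copies=2: every sector written twice        -> 100% space overhead
  # one parity sector per 16 data sectors       -> 16 parity sectors per block
  echo 'scale=4; 16/256' | bc                    # 0.0625, i.e. about 6% overhead

A code like that repairs isolated bad sectors within a block, which is the single-sector failure mode discussed above; it does not help if the whole area of the disk holding the block is gone.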
Re: [zfs-discuss] Re: Proposal: multiple copies of user data
Anton B. Rang wrote:
> > The biggest problem I see with this is one of observability, if not all
> > of the data is encrypted yet what should the encryption property say?
> > If it says encryption is on then the admin might think the data is
> > "safe", but if it says it is off that isn't the truth either because
> > some of it may still be encrypted.
>
> From a user interface perspective, I'd expect something like
>   Encryption: Being enabled, 75% complete
> or
>   Encryption: Being disabled, 25% complete, about 2h23m remaining

and if we are still writing to the file systems at that time? Maybe this really does need to be done with the file system locked.

> I'm not sure how you'd map this into a property (or several), but it
> seems like "on"/"off" ought to be paired with "transitioning to
> on"/"transitioning to off" for any changes which aren't instantaneous.

Agreed, and checksum and compression would have the same issue if there was a mechanism to rewrite with the new checksums or compression settings.

--
Darren J Moffat
[zfs-discuss] Re: Proposal: multiple copies of user data
> The biggest problem I see with this is one of observability, if not all
> of the data is encrypted yet what should the encryption property say?
> If it says encryption is on then the admin might think the data is
> "safe", but if it says it is off that isn't the truth either because
> some of it may still be encrypted.

From a user interface perspective, I'd expect something like
  Encryption: Being enabled, 75% complete
or
  Encryption: Being disabled, 25% complete, about 2h23m remaining

I'm not sure how you'd map this into a property (or several), but it seems like "on"/"off" ought to be paired with "transitioning to on"/"transitioning to off" for any changes which aren't instantaneous.
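Purely to illustrate the interface idea being discussed, a hypothetical mock-up of how a transitioning state could surface through the property interface. None of this output is real: the property value, the dataset name, and the percentage are all invented for the sketch.

  # Hypothetical output only; not actual ZFS behaviour.
  $ zfs get encryption tank/secure
  NAME         PROPERTY    VALUE                               SOURCE
  tank/secure  encryption  transitioning to on (75% complete)  local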
[zfs-discuss] Re: Proposal: multiple copies of user data
> Hi Matt,
> Interesting proposal. Has there been any consideration if free space
> being reported for a ZFS filesystem would take into account the copies
> setting?
>
> Example:
>   zfs create mypool/nonredundant_data
>   zfs create mypool/redundant_data
>   df -h /mypool/nonredundant_data /mypool/redundant_data
>   (shows same amount of free space)
>   zfs set copies=3 mypool/redundant_data
>
> Would a new df of /mypool/redundant_data now show a different amount of
> free space (presumably 1/3 if different) than /mypool/nonredundant_data?

As I understand the proposal, there's nothing new to do here. The filesystem might be 25% full, and it would be 25% full no matter how many copies of the filesystem there are.

Similarly with quotas, I'd argue that the extra copies should not count towards a user's quota, since a quota is set on the filesystem. If I'm using 500M on a filesystem, I only have 500M of data no matter how many copies of it the administrator has decided to keep (cf. RAID1). I also don't see why a copy can't just be dropped if the "copies" value is decreased.

Having said this, I don't see any value in the proposal at all, to be honest.
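A small worked example of the accounting this poster is arguing for, with made-up numbers; it describes the suggested semantics, not necessarily what any implementation reports:

  # Suggested semantics, illustrated:
  #   logical data in the filesystem:       500M
  #   copies=3 -> raw pool space consumed:  3 x 500M = 1500M
  #   quota, ls, du (per this argument):    still report 500M of data
  #   the pool as a whole, however, has 1500M less free space than before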
[zfs-discuss] Re: Proposal: multiple copies of user data
Hi Matt,

Interesting proposal. Has there been any consideration if free space being reported for a ZFS filesystem would take into account the copies setting?

Example:
  zfs create mypool/nonredundant_data
  zfs create mypool/redundant_data
  df -h /mypool/nonredundant_data /mypool/redundant_data
  (shows same amount of free space)
  zfs set copies=3 mypool/redundant_data

Would a new df of /mypool/redundant_data now show a different amount of free space (presumably 1/3 if different) than /mypool/nonredundant_data?