On Thu, Apr 08, 2010 at 08:36:43PM -0700, Richard Elling wrote:
> On Apr 8, 2010, at 6:19 PM, Daniel Carosone wrote:
> >
> > As for error rates, this is something zfs should not be afraid
> > of. Indeed, many of us would be happy to get drives with less internal
> > ECC overhead and complexity for greater capacity, and tolerate the
> > resultant higher error rates, specifically for use with zfs (sector
> > errors, not overall drive failure, of course). Even if it means I
> > need raidz4, and wind up with the same overall usable space, I may
> > prefer the redundancy across drives rather than within.
>
> Disagree. Reliability trumps availability every time.
Often, but not sure about every. The economics shift around too fast for
such truisms to be reliable, and there's always room for an upstart (often
in a niche) to gain a great economic advantage by questioning this
established wisdom. The oft-touted example is google's servers, but there
are many others.

> And the problem
> with the availability provided by redundancy techniques is that the
> amount of work needed to recover is increasing. This work is limited
> by latency and HDDs are not winning any latency competitions anymore.

We're talking about generalities; the niche can be very important to
enable these kinds of tricks by holding some of the other troubling
variables constant (e.g. application/programming platform).

It doesn't really matter whether you're talking about 1 dual-PSU server
vs 2 single-PSU servers, or whole datacentres - except that solid
large-scale diversity tends to lessen your concentration (and perhaps
spend) on internal redundancy within a datacentre (or disk).

Put another way: some application niches are much more able to adopt
redundancy techniques that don't require so much work. Again, for the
google example: if you're big and diverse enough that shifting load
between data centres on failure is no work, then moving the load for
other reasons is viable too - such as moving to where it's night time
and power and cooling are cheaper. The work has been done once, up
front, and the benefits are repeatable.

> To combat this, some vendors are moving to an overprovision model.
> Current products deliver multiple "disks" in a single FRU with builtin,
> fine-grained redundancy. Because the size and scope of the FRU is
> bounded, the recovery can be optimized and the reliability of the FRU
> is increased.

That's not new. Past examples in the direct experience of this community
include the BladeStor and SSA-1000 storage units, which aggregated disks
into failure domains (e.g. drawers) for a (big) density win.

--
Dan.
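The capacity trade-off quoted at the top (shedding internal ECC overhead for raw capacity, then paying for it with wider parity across drives) can be sketched with a little arithmetic. All the numbers below are made up for illustration; the ~33% capacity gain from reduced ECC overhead is a hypothetical figure, not a vendor spec:

```python
# Hypothetical sketch of the trade-off: less per-drive ECC overhead
# buys raw capacity, but the higher sector error rate pushes you to
# wider parity (e.g. raidz4) across the pool. Figures are illustrative.

def usable_tb(drive_tb, n_drives, parity):
    """Usable capacity of one raidz-style vdev: (n - parity) data drives."""
    return drive_tb * (n_drives - parity)

# Baseline: 1.0 TB drives with heavy internal ECC, raidz2 over 10 drives.
baseline = usable_tb(1.0, 10, 2)        # 8.0 TB usable

# Alternative: suppose (hypothetically) trimming ECC overhead yields
# ~33% more raw capacity per drive, but the error rate demands raidz4.
alternative = usable_tb(1.33, 10, 4)    # ~7.98 TB usable

print(baseline, alternative)
```

Under those (invented) numbers the usable space comes out roughly the same either way, which is the point of the quoted remark: the choice is about where the redundancy lives, not how much space you end up with.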
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss