On Tue, 16 Feb 2010, Christo Kutrovsky wrote:

Just finished reading the following excellent post:

http://queue.acm.org/detail.cfm?id=1670144

A nice article, even if I don't agree with all of its surmises and conclusions. :-)

In fact, I would reach a different conclusion.

I considered something like simply doing a 2-way mirror. What are the chances of a very specific drive failing in a 2-way mirror? What if I do not want to take that chance?

The probability of whole drive failure has not increased over the years, and the probability of individual sector failure has diminished substantially. Since an individual drive is no more likely to fail than it used to be, the probability of losing a whole mirror pair has gone down rather than up.
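As a rough illustration (these failure rates are assumed for the sake of the arithmetic, not measured): if each drive has an annual failure rate of around 3%, and a failed half of a mirror is replaced and resilvered within 24 hours, then the chance that the surviving drive also dies inside that window is roughly

   0.03 * (24 / 8760) ~= 0.008%, or about 1 in 12,000 incidents,

ignoring correlated failures such as a shared power supply or chassis (more on those below).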

I could always set "copies=2" (or more) on my important datasets, accept some risk, and tolerate such a failure.

I don't believe that "copies=2" buys much at all when using mirrored disks (or raidz). It only adds protection when media failures occur on multiple devices at the same time, which is actually quite rare indeed. The "copies=2" setting really only buys something when there is no other redundancy available.
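For anyone who does want the extra copies on selected data only, the property is set per dataset (the dataset name here is just an example) and it only affects blocks written after it is set:

   # zfs set copies=2 tank/important
   # zfs get copies tank/important

Existing data is not rewritten, so the old blocks are only duplicated if the files are rewritten (or sent/received) afterwards.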

One of the ideas that occurred to me is to have a "max devices" property for each dataset, limiting how many mirrored devices a given dataset can be spread across. If you don't need the performance, you could limit (minimize) the number of devices used, should your capacity allow this.

What you seem to be suggesting is a sort of targeted hierarchical vdev without extra RAID.

Remember, the goal is damage control. I know 2x raidz2 offers better protection for more capacity (although less performance, but that's not the point).

It seems that Adam Leventhal's excellent paper reaches the wrong conclusions because it assumes that history is a predictor of the future. However, history is a rather poor predictor in this case. Imagine if 9" floppies had kept increasing in density until they held 20GB each (up from 160KB); that did not happen, and now we don't use floppies at all. We have already seen many cases where history stopped being a good predictor of the future: increased integration, for example, has brought us multi-core CPUs rather than 20GHz CPUs.

My own conclusions (supported by Adam Leventhal's excellent paper) are that

 - maximum device size should be constrained based on its time to
   resilver.

 - devices are growing too large and it is about time to transition to
   the next smaller physical size.

It is unreasonable to spend more than 24 hours to resilver a single drive. It is unreasonable to spend more than 6 days resilvering all of the devices in a RAID group (the 7th day is reserved for the system administrator). It is unreasonable to spend very much time at all on resilvering (using current rotating media) since the resilvering process kills performance.
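A back-of-the-envelope sketch (the throughput figures are assumptions for illustration, not measurements): a 2TB drive resilvered at a sustained 100MB/s takes about

   2,000,000 MB / 100 MB/s ~= 20,000 seconds ~= 5.5 hours

in the ideal streaming case, but if the pool is busy and the resilver degrades to effectively random I/O at 10MB/s, the same drive needs roughly 55 hours, which already blows well past the 24 hour limit.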

When looking at the possibility of data failure it is wise to consider physical issues such as

 - shared power supply

 - shared chassis

 - shared physical location

 - shared OS kernel or firmware instance

all of which are very bad for data reliability since a problem with anything shared can lead to destruction of all copies of the data.

In New York City, all of the apartment doors seem to be fitted with three deadlocks, all of which lock into the same flimsy splintered door frame. It is important to consider each significant system weakness in turn in order to achieve the least chance of loss, while providing the best service.

Bob

P.S. NASA is tracking large asteroids and meteors with the hope that it will eventually be able to deflect any which would strike our planet, in an effort to save your precious data.
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
