On Sep 28, 2009, at 11:41 AM, Bob Friesenhahn wrote:

On Mon, 28 Sep 2009, Richard Elling wrote:

In other words, I am concerned that people will replace good data protection practices with scrubs, expecting scrub to deliver better data protection (it won't).

Many people here would profoundly disagree with the above. There is no substitute for good backups, but a periodic scrub helps validate that a later resilver would succeed. A periodic scrub also helps find system problems early, when they are less likely to crater your business. It is much better to find an issue during a scrub than during a resilver of a mirror or raidz.

As I said, I am concerned that people would mistakenly expect that scrubbing
offers data protection. It doesn't.  I think you proved my point? ;-)

Scrubs are also useful for detecting broken hardware. However, normal activity will also detect broken hardware, so it is better to think of scrubs as finding degradation of old data rather than being a hardware checking service.

Do you have a scientific reference for this notion that "old data" is more likely to be corrupt than "new data" or is it just a gut feeling? This hypothesis does not sound very supportable to me. Magnetic hysteresis lasts quite a lot longer than the recommended service life for a hard drive. Studio audio tapes from the '60s are still being used to produce modern "remasters" of old audio recordings which sound better than they ever did before (other than the master tape).

Those are analog tapes... they just fade away...
For digital data, it depends on the ECC methods, the quality of the media, the environment, etc. You will find considerable attention paid to verification of data on tape in archiving products. Conditions in the tape world differ somewhat from those in the magnetic disk world, but I can't think of a single study showing that magnetic disks get more reliable over time, while there are dozens showing that they get less reliable and that latent sector errors outnumber full disk failures by as much as 5x. My studies of Sun disk failure rates have shown similar results.

Some forms of magnetic hysteresis are known to last millions of years. Media failure is more often than not mechanical or chemical and not related to loss of magnetic hysteresis. Head failures may be construed to be media failures.

Here is a good study from the University of Wisconsin-Madison which clearly shows the relationship between disk age and latent sector errors. It also shows how increases in areal density increase the latent sector error (LSE) rate. Additionally, this gets back to the ECC method, which we observe to be different on consumer-grade and enterprise-class disks: the study shows a clear win for enterprise-class drives wrt latent errors. The paper suggests a 2-week scrub cycle and notes that many RAID arrays have such policies. There are indeed many studies which show latent sector errors become a bigger problem as the disk ages.
        An Analysis of Latent Sector Errors in Disk Drives
        www.cs.wisc.edu/adsl/Publications/latent-sigmetrics07.ps


See http://en.wikipedia.org/wiki/Ferromagnetic for information on ferromagnetic materials.

For disks we worry about the superparamagnetic effect.
        http://en.wikipedia.org/wiki/Superparamagnetism
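
To get a feel for the thermal decay, the Néel-Arrhenius relation is the usual back-of-the-envelope: the expected relaxation time of a magnetic grain is tau0 * exp(KV/kT). The little Python sketch below just plugs in illustrative numbers (tau0 and the barrier ratios are assumptions, not drive specs) to show how steeply retention depends on the ratio of anisotropy energy to thermal energy, which is exactly what shrinks as grains get smaller and areal density goes up.

import math

def neel_relaxation_time(barrier_ratio, tau0=1e-9):
    # barrier_ratio is KV/kT: the grain's anisotropy energy barrier K*V
    # expressed in units of the thermal energy k*T.  tau0 ~ 1e-9 s is the
    # customary order of magnitude for the attempt time.
    return tau0 * math.exp(barrier_ratio)

SECONDS_PER_YEAR = 3.156e7
for ratio in (30, 40, 50, 60):
    years = neel_relaxation_time(ratio) / SECONDS_PER_YEAR
    print("KV/kT = %2d  ->  ~%.1e years" % (ratio, years))

Going from KV/kT of 60 down to 30 takes the expected retention from geologic time scales down to a few hours, which is the cliff the patent below is describing.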

Quoting US Patent 6987630,
        ... the superparamagnetic effect is a thermal relaxation of information
        stored on the disk surface. Because the superparamagnetic effect may
        occur at room temperature, over time, information stored on the disk
        surface will begin to decay. Once the stored information decays beyond
        a threshold level, it will be unable to be properly read by the read head
        and the information will be lost.

        The superparamagnetic effect manifests itself by a loss in amplitude in
        the readback signal over time or an increase in the mean square error
        (MSE) of the read back signal over time. In other words, the readback
        signal quality metrics are mean square error and amplitude as measured
        by the read channel integrated circuit. Decreases in the quality of the
        readback signal cause bit error rate (BER) increases. As is well known,
        the BER is the ultimate measure of drive performance in a disk drive.

This effect depends on the time since the data was written. Hence, older data can have a higher MSE, and therefore a higher BER, leading to a UER.

To be fair, newer disk technology is constantly improving. But what is consistent with the physics is that increases in bit density are spent on more capacity, with the BER budget rebalanced accordingly. IMHO, this is why we see densities increase while UER does not (hint: marketing always wins these sorts of battles).
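
To put numbers on that: with the UER spec pinned at the typical datasheet values of 1 error per 1e14 bits read for consumer drives and 1 per 1e15 for enterprise drives, the chance of tripping over at least one unrecoverable read while reading an entire disk grows with capacity. A quick sketch (the spec values are typical, and the independent-bit-error assumption is crude):

import math

def p_unrecoverable_read(capacity_bytes, bits_per_error=1e14):
    # Probability of at least one unrecoverable read error while reading
    # the whole disk once, assuming independent errors at the datasheet
    # rate of one error per bits_per_error bits read.
    bits_read = capacity_bytes * 8.0
    return 1.0 - math.exp(-bits_read / bits_per_error)

for tb in (0.5, 1.0, 2.0):
    cap = tb * 1e12
    print("%.1f TB: consumer ~%.1f%%, enterprise ~%.1f%%" % (
        tb,
        100 * p_unrecoverable_read(cap, 1e14),
        100 * p_unrecoverable_read(cap, 1e15)))

At today's capacities that works out to several percent per full-disk read on consumer drives, which is exactly the exposure a resilver faces when there is no longer any redundancy left to repair from.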

FWIW, flash memories are not affected by superparamagnetic decay.

It would be most useful if zfs incorporated a slow-scan scrub which validates data at a rate low enough not to hinder active I/O. Of course this is not a "green" energy-efficient solution.

Oprea and Juels write, "Our key insight is that more aggressive scrubbing does not always increase disk reliability, as previously believed." They show how read-induced LSEs would tend to encourage you to scrub less frequently.
They also discuss the advantage of random versus sequential scrubbing. I
would classify zfs scrubs as more random than sequential, for most workloads. Their model is even more sophisticated and considers scrubbing policy based on the age of the disk and how many errors have been previously detected.
        A Clean-Slate Look at Disk Scrubbing
        http://www.rsa.com/rsalabs/staff/bios/aoprea/publications/scrubbing.pdf
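
For the detection-latency half of that tradeoff, a toy model helps frame the question. Treat LSE arrivals as Poisson and assume every scrub finds and repairs whatever is present; then the expected number of undetected LSEs sitting on a disk at a random instant is just rate * interval / 2. This deliberately ignores the read-induced LSEs that drive their "scrub less" result, and the ambient rate below is an assumption, not a measurement.

def undetected_lse_exposure(scrub_interval_days, lse_per_disk_year=0.05):
    # Expected number of latent sector errors present but not yet detected
    # at a random instant, assuming Poisson arrivals at an assumed ambient
    # rate and a scrub that repairs everything it finds.
    rate_per_day = lse_per_disk_year / 365.0
    return rate_per_day * scrub_interval_days / 2.0

for days in (7, 14, 30, 90, 365):
    print("scrub every %3d days -> ~%.5f undetected LSEs on average" % (
        days, undetected_lse_exposure(days)))

Halving the interval halves the exposure, but once you also charge the scrub for the extra reads and the LSEs they can induce, the curve stops being monotone, which is their point.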

Finally, there are two basic types of scrubs: read-only and rewrite. ZFS does read-only. Other scrubbers can do rewrite. There is evidence that rewrites
are better for attacking superparamagnetic decay issues.

So it is still not clear what the best scrubbing model or interval should be for the general case. I suggest scrubbing periodically, but not continuously :-)
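
Something like the following from cron once a month (or every couple of weeks, per the Wisconsin paper) is plenty. The pool name is a placeholder, and the Python wrapping is only a sketch around the two zpool commands:

#!/usr/bin/env python
# Sketch of a periodic scrub job: report on the previous scrub's outcome,
# then kick off the next one.  "tank" is a placeholder pool name.
import subprocess
import sys

POOL = "tank"

# 'zpool status -x' only reports pools that have problems; anything other
# than the all-healthy message deserves an admin's attention.
status = subprocess.Popen(["zpool", "status", "-x"],
                          stdout=subprocess.PIPE).communicate()[0].decode()
if "all pools are healthy" not in status:
    sys.stderr.write(status)

# 'zpool scrub' returns immediately; the scrub itself runs in the background.
subprocess.call(["zpool", "scrub", POOL])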

Currently, scrub has the lowest priority for the vdev_queue. But I think the
vdev_queue could use more research.
 -- richard
