On Sep 28, 2009, at 11:41 AM, Bob Friesenhahn wrote:
> On Mon, 28 Sep 2009, Richard Elling wrote:
>> In other words, I am concerned that people replace good data
>> protection practices with scrubs, expecting scrub to deliver better
>> data protection (it won't).
>
> Many people here would profoundly disagree with the above. There is
> no substitute for good backups, but a periodic scrub helps validate
> that a later resilver would succeed. A periodic scrub also helps
> find system problems early, when they are less likely to crater your
> business. It is much better to find an issue during a scrub than
> during a resilver of a mirror or raidz.

As I said, I am concerned that people would mistakenly expect that
scrubbing offers data protection. It doesn't. I think you proved my
point? ;-)
>> Scrubs are also useful for detecting broken hardware. However,
>> normal activity will also detect broken hardware, so it is better
>> to think of scrubs as finding degradation of old data rather than
>> as a hardware-checking service.
>
> Do you have a scientific reference for this notion that "old data"
> is more likely to be corrupt than "new data", or is it just a gut
> feeling? This hypothesis does not sound very supportable to me.
> Magnetic hysteresis lasts quite a lot longer than the recommended
> service life for a hard drive. Studio audio tapes from the '60s are
> still being used to produce modern "remasters" of old audio
> recordings which sound better than they ever did before (other than
> the master tape).
Those are analog tapes... they just fade away... For data, it depends
on the ECC methods, quality of the media, environment, etc.

You will find considerable attention spent on verification of data on
tapes in archiving products. In the tape world, the conditions are
slightly different from the magnetic disk world, but I can't think of
a single study which shows that magnetic disks get more reliable over
time, while there are dozens which show that they get less reliable
and that latent sector errors dominate, by as much as 5x, over full
disk failures. My studies of Sun disk failure rates have shown
similar results.
> Some forms of magnetic hysteresis are known to last millions of
> years. Media failure is more often than not mechanical or chemical
> and not related to loss of magnetic hysteresis. Head failures may
> be construed to be media failures.
Here is a good study from the University of Wisconsin-Madison which
clearly shows the relationship between disk age and latent sector
errors. It also shows how the increase in areal density increases the
latent sector error (LSE) rate. Additionally, this gets back to the
ECC method, which we observe to be different on consumer-grade and
enterprise-class disks. The study shows a clear win for
enterprise-class drives wrt latent errors. The paper suggests a
2-week scrub cycle and recognizes that many RAID arrays have such
policies. There are indeed many studies which show latent sector
errors are a bigger problem as the disk ages.

An Analysis of Latent Sector Errors in Disk Drives
www.cs.wisc.edu/adsl/Publications/latent-sigmetrics07.ps
> See http://en.wikipedia.org/wiki/Ferromagnetic for information on
> ferromagnetic materials.
For disks we worry about the superparamagnetic effect.
http://en.wikipedia.org/wiki/Superparamagnetism

Quoting US Patent 6987630:

  ... the superparamagnetic effect is a thermal relaxation of
  information stored on the disk surface. Because the superparamagnetic
  effect may occur at room temperature, over time, information stored
  on the disk surface will begin to decay. Once the stored information
  decays beyond a threshold level, it will be unable to be properly
  read by the read head and the information will be lost.

  The superparamagnetic effect manifests itself by a loss in amplitude
  in the readback signal over time or an increase in the mean square
  error (MSE) of the readback signal over time. In other words, the
  readback signal quality metrics are mean square error and amplitude
  as measured by the read channel integrated circuit. Decreases in the
  quality of the readback signal cause bit error rate (BER) increases.
  As is well known, the BER is the ultimate measure of drive
  performance in a disk drive.

This effect is a function of the time since the data was written.
Hence, older data can have a higher MSE and subsequently a higher BER,
leading to a higher UER.
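The UER arithmetic is easy to put numbers on. A back-of-the-envelope
sketch (my own illustration, not from the thread; the 10^-14 and
10^-15 figures are the commonly quoted consumer and enterprise
datasheet specs, and the 2 TB size is arbitrary):

```python
# Expected unrecoverable errors for one full read of a disk, given the
# vendor's quoted UER (errors per bit read). The UER specs and disk
# size below are illustrative assumptions, not figures from the thread.

def expected_errors(disk_bytes, uer_per_bit):
    """Expected unrecoverable errors when reading the whole disk once."""
    return disk_bytes * 8 * uer_per_bit

TB = 10**12
print(f"consumer (1e-14):   {expected_errors(2 * TB, 1e-14):.3f}")  # 0.160
print(f"enterprise (1e-15): {expected_errors(2 * TB, 1e-15):.3f}")  # 0.016
```

This is one way to see why a full read of a large consumer-grade disk,
such as a resilver, has a non-trivial chance of tripping over a latent
error.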
To be fair, newer disk technology is constantly improving. But what is
consistent with the physics is that increases in bit density are spent
on more capacity while rebalancing the BER. IMHO, this is why we see
densities increase while the rated UER stays flat (hint: marketing
always wins these sorts of battles).
FWIW, flash memories are not affected by superparamagnetic decay.
> It would be most useful if zfs incorporated a slow-scan scrub which
> validates data at a low rate of speed which does not hinder active
> I/O. Of course this is not a "green" energy-efficient solution.
Oprea and Juels write, "Our key insight is that more aggressive
scrubbing does not always increase disk reliability, as previously
believed." They show how read-induced LSEs would tend to encourage you
to scrub less frequently. They also discuss the advantage of random
versus sequential scrubbing. I would classify zfs scrubs as more
random than sequential, for most workloads. Their model is even more
sophisticated and considers scrubbing policy based on the age of the
disk and how many errors have been previously detected.

A Clean-Slate Look at Disk Scrubbing
http://www.rsa.com/rsalabs/staff/bios/aoprea/publications/scrubbing.pdf
Finally, there are two basic types of scrubs: read-only and rewrite.
ZFS does read-only. Other scrubbers can do rewrite. There is evidence
that rewrites are better for attacking superparamagnetic decay issues.
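A rewrite scrub is easy to sketch in miniature. This is my own toy
over an ordinary file, not how any real scrubber is implemented (real
ones work on raw devices and verify checksums before rewriting): each
block is read and then written back in place, which refreshes the
magnetization under the data.

```python
import os

BLOCK = 4096  # illustrative block size

def rewrite_scrub(path):
    """Read every block of the file and write the same bytes back."""
    blocks = 0
    with open(path, "r+b") as f:
        while True:
            pos = f.tell()
            data = f.read(BLOCK)
            if not data:
                break
            f.seek(pos)
            f.write(data)            # rewrite identical data in place
            f.seek(pos + len(data))  # explicit seek between write/read
            blocks += 1
        f.flush()
        os.fsync(f.fileno())         # push the rewrites to the media
    return blocks
```

The data is unchanged afterward; only the physical recording is
refreshed, which is the point of a rewrite-style scrub.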
So it is still not clear what the best scrubbing model or interval
should be for the general case. I suggest scrubbing periodically, but
not continuously :-)
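"Periodically, not continuously" reduces to a trivial cron-style
check. A minimal sketch, assuming the paper's 2-week cycle; the wiring
(how you obtain the last scrub time, and the eventual `zpool scrub
<pool>` invocation) is left to the caller and is my assumption, not
anything zfs provides:

```python
import time

SCRUB_INTERVAL = 14 * 24 * 3600  # seconds; the suggested 2-week cycle

def scrub_due(last_scrub_epoch, now=None, interval=SCRUB_INTERVAL):
    """True when the last completed scrub is older than `interval`.
    A cron job could call this, then invoke `zpool scrub <pool>`."""
    if now is None:
        now = time.time()
    return (now - last_scrub_epoch) >= interval

# A scrub that finished 15 days ago is due again:
print(scrub_due(time.time() - 15 * 24 * 3600))  # True
```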
Currently, scrub has the lowest priority in the vdev_queue. But I
think the vdev_queue could use more research.
-- richard
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss