On 1/23/11 10:30 AM, Roy Sigurd Karlsbakk wrote:
If you're looking for stats to give an indication of likely wear, and
thus increasing probability of failure, POH (Power-On Hours) is probably
not very useful by itself (or even at all). Things like Head Flying Hours
and Load Cycle Count are probably more indicative, although not
necessarily maintained by all drives.
Of course, data which gives indication of actual (rather than likely)
wear is even more important as an indicator of impending failure, such
as the various error and retry counts.
I cannot but agree. iostat will show better info, and a script like
http://karlsbakk.net/iostat-overview.sh can give you a pretty decent overview
of which drives should be replaced. This will show you drives with errors
reported. In my experience, a drive can last a long time, but may die early as
well.
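As a rough illustration of the kind of check described above, here is a minimal Python sketch. It is not the linked script, whose contents are not reproduced here. It parses text in the style of the per-device summary lines printed by Solaris `iostat -En` and lists the devices with any nonzero error counter. The function name `drives_with_errors` and the sample text (device names, counts, product strings) are made up for illustration.

```python
import re

# Matches the per-device summary line in `iostat -En` style output, e.g.
# "c1t0d0  Soft Errors: 0 Hard Errors: 2 Transport Errors: 0"
ERROR_LINE = re.compile(
    r"^(?P<dev>\S+)\s+Soft Errors:\s*(?P<soft>\d+)\s+"
    r"Hard Errors:\s*(?P<hard>\d+)\s+Transport Errors:\s*(?P<transport>\d+)"
)


def drives_with_errors(iostat_text):
    """Return {device: (soft, hard, transport)} for devices with any nonzero counter."""
    flagged = {}
    for line in iostat_text.splitlines():
        m = ERROR_LINE.match(line.strip())
        if not m:
            continue  # vendor/product/size lines and blank lines are skipped
        counts = (int(m["soft"]), int(m["hard"]), int(m["transport"]))
        if any(counts):
            flagged[m["dev"]] = counts
    return flagged


# Hypothetical sample text, invented for this example.
SAMPLE = """\
c1t0d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: EXAMPLE-DISK     Revision: 0001 Serial No:
c1t1d0           Soft Errors: 0 Hard Errors: 3 Transport Errors: 12
Vendor: ATA      Product: EXAMPLE-DISK     Revision: 0001 Serial No:
"""

if __name__ == "__main__":
    for dev, (soft, hard, transport) in drives_with_errors(SAMPLE).items():
        print(f"{dev}: soft={soft} hard={hard} transport={transport}")
```

On a real system the text would come from running `iostat -En` rather than from a hard-coded sample. Which counters justify replacing a drive is a judgment call that this sketch does not make.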
But Google and CMU found there was an increase in failures as POH increased
and that the "bathtub curve" was a myth perpetuated by drive manufacturers
(who, of course, know that it is not true since they have certain "big
picture" statistical data that the rest of us don't have).
I believe the CMU data showed that effectively after the third year, you are
better off doing proactive drive refreshes rather than waiting for failures.
YMMV. And I would add that the environments CMU tested, large HPC
installations, might be more "coddled" than many of the environments people
have their disks in.
http://www.cs.cmu.edu/~bianca/fast07.pdf
In year 4 and year 5 (which are still within the nominal lifetime of these
disks), the actual replacement rates are 7–10 times higher than the failure
rates we expected based on datasheet MTTF.
...
Observation 5: Contrary to common and proposed models, hard drive replacement
rates do not enter steady state after the first year of operation. Instead
replacement rates seem to steadily increase over time.
Observation 6: Early onset of wear-out seems to have a much stronger impact on
lifecycle replacement rates than infant mortality, as experienced by end
customers, even when considering only the first three or five years of a
system’s lifetime. We therefore recommend that wear-out be incorporated into
new standards for disk drive reliability. The new standard suggested by IDEMA
does not take wear-out into account [5, 33].
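To put the "7 to 10 times higher" figure in perspective, a datasheet MTTF can be converted to a nominal annual failure rate (AFR) by dividing the hours in a year by the MTTF. The sketch below assumes a constant failure rate, which is the very assumption the paper disputes. The 1,000,000-hour MTTF is an illustrative value, and the function names are mine, not the paper's.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 power-on hours, assuming the drive runs 24x7


def nominal_afr(mttf_hours):
    """Nominal annual failure rate implied by a datasheet MTTF (constant-rate assumption)."""
    return HOURS_PER_YEAR / mttf_hours


def scaled_afr_range(mttf_hours, low=7, high=10):
    """Apply the quoted 7x to 10x multipliers to the nominal AFR."""
    afr = nominal_afr(mttf_hours)
    return afr * low, afr * high


if __name__ == "__main__":
    mttf = 1_000_000  # illustrative datasheet MTTF in hours (an assumption)
    lo, hi = scaled_afr_range(mttf)
    print(f"nominal AFR: {nominal_afr(mttf):.2%}")   # nominal AFR: 0.88%
    print(f"7x to 10x:   {lo:.2%} to {hi:.2%}")      # 7x to 10x:   6.13% to 8.76%
```

In other words, under these assumptions a drive population whose datasheet implies under 1% replacements per year would see roughly 6% to 9% per year in years 4 and 5.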
http://static.googleusercontent.com/external_content/untrusted_dlcp/labs.google.com/en/us/papers/disk_failures.pdf
Also, Google reported that SMART data alone makes a poor predictor of
individual drive failures: a large fraction of their failed drives showed no
SMART error signals beforehand.
Vennlige hilsener / Best regards
roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly.
It is an elementary imperative for all pedagogues to avoid excessive use of
idioms of foreign origin. In most cases, adequate and relevant synonyms exist
in Norwegian.
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss