On 1/23/11 10:30 AM, Roy Sigurd Karlsbakk wrote:
If you're looking for stats to give an indication of likely wear, and
thus increasing probability of failure, POH (Power-On Hours) is probably
not very useful by itself (or even at all). Things like Head Flying Hours
and Load Cycle Count are probably more indicative, although not
necessarily maintained by all drives.

Of course, data which gives indication of actual (rather than likely)
wear is even more important as an indicator of impending failure, such
as the various error and retry counts.

I cannot but agree. iostat will show better info, and a script like 
http://karlsbakk.net/iostat-overview.sh can give you a pretty decent overview 
of which drives should be replaced. This will show you drives with errors 
reported. In my experience, a drive can last a long time, but may die early as 
well.
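
As a rough illustration of the kind of check such a script might do (this is
not the linked script, just a minimal sketch): the snippet below scans
summary lines in the style of Solaris `iostat -En` output and lists the
devices with nonzero error counts. The sample text, device names, and the
helper name drives_with_errors are made up for the example, and the exact
output layout may differ between releases.

```python
import re

# Illustrative sample in the style of `iostat -En` summary lines.
# Device names and counts are invented for this example.
SAMPLE = """\
c0t0d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
c0t1d0 Soft Errors: 2 Hard Errors: 5 Transport Errors: 1
c0t2d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 3
"""

# One summary line per device: name followed by three labelled counters.
LINE_RE = re.compile(
    r"^(?P<dev>\S+)\s+Soft Errors:\s*(?P<soft>\d+)\s+"
    r"Hard Errors:\s*(?P<hard>\d+)\s+"
    r"Transport Errors:\s*(?P<transport>\d+)",
    re.MULTILINE,
)


def drives_with_errors(text):
    """Return {device: (soft, hard, transport)} for devices with any nonzero count."""
    flagged = {}
    for match in LINE_RE.finditer(text):
        counts = (
            int(match.group("soft")),
            int(match.group("hard")),
            int(match.group("transport")),
        )
        if any(counts):
            flagged[match.group("dev")] = counts
    return flagged


if __name__ == "__main__":
    for dev, (soft, hard, transport) in sorted(drives_with_errors(SAMPLE).items()):
        print(f"{dev}: soft={soft} hard={hard} transport={transport}")
```

In practice the text would come from running the command rather than from a
hard-coded sample, and a nonzero count is only a prompt to look closer, not
proof that the drive is failing.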

But Google and CMU found that failures increased as POH increased, and that the "bathtub curve" was a myth perpetuated by drive manufacturers (who, of course, know that it is not true, since they have "big picture" statistical data that the rest of us don't have).

I believe the CMU data showed that, effectively, after the third year you are better off doing proactive drive refreshes rather than waiting for failures. YMMV. I would add that the environments CMU studied (large HPC installations) might be more "coddled" than many of the environments people keep their disks in.

http://www.cs.cmu.edu/~bianca/fast07.pdf

In year 4 and year 5 (which are still within the nominal lifetime of these disks), the actual replacement rates are 7–10 times higher than the failure rates we expected based on datasheet MTTF.
...
Observation 5: Contrary to common and proposed models, hard drive replacement rates do not enter steady state after the first year of operation. Instead replacement rates seem to steadily increase over time.

Observation 6: Early onset of wear-out seems to have a much stronger impact on lifecycle replacement rates than infant mortality, as experienced by end customers, even when considering only the first three or five years of a system’s lifetime. We therefore recommend that wear-out be incorporated into new standards for disk drive reliability. The new standard suggested by IDEMA does not take wear-out into account [5, 33].
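
To put the "7–10 times" figure in perspective, here is a small sketch of the arithmetic, assuming a constant failure rate so that the nominal annual failure rate is simply hours-per-year divided by MTTF. The 1,000,000-hour MTTF is an illustrative datasheet value, and the function names are made up for this example.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours of continuous operation


def nominal_afr(mttf_hours):
    """Annual failure rate implied by a datasheet MTTF (constant-rate assumption)."""
    return HOURS_PER_YEAR / mttf_hours


def implied_replacement_rate(mttf_hours, multiplier):
    """Replacement rate if the observed rate is `multiplier` times the nominal AFR."""
    return nominal_afr(mttf_hours) * multiplier


if __name__ == "__main__":
    mttf = 1_000_000  # hours; illustrative datasheet figure, an assumption
    print(f"nominal AFR: {nominal_afr(mttf):.2%}")
    for factor in (7, 10):  # the 7-10x range quoted above
        print(f"{factor}x nominal: {implied_replacement_rate(mttf, factor):.2%}")
```

With these assumed numbers the nominal rate is a little under 1% per year, so 7–10 times that is roughly 6–9% of drives replaced per year in years 4 and 5.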

http://static.googleusercontent.com/external_content/untrusted_dlcp/labs.google.com/en/us/papers/disk_failures.pdf

Also, Google mentioned that they could not find a good statistical correlation for any SMART data fields that would let them serve as predictors of failure.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum is presented intelligibly. It is
an elementary imperative for all pedagogues to avoid excessive use of
idioms of foreign origin. In most cases, adequate and
relevant synonyms exist in Norwegian.
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss