On Mon, May 16, 2011 at 8:45 PM, Jim Klimov <jimkli...@cos.ru> wrote:

> If MTBFs were real, we'd never see disks failing within a year ;)

    Remember that MTBF (and MTTR and MTTDL) are *statistics* and not
guarantees. If a type of drive has an MTBF of 10 years, then the MEAN
(average) time between failures for a _big_enough_sample_ set will be
10 years. Of course these failures will be both front- and back-end
loaded :-) I use MTBF as a *relative* measure of drives. A drive with
an MTBF of 10 years will *probably* survive twice as long as a drive
with an MTBF of 5 years.
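
    To put a number on "not a guarantee": under the simplest
constant-failure-rate (exponential) reading of MTBF, a 10 year MTBF
still predicts a meaningful fraction of a large batch dying in year
one. A quick back-of-the-envelope sketch (Python; the function name
and the numbers plugged in are just mine):

    import math

    def fraction_failed(mtbf_years, horizon_years):
        # Fraction of a large population expected to fail within the
        # horizon, assuming a constant failure rate (exponential model).
        return 1.0 - math.exp(-horizon_years / mtbf_years)

    print(f"{fraction_failed(10, 1):.1%}")  # ~9.5% of drives in year 1
    print(f"{fraction_failed(5, 1):.1%}")   # ~18.1% for a 5 year MTBF drive

So even a perfectly honest 10 year MTBF is entirely consistent with
seeing drives fail within the first year.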

> Problem is, these values seem to be determined in an ivory-tower
> lab. An expensive-vendor edition of a drive running in a cooled
> data center with shock absorbers and other nice features does
> often live a lot longer than a similar OEM enterprise or consumer
> drive running in an apartment with varying weather around and
> often overheating and randomly vibrating with a dozen other
> disks rotating in the same box.

    Actually, I'll bet the values are calculated based on the MTBF of
the components of the drive. And those MTBF values are calculated or
estimated based on "accelerated aging" tests :-) So the final MTBF is
a guess based on multiple assumptions :-) You really can't measure
MTBF except after the fact.
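
    If you want to see how that roll-up is usually done: for a series
system of independent parts with constant failure rates, the failure
rates (1/MTBF) simply add. A minimal sketch, with made-up component
numbers (in hours):

    def system_mtbf(component_mtbfs):
        # Series system of independent components with constant failure
        # rates: the failure rates (1/MTBF) add, so the system MTBF is
        # the reciprocal of their sum.
        return 1.0 / sum(1.0 / m for m in component_mtbfs)

    # Hypothetical component MTBFs (spindle motor, head assembly,
    # electronics) -- the drive-level number is set by the weakest parts.
    print(system_mtbf([2_000_000, 1_500_000, 3_000_000]))  # ~666,667 hours

which is why a single marginal component drags the whole drive's
number down.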

> The ramble about expensive-vendor drive editions comes from
> my memory of some forum or blog discussion which I can't point
> to now either, which suggested that vendors like Sun do not
> charge 5x-10x the price of the same label of OEM drive just
> for a nice corporate logo stamped onto the disk.

    The firmware on a Sun badged drive is *different* from the generic
version. I expect the same is true of IBM and HP, and maybe (but
probably not) Dell. A Sun drive returns a Sun identifier in response
to a SCSI INQUIRY command; the Vendor ID and Product ID (VID and PID)
are different from the generic drive's. Also, in the case of Sun, if
Sun sells a drive as a 72 GB, then no matter the manufacturer, the
number of blocks will match (although they have screwed that up on a
couple of occasions), permitting any manufacturer's Sun 72 GB drive to
swap for any other.
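
    The VID/PID difference is easy to spot-check from a host, by the
way. On Solaris, 'iostat -En' prints the Vendor and Product strings
from the INQUIRY data; here is a minimal sketch of the same check on a
Linux box (paths are the standard sysfs locations, device names will
vary):

    from pathlib import Path

    # Print the SCSI INQUIRY Vendor and Product strings the kernel
    # cached for each disk; a Sun-badged drive typically reports a
    # different VID/PID than the same mechanism sold under the
    # manufacturer's own label.
    for dev in sorted(Path("/sys/block").iterdir()):
        vendor = dev / "device" / "vendor"
        model = dev / "device" / "model"
        if vendor.exists() and model.exists():
            print(dev.name, vendor.read_text().strip(), model.read_text().strip())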

> Vendors were
> said to burn-in the drives in their labs for like half a year or a
> year before putting the survivors to the market. This implies
> that some of the drives did not survive a burn-in period, and
> indeed the MTBF for the remaining ones is higher because
> "infancy death" due to manufacturing problems soon after
> arrival to the end customer is unlikely for these particular
> tested devices.

    I have never heard this and my experience does not support it. I
_have_ seen infant mortality rates with Sun badged drives consistent
with the overall drive market.

> The long burn-in times were also said to
> be the partial reason why vendors never sell the biggest
> disks available on the market (does any vendor sell 3Tb
> with their own brand already? Sun-Oracle? IBM? HP?)
> Thus may be obscured as "certification process" which
> occasionally takes about as long - to see if the newest
> and greatest disks die within a year or so.

I suspect there are three reasons for the delay:
1) certification (let's make sure these new drives really work)
2) time to build the Sun firmware for the new drive
3) supply chain delays (Sun needs a new P/N and new lists of what
works with what)

    I think the largest contributor to the price difference between a
Seagate drive with a Seagate badge and one with a Sun badge is the
profit inherent in additional layers of markup. Remember (at least
before Oracle), when you bought a drive from CDW or Newegg the profit
chain was:

1) Seagate
2) Newegg

But from Sun you were looking at:

1) Seagate (and Sun paid more here for their custom FW)
2) Sun
3) Master Reseller
4) Reseller

    I think that even Sun direct accounts were shipped via a Master
Reseller. I don't think Sun ever maintained their own warehouse of
stuff (at least since 1995 when I first started dealing with them).

> Another implied idea in that discussion was that the vendors
> can influence OEMs in choice of components, an example
> in the thread being about different marks of steel for the
> ball bearings. Such choices can drive the price up with
> a reason - disks like that are more expensive to produce -
> but also increases their reliability.

    Hurmmm, I would love to take a Seagate ES-2 series drive and a Sun
badged version of the same drive apart and see. (Feel free to
substitute whatever the base Seagate model is for the Sun drive.)

> In fact, I've had very few Sun disks breaking in the boxes
> I've managed over 10 years; all I can remember now were
> two or three 2.5" 72Gb Fujitsus with a Sun brand. Still, we
> have another dozen of those running so far for several years.

    I have seen the typical failure curve (ignoring the occasional bad
batch that has an 80% infant mortality rate): about 4% - 5% infant
mortality in the first year (I just put 5 x J4400 with 120 drives on
line in the past year and had 5 of the 120 fail within that first
year), then a period of years with very, very few failures, then a
slow trickle of failures starting between 3 and 5 years into the life
of the drives (the slow trickle is under 1 per month out of over 100
drives). I suspect many customers never see the curve tilt up at the
end, as they have already replaced the drives with bigger, faster,
better storage before they get to that point (I know I tend to do
that at home).
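
    For what it's worth, that first-year figure works out to roughly a
4% annualized failure rate, and it is interesting to see what MTBF a
flat constant-failure-rate model would need to produce it (a quick
sketch, just arithmetic on the numbers above):

    import math

    def annualized_failure_rate(failures, population, years=1.0):
        # Observed fraction of the population failing per year.
        return failures / (population * years)

    def implied_mtbf_years(afr):
        # MTBF a constant-failure-rate (exponential) model would need
        # in order to produce this first-year failure fraction.
        return -1.0 / math.log(1.0 - afr)

    afr = annualized_failure_rate(5, 120)   # ~4.2% in the first year
    print(f"AFR {afr:.1%}, implied MTBF ~{implied_mtbf_years(afr):.0f} years")

Of course the real curve is a bathtub (infant mortality, a flat
middle, then wear-out), so a single constant-rate number hides exactly
the early and late failures we actually see.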

> So yes, I can believe that Big Vendor Brand disks can boast
> huge MTBFs and prove that with a track record, and such
> drives are often replaced not because of a break-down,
> but rather as a precaution, and because of "moral aging",
> such as low speed and small volume.

    I don't think I've ever seen Sun quote MTBF numbers on their badged drives.

> But for the rest of us (like Home-ZFS users) such numbers
> of MTBF are as fantastic as the Big Vendor prices, and
> inachievable for any number of reasons, starting with use
> of cheaper and potentially worse hardware from the
> beginning, and non-"orchard" conditions of running the
> machines...

    I have seen some commercial server rooms that make the environment
of my home server look very good by comparison :-) Like I said
earlier, use the MTBF numbers as a comparison metric, not as an
estimate of time to failure (although that is what they appear to be :-)

> I do have some 5-year-old disks running in computers
> daily and still alive, but I have about as many which died
> young, sometimes even within the warranty period ;)

    I have some 120 and 160 GB Seagate ATA drives that do not want to
die :-) That seemed to be a 'sweet spot' for long term reliability (or
I just got lucky).

-- 
{--------1---------2---------3---------4---------5---------6---------7---------}
Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Coordinator, Schenectady Light Opera Company (
http://www.sloctheater.org/ )
-> Technical Advisor, RPI Players
