>>>>> "tt" == Toby Thain <t...@telegraphics.com.au> writes:
>>>>> "mg" == Mike Gerdts <mger...@gmail.com> writes:

    tt> I think we have to assume Anton was joking - otherwise his
    tt> measure is uselessly unscientific.

I think it's rude to talk about someone who's present in the third
person, especially when you're trying to minimize his view.  Were you
joking, Anton? :)

0. The reports I read were not useless in the way some have stated,
   because, for example, Mike sampled his own observations:

    mg> In the past year I've lost more ZFS file systems than I have
    mg> any other type of file system in the past 5 years.  With other
    mg> file systems I can almost always get some data back.  With ZFS
    mg> I can't get any back.

   It's not just bloggers and pundits sampling mailing list traffic.  I
   thought there was at least one other post like this but could not
   find it.


1. I don't think your impressions, or Anton's and mine, are
   ``useless''.

2. I don't think your positive impression is any more scientific than
   his and my skeptical one.


3. I'm in general troubled by reports of corruption that aren't
   well-investigated, because uninvestigated failures stop young,
   fragile filesystems from becoming old and robust.  BUT....


4. I'm less troubled by (3) because a few of the corruption reports
   were well-investigated by Victor, who recovered the affected pools
   manually and posted a summary here:

    http://mail.opensolaris.org/pipermail/zfs-discuss/2008-October/051643.html

   and notes on how the experience might inform ZFS improvements:

    http://mail.opensolaris.org/pipermail/zfs-discuss/2008-October/051667.html


5. I'm more troubled again because everyone seems to have forgotten
   (4).  Mike, Victor, and others can't necessarily repeat themselves
   every time this thread's resurrected.  If yapping mailing list
   monkeys like me don't remember this experience, wishful thinking
   from the invested and marketing white papers will drown out the
   hard experience we're accumulating.

   I've pointed straight at an unfixed corruption problem that's
   biting ZFS users, and at the discussion about where to place the
   blame and how to fix it.  It is not fixed now, yet pundits on-list
   and all over the Interweb, like here:

    http://www.kev009.com/wp/2008/11/on-file-systems/

   talk about corruption bugs hazily and say ``most of all that's been
   fixed'' when it's not so hazy and hasn't been fixed, then focus on
   theoretical, unrealized capabilities of the on-disk format and
   minimize this clear experience into ghostly distant-past rumor.

I don't see when the single-LUN SAN corruption problems were fixed.  I
think the supposed ``silent FC bit flipping'' basis for the ``use
multiple SAN LUNs'' best-practice is revoltingly dishonest, because we
_know_ better.  I'm not saying devices aren't guilty---Sun's sun4v IO
virtualizer was documented as guilty of ignoring cache flushes to
inflate performance, just like the loomingly-unnamed models of lying
SATA drives:

 http://mail.opensolaris.org/pipermail/zfs-discuss/2008-October/051735.html
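
To make concrete what an ignored flush costs, here's a rough sketch
(plain C, not ZFS source; the file name, offsets, and block sizes are
invented for illustration) of the write-barrier ordering a
copy-on-write commit depends on, and which a device that lies about
cache flushes silently breaks:

  /*
   * Sketch of a copy-on-write commit's ordering requirement:
   *   1. write the new data/metadata blocks to free space
   *   2. flush the device write cache            <-- barrier
   *   3. write the uberblock that points at them
   *   4. flush again to make the commit durable
   * A device (or IO virtualizer) that acknowledges the flush without
   * persisting breaks the barrier between steps 1 and 3.
   */
  #include <fcntl.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <unistd.h>

  static void write_at(int fd, const void *buf, size_t len, off_t off)
  {
      if (pwrite(fd, buf, len, off) != (ssize_t)len) {
          perror("pwrite");
          exit(1);
      }
  }

  static void barrier(int fd)
  {
      /* only a real barrier if the device honours cache flushes */
      if (fsync(fd) != 0) {
          perror("fsync");
          exit(1);
      }
  }

  int main(void)
  {
      int fd = open("pool.img", O_RDWR | O_CREAT, 0644);
      if (fd < 0) { perror("open"); return 1; }

      char datablock[4096], uberblock[512];
      memset(datablock, 'D', sizeof(datablock));
      memset(uberblock, 'U', sizeof(uberblock));

      write_at(fd, datablock, sizeof(datablock), 8192); /* 1: new blocks      */
      barrier(fd);                                      /* 2: make them stick */
      write_at(fd, uberblock, sizeof(uberblock), 0);    /* 3: point at them   */
      barrier(fd);                                      /* 4: commit durable  */

      close(fd);
      puts("committed (assuming both flushes were actually honoured)");
      return 0;
  }

If the device acks fsync() without persisting, the two writes can hit
the platter in either order or not at all, and a power cut can leave
the ``committed'' uberblock pointing at blocks that never reached
stable storage.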

Is a storage-stack-related version of this problem the cause of lost
single-LUN SAN pools?  Maybe, maybe not, but either way we need an
end-to-end solution.  I don't currently see one, only the pervasive
blame-the-device mantra trotted out every time a pool goes bad.

I keep digging through the archives to post messages like this because
I feel like everyone only wants to have happy memories, and that it's
going to bring about a sad end.
