Todd H. Poole wrote:
> Hmmm... I see what you're saying. But, ok, let me play devil's advocate. What 
> about the times when a drive fails in a way the system didn't expect? What 
> you said was right - most of the time, when a hard drive goes bad, SMART will 
> pick up on its impending doom long before it's too late - but what about the 
> times when the cause of the problem is larger or more abrupt than that (like 
> tin whiskers causing shorts, or a server room technician yanking the wrong 
> drive)?
>
> To imply that OpenSolaris with a RAID-Z array of IDE drives will _only_ 
> protect me from data loss during _specific_ kinds of failures (the ones 
> which OpenSolaris considers "normal") is a pretty big implication... and is 
> certainly a show-stopping one at that. Nobody is going to want to rely on an 
> OS/RAID solution that can only survive certain types of drive failures, while 
> there are others out there that can survive the same and more... 
>
> But then again, I'm not sure if that's what you meant... is that what you 
> were getting at, or did I misunderstand?
>   

I think there's a misunderstanding concerning the underlying concepts. I'll
try to explain my thoughts; please excuse me if this gets a bit
lengthy. Oh, and I am not a Sun employee or a ZFS fan, I'm just a customer
who loves and hates ZFS at the same time ;-)

You know, ZFS is designed for high *reliability*. This means that ZFS 
tries to keep your data as safe as possible. This includes faulty 
hardware, missing hardware (like in your testing scenario) and, to a 
certain degree, even human mistakes.
But there are limits. For instance, ZFS does not make backups
unnecessary. If there's a fire and your drives melt, ZFS can't do
anything about it. The same goes for hardware that lies about the drive
geometry. ZFS is part of the operating environment and, as a
consequence, relies on the hardware. So ZFS can't make unreliable
hardware reliable. All it can do is try to protect the data you saved
on it. But it cannot guarantee that if the hardware becomes its enemy.
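
Just to illustrate the backup point (the pool, dataset, snapshot and host
names below are invented examples, not a recipe): ZFS gives you the tools
to ship a copy of your data to a second machine, but you still have to do it.

    # take a snapshot and send it to another box
    zfs snapshot tank/data@2008-08-01
    zfs send tank/data@2008-08-01 | ssh backuphost zfs receive backup/data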

A real-world example: I have a 32-core Opteron server here, with 4
FibreChannel controllers and 4 JBODs with a total of 64 FC drives
connected to it, running a RAID 10 using ZFS mirrors. Sounds a lot like
high-end hardware compared to your NFS server, right? But ... I have
exactly the same symptom. If one drive fails, the entire JBOD with all
16 drives in it hangs, and all zpool access freezes. The reason is the
miserable JBOD hardware: there's only one FC loop inside it, the drives
are connected serially to each other, and if one drive dies, the drives
behind it go down with it. ZFS immediately starts protecting the data,
the zpool command hangs (but I still have traffic on the other half of
the ZFS mirror!), and it does the right thing by doing so: whatever
happens, my data must not be damaged.
A "bad" filesystem like Linux ext2 or ext3 on LVM would just continue,
whether or not the volume manager noticed the missing drive. That's what
you experienced. But you run the very real risk of having to run fsck at
some point. Or, in my case, of fsck'ing 5 TB of data on 64 drives. That's
not much fun and results in a lot more downtime than replacing the
faulty drive.

What can you expect from ZFS in your case? You can expect it to detect
that a drive is missing and to make sure that your _data integrity_
isn't compromised - by any means necessary. This may even require making
the system completely unresponsive until a timeout has passed.
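
(A side note, in case it helps: depending on which build and pool version
you're running, the pool's 'failmode' property lets you influence how ZFS
reacts when a whole device goes away. The pool name below is just an
example; treat the snippet as a pointer to the man page, not a recommendation.)

    # 'wait' (the default) blocks pool I/O until the device comes back,
    # 'continue' fails new writes instead of hanging, 'panic' panics the host
    zpool set failmode=continue tank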

But what you described is not a question of reliability. You want something
completely different: you expect it to deliver *availability*.

And availability is something ZFS doesn't promise. It simply can't
deliver it. You have the impression that NTFS and various other
filesystems do, but that's an illusion. The next reboot followed by an
fsck run will show you why. Availability requires, as a minimum, full
reliability of every single component of your server, and you can't
expect ZFS or any other filesystem to deliver that with cheap IDE
hardware.

Usually people want to save money when buying hardware, and ZFS is a
good choice for getting *reliability* in that situation. But the
fundamental tension between reliability and availability on such cheap
hardware still exists: the hardware is cheap, the file system and the
services may be reliable, but as soon as you want *availability*, it
gets expensive again, because you have to buy every hardware component
at least twice.


So, you have the choice:

a) If you want *availability*, stay with your old solution. But you have
no guarantee that your data is always intact. You'll always be able to
stream your video, but you have no guarantee that the client will
receive a stream without dropouts forever.

b) If you want *data integrity*, ZFS is your best friend. But you may
have slight availability issues when it comes to hardware defects. You
can reduce the amount of pain during a disaster by spending more money,
e.g. by making the SATA controllers redundant and mirroring across them
(then controller 1 may hang, but controller 2 will keep working - see
the sketch after this list), but you must not forget that your PCI
bridges, fans, power supplies, etc. remain single points of failure
which can take the entire service down, just like your pulling of the
non-hotpluggable drive did.

c) If you want both, you should buy a second server and create an NFS
cluster.
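
To make the controller mirroring in b) concrete (the controller and disk
names below are invented examples for a hypothetical two-controller box,
not something from your machine): you lay out the pool so that each
mirror has one disk on each controller.

    # one disk from controller 1 (c1...) and one from controller 2 (c2...)
    # per mirror, so a dead or hanging controller only degrades the mirrors
    zpool create tank mirror c1t0d0 c2t0d0 mirror c1t1d0 c2t1d0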

Hope I could help you a bit,

  Ralf

-- 

Ralf Ramge
Senior Solaris Administrator, SCNA, SCSA

