>>>>> "sl" == Scott Lawson <scott.law...@manukau.ac.nz> writes:
>>>>> "wa" == Wilkinson, Alex <alex.wilkin...@dsto.defence.gov.au> writes:
>>>>> "dg" == Dale Ghent <da...@elemental.org> writes:
>>>>> "djm" == Darren J Moffat <darr...@opensolaris.org> writes:

    sl> Specifically I am talking of ZFS snapshots, rollbacks,
    sl> cloning, clone promotion,

[...]

    sl> Of course to take maximum advantage of ZFS in full, then as
    sl> everyone has mentioned it is a good idea to let ZFS manage the
    sl> underlying raw disks if possible.

Okay, but these two feature groups are completely orthogonal.  You can
get the ZFS revision tree that helped you so much, and all the other
features you mentioned, with a single-LUN zpool.
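
To make that concrete, here is a minimal sketch of those features on a
pool built from a single SAN LUN (pool, dataset, and device names are
made up for illustration):

    # zpool create tank c4t0d0                    (one LUN, no zpool-level redundancy)
    # zfs create tank/data
    # zfs snapshot tank/data@before-upgrade       (point-in-time snapshot)
    # zfs rollback tank/data@before-upgrade       (revert the live data to it)
    # zfs clone tank/data@before-upgrade tank/data-test   (writable clone)
    # zfs promote tank/data-test                  (make the clone the new origin)

None of that cares how many LUNs are underneath the pool.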

    wa> So, shall I forget ZFS and use UFS ?

Naturally here you will find mostly people who have chosen to use ZFS,
so I think you will have to think on your own rather than taking a
poll of the ZFS list.  

Myself, I use ZFS.  I would probably use it on a single-LUN SAN pool,
but only if I had a backup onto a second zpool, and iff I could do a
restore/cutover really quickly if the primary zpool became corrupt.
Some people have zpools that take days to restore, and in that case I
would not do it---I'd want direct-attached storage, restore-by-cutover,
or at the very least zpool-level redundancy.  I'm using ZFS on a SAN
right now, but my SAN is just Linux iSCSI targets, and it exports many
JBOD LUNs with zpool-level redundancy, so I'm less exposed to the
single-LUN lost-pool problem than you'd be with a single-LUN EMC.  And
I have a full backup onto another zpool, on a machine capable enough
to assume the role of the master, albeit not automatically.
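
A sketch of the kind of backup I mean, using zfs send/receive (host,
pool, and snapshot names are placeholders):

    # zfs snapshot -r tank@mon
    # zfs send -R tank@mon | ssh backuphost zfs receive -d -F backup

    ...and later, incrementally:

    # zfs snapshot -r tank@tue
    # zfs send -R -i tank@mon tank@tue | ssh backuphost zfs receive -d -F backup

The point of the second zpool is that if the primary is lost you can
start serving from the backup host instead of waiting on a restore.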

For a lighter filesystem I'm looking forward to the liberation of QFS,
too.  And in the future I think Solaris plans to offer redundancy
options above the filesystem level, like pNFS and Lustre, which may
end up being the ultimate win because of the way they can move the
storage mesh onto a big network switch, rather than what we have with
ZFS, where it's a couple of bonded gigabit Ethernet cards and a single
PCIe backplane.  Not all of ZFS's features will remain useful in such
a world.

However, I don't think there is ANY situation in which you should run
UFS over a zvol (which is one of the things you mentioned).  That's
only interesting for debugging or performance comparison (meaning it
should always perform worse, or else there's a bug).  If you read the
replies you got more carefully, you'll find that doing so addresses
none of the concerns people raised.
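
For the record, the configuration I'm arguing against looks like this
(names invented; shown only so it's clear what I mean, not as a
recommendation):

    # zfs create -V 100G tank/ufsvol                 (carve a zvol out of the pool)
    # newfs /dev/zvol/rdsk/tank/ufsvol               (put UFS on top of it)
    # mount -F ufs /dev/zvol/dsk/tank/ufsvol /mnt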

    dg> Not at all. Just export lots of LUNs from your EMC to get the
    dg> IO scheduling win, not one giant one, and configure the zpool
    dg> as a stripe.

I've never heard of using multiple-LUN stripes for storage QoS before.
Have you actually measured an improvement in this configuration over a
single LUN?  If so, that's interesting.
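
For clarity, the configuration dg is suggesting would be something like
(device names invented):

    # zpool create tank c2t0d0 c2t1d0 c2t2d0 c2t3d0     (several smaller LUNs, striped)

  rather than

    # zpool create tank c2t0d0                           (one giant LUN)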

But it's important to understand that, as far as we know to date,
there is no reliability difference between a stripe of multiple LUNs
and a single big LUN.  The advice I've seen here to use multiple LUNs
over SAN vendor storage has, until now, been not for QoS but for one
of two reasons:

  * availability.  A zpool mirror of LUNs on physically distant, or at
    least separate, storage-vendor gear (see the sketch after this
    list).

  * avoiding the lost-zpool problem when there are SAN reboots or
    storage-fabric disruptions without a host reboot.
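
The first of those is just a plain zpool mirror where each side comes
from a different array (device names invented):

    # zpool create tank mirror c3t0d0 c5t0d0    (c3t0d0 from array A, c5t0d0 from array B)
    # zpool status tank

Both sides would have to fail before the pool is lost.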

    djm> Not if you want ZFS to actually be able to recover from
    djm> checksum detected failures.

While we agree that recovering from checksum failures is an advantage
of zpool-level redundancy, I don't think it is the failure mode that
predominates among people using SANs.  The lost-my-whole-zpool failure
mode predominates, and in the two or three cases where it was examined
closely enough to recover the zpool, it didn't look like a checksum
problem.  It looked like either ZFS bugs or lost writes, or one
leading to the other.  Zpool-level redundancy may happen to make this
failure mode much less common, but it won't eliminate it, especially
since we still haven't tracked down the root cause.

Also, we need to point out that there *is* an availability advantage
to letting the SAN manage a layer of redundancy, because so far SANs
are much better than ZFS at dealing with failing disks without
crashing or slowing down.

I've never heard of anyone actually exporting JBOD from EMC yet.  Is
someone actually doing this?  So far I've heard of people burning huge
$$$$$$ worth of disk by exporting two RAID LUNs from the SAN and then
mirroring them with zpool.

    djm> If ZFS is just given 1 or more LUNs in a stripe then it is
    djm> unlikely to be able to recover from data corruption, it might
    djm> be able to recover metadata because it is always stored with
    djm> at least copies=2 but that is best efforts.

Okay, fine, nice feature.  But that failure is not actually happening,
based on reports to the list.  It's redundancy in space, while the
reports we've seen from SANs suggest what's really needed is redundancy
in time, if that's even possible.
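
For completeness, the user-data analogue of that metadata behaviour is
the copies property (dataset name invented), and it is still redundancy
in space on the same LUN, which is exactly my point:

    # zfs set copies=2 tank/data      (newly written data blocks stored twice)
    # zfs get copies tank/data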
