Erik Trimble wrote:
> ZFS no longer has the issue where loss of a single device (even
> intermittently) causes pool corruption. That's been fixed.
Erik, it does not help at all when you talk about some issue being fixed
but do not provide the corresponding CR number. That does not allow an
interested observer to go and have a look at what exactly the issue was
and how it was fixed, and it does not allow tracking its presence or
absence in other releases.
So could you please provide the CR number for the issue you are talking about?
> That is, there used to be an issue in this scenario:
> (1) zpool constructed from a single LUN on a SAN device
> (2) SAN experiences temporary outage, while ZFS host remains running.
> (3) zpool is permanently corrupted, even if no I/O occurred during outage
> This is fixed. (around b101, IIRC)
You see - you cannot tell exactly when it was fixed yourself. Besides,
in the scenario you describe above, a whole lot can be hidden in "SAN
experiences temporary outage". It can be as simple as the wrong fibre
cable being unplugged, or as complex as a storage array failing,
rebooting, and losing its entire cache contents as a result.
In the former case I do not see how it could badly affect a ZFS pool. It
may cause a panic, if 'failmode' is set to panic (or the software release
is too old and does not support this property), and it may require
administrator intervention in the form of 'zpool clear'.
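To illustrate the point about 'failmode' and 'zpool clear', here is a
short sketch of the relevant commands (the pool name "tank" is a
placeholder; these require a live ZFS host and appropriate privileges,
so treat this as an illustrative fragment, not a recipe):

```shell
# Inspect the current failmode setting of the pool.
# Possible values: "wait" (default; I/O blocks until the device returns),
# "continue" (new writes return EIO), or "panic" (the host panics).
zpool get failmode tank

# Pick the behaviour you want on catastrophic loss of the device.
zpool set failmode=continue tank

# After a transient outage, clear the error state and resume I/O.
zpool clear tank

# Verify that the pool reports healthy again.
zpool status -x tank
```

With failmode=wait, step three ('zpool clear') is typically the
administrator intervention mentioned above.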
In the latter case the consequences can really be bad - the pool may be
corrupted and unopenable. There are several examples of this in the
archives, as well as stories of successful recovery. And there is a
recovery project under way to provide support for recovering pools from
such corruption.
> However, ZFS remains much more sensitive to loss of the underlying
> LUN than UFS, and has a tendency to mark such a LUN as defective
> during any such SAN outage. It's much more recoverable nowadays,
> though. Just to be clear, this occasionally occurs when something such
> as a SAN switch dies, or there is a temporary hiccup in the SAN
> infrastructure, causing some small (i.e. < minute) loss of
> connectivity to the underlying LUN.
Again, SANs are very complex structures, and a perceived small loss of
connectivity may in reality be a very complex event with
difficult-to-predict consequences.
With non-COW filesystems (like UFS) it is indeed less likely that the
consequences of a small outage will show up immediately (though they can
still manifest themselves much, much later).
ZFS tends to uncover such consequences much earlier (immediately?). But
that does not automatically mean there is an issue with ZFS. There may
be an issue somewhere within the SAN infrastructure, even if it was only
unavailable for less than a minute.
> RAIDZ and mirrored zpools are still the preferred method of arranging
> things in ZFS, even with hardware raid backing the underlying LUN
> (whether the LUN is from a SAN or local HBA doesn't matter).
I fully support this - without redundancy at the ZFS level there is no
self-healing benefit...
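For the archives, a minimal sketch of what "redundancy at the ZFS level"
looks like (pool name "tank" and device names c0t0d0/c0t1d0/c0t2d0 are
placeholders for real LUNs; again, this needs a live ZFS host):

```shell
# Create a mirrored pool so ZFS keeps redundant copies it can heal from
# (even if each device is itself a hardware-RAID LUN).
zpool create tank mirror c0t0d0 c0t1d0

# Alternatively, a raidz pool across three LUNs:
#   zpool create tank raidz c0t0d0 c0t1d0 c0t2d0

# With ZFS-level redundancy, a scrub lets ZFS detect checksum errors and
# repair them from the good copy, instead of merely reporting them.
zpool scrub tank
zpool status tank
```

With a single-LUN pool, the same scrub would only report the damage, as
there is no second copy to repair from.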
regards,
victor
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss