Erik Trimble wrote:
> ZFS no longer has the issue where loss of a single device (even intermittently) causes pool corruption. That's been fixed.

Erik, it does not help at all when you talk about some issue being fixed but do not provide the corresponding CR number. Without it an interested observer cannot go and look at what exactly the issue was and how it has been fixed, nor track its presence or absence in other releases.

So could you please provide the CR number for the issue you are talking about?


> That is, there used to be an issue in this scenario:
>
> (1) zpool constructed from a single LUN on a SAN device
> (2) SAN experiences temporary outage, while ZFS host remains running.
> (3) zpool is permanently corrupted, even if no I/O occurred during outage
>
> This is fixed. (around b101, IIRC)

You see - you cannot tell exactly when it was fixed yourself. Besides, in the scenario you describe above a whole lot can be hidden behind "SAN experiences temporary outage". It can be as simple as the wrong fibre cable being unplugged, or as complex as a storage array failing, rebooting and losing its entire cache content as a result.

In the former case I do not see how it could badly affect a ZFS pool. It may cause a panic if 'failmode' is set to panic (or if the software release is too old and does not support this property), and it may require administrator intervention in the form of 'zpool clear'.
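
For illustration, on releases that do have the property, the relevant commands look roughly like this ('tank' is just a placeholder pool name):

  # Check which failure mode the pool is configured for (wait | continue | panic)
  zpool get failmode tank

  # Have the pool block I/O and wait for the device to come back, rather than panic
  zpool set failmode=wait tank

  # Once the path to the LUN is back, clear the error state and resume normal operation
  zpool clear tank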

In the latter case the consequences can really be bad - the pool may be corrupted and unopenable. There are several examples of this in the archives, as well as stories of successful recovery.

And there is a recovery project underway to provide support for recovering pools from such corruption.
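
(As a rough sketch only, assuming a build where that recovery support has already integrated: it is exposed through 'zpool import', though the exact options depend on your build, so check the man page first.)

  # Dry run: ask whether the pool could be made importable by discarding
  # the last few transactions, without actually modifying anything
  zpool import -Fn tank

  # Attempt the recovery import, rolling the pool back to a consistent state
  zpool import -F tank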

> However, ZFS remains much more sensitive to loss of the underlying
> LUN than UFS, and has a tendency to mark such a LUN as defective
> during any such SAN outage. It's much more recoverable nowadays,
> though. Just to be clear, this occasionally occurs when something such
> as a SAN switch dies, or there is a temporary hiccup in the SAN
> infrastructure, causing some small (i.e. < a minute) loss of
> connectivity to the underlying LUN.

Again, SANs are very complex structures, and a perceived small loss of connectivity may in reality be a very complex event with difficult-to-predict consequences.

With non-COW filesystems (like UFS) you are indeed less likely to experience the consequences of a small outage immediately (though they can still manifest themselves much, much later).

ZFS tends to uncover the presence of such consequences much earlier (immediately?). But that does not automatically mean there is an issue with ZFS itself. There may be an issue somewhere within the SAN infrastructure, even if it was only unavailable for less than a minute.
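
That is also why, after any suspicious SAN event, it is worth explicitly asking ZFS whether anything was damaged instead of waiting for the next read to stumble over it. A quick way to do that (again assuming a pool named 'tank'):

  # Walk every allocated block in the pool and verify its checksum
  zpool scrub tank

  # Review the result; -v also lists files affected by unrecoverable errors
  zpool status -v tank

A filesystem without end-to-end checksums has no equivalent of this check, which is part of why its problems tend to surface much later.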

> RAIDZ and mirrored zpools are still the preferred method of arranging things in ZFS, even with hardware RAID backing the underlying LUN (whether the LUN is from a SAN or local HBA doesn't matter).

I fully support this - without redundancy at the ZFS level there is no such benefit as self-healing...
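
As a minimal sketch of the difference (device names are placeholders for LUNs, whether from a SAN or a local HBA; pick one layout):

  # No ZFS-level redundancy: checksum errors are detected but cannot be repaired
  zpool create tank c2t0d0

  # ZFS-level mirror across two LUNs: a block that fails its checksum on one
  # side is rewritten from the good copy automatically (self-healing)
  zpool create tank mirror c2t0d0 c3t0d0

  # raidz is the other common choice when three or more LUNs are available
  zpool create tank raidz c2t0d0 c3t0d0 c4t0d0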

regards,
victor
