On 28-Nov-06, at 10:35 PM, Anton B. Rang wrote:

No, you still have the hardware problem.

What hardware problem?

There seems to be an unspoken assumption that any checksum error detected by ZFS is caused by a relatively high error rate in the underlying hardware.

There are at least two classes of hardware-related errors. One class consists of errors that are genuinely being introduced at a high rate, as exemplified by the post earlier in this list about the bad Fibre Channel port on a SAN. The other consists of very rare events, for instance a radiation-induced bit-flip in SRAM. In the latter case, there is no “problem” as such to be repaired (well, perhaps if you live in Denver you could buy radiation shielding for your computer room ;-).
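Whatever its cause, a single flipped bit in a stored block shows up as a checksum mismatch when ZFS reads the block back. The short Python sketch below illustrates this. It is modelled on a Fletcher-4 style checksum, which is one of the checksums ZFS offers. The little-endian 32-bit word order and the helper names `fletcher4` and `flip_bit` are illustrative assumptions, and this is not a byte-exact copy of the ZFS code.

```python
import struct

MASK64 = (1 << 64) - 1  # accumulators wrap modulo 2**64


def fletcher4(data: bytes) -> tuple[int, int, int, int]:
    """Fletcher-4 style checksum: four running 64-bit sums over
    32-bit words (little-endian assumed for this sketch)."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4 bytes")
    a = b = c = d = 0
    for (word,) in struct.iter_unpack("<I", data):
        a = (a + word) & MASK64
        b = (b + a) & MASK64
        c = (c + b) & MASK64
        d = (d + c) & MASK64
    return a, b, c, d


def flip_bit(data: bytes, bit: int) -> bytes:
    """Return a copy of data with one bit inverted (a simulated bit-flip)."""
    buf = bytearray(data)
    buf[bit // 8] ^= 1 << (bit % 8)
    return bytes(buf)


block = bytes(range(256)) * 16          # a 4 KiB block of sample data
stored = fletcher4(block)               # checksum recorded at write time
corrupted = flip_bit(block, 12345)      # one bit flips before the read
print(fletcher4(corrupted) == stored)   # prints False
```

A single-bit change alters the first running sum by a power of two smaller than 2**64, so this kind of checksum always catches it. The checksum shows only *that* the data changed. It cannot tell a persistent hardware fault from a one-off event, which is exactly the distinction being drawn here.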

(There are also software errors. Bugs in ZFS itself or anywhere else in the Solaris kernel, including device drivers, can result in erroneous data being written to disk. In any individual case, the cause may be a software problem rather than a hardware one.)

Clearly, a high error rate (say, more than one error every two weeks on a server pushing 100 MB/second) would point to a hardware or software problem, but a lower rate may simply be “normal” for standard hardware.
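To put that threshold in perspective, here is the volume of data such a server moves in two weeks. This is a back-of-the-envelope sketch that assumes decimal megabytes (1 MB = 10**6 bytes) and a constant, sustained 100 MB/s. The function name `bits_between_errors` is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def bits_between_errors(throughput_bytes_per_s: int, interval_s: int) -> int:
    """Bits transferred during one interval at a constant throughput."""
    return throughput_bytes_per_s * interval_s * 8


throughput = 100 * 10**6        # 100 MB/s, decimal megabytes assumed
interval = 14 * SECONDS_PER_DAY  # two weeks = 1,209,600 s

bits = bits_between_errors(throughput, interval)
print(f"{bits // 8:.3e} bytes")  # prints 1.210e+14 bytes
print(f"{bits:.3e} bits")        # prints 9.677e+14 bits
print(f"{1 / bits:.1e} errors per bit")  # prints 1.0e-15 errors per bit
```

So "one error every two weeks" at that load works out to roughly one error per 10**15 bits moved, or about 121 TB of traffic per error.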

Her original configuration wasn't redundant, so she should expect this kind of manual recovery from time to time. That seems a logical conclusion to me. Or is this one of those once-in-a-lifetime strikes?

--Toby



This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
