On Dec 14, 2007 4:23 AM, can you guess? <[EMAIL PROTECTED]> wrote:
> I assume that you're referring to ZFS checksum errors rather than to transfer 
> errors caught by the CRC resulting in retries.

Correct.

> If so, then the next obvious question is, what is causing the ZFS checksum 
> errors?  And (possibly of some help in answering that question) is the disk 
> seeing CRC transfer errors (which show up in its SMART data)?

The memory in this machine is ECC, and Memtest passed it for five
days.  The disk was indeed getting some pretty lousy SMART scores, but
that doesn't explain the controller issue.  This particular controller
is a SIIG-branded card based on the Silicon Image 0680 chipset (which
is, apparently, a piece of junk; if I'd done my homework I would've
bought something else), but the premise stands.  I bought a piece of
consumer-level hardware off the shelf, it had corruption issues, and
ZFS told me about it when XFS had been silent.
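
To make that premise concrete, here is a minimal sketch of end-to-end
checksumming in Python.  It is not ZFS code: the function names are
made up for illustration, and SHA-256 stands in for whichever checksum
the pool is configured to use.  The point is only that the checksum is
stored apart from the data and rechecked on every read, so corruption
introduced anywhere along the path (controller, cable, disk) gets
noticed instead of being handed back silently.

```python
import hashlib

# Hypothetical helpers for illustration only; not the ZFS implementation.

def write_block(data):
    """Return the data plus a checksum that is kept separately from it."""
    return data, hashlib.sha256(data).digest()

def read_block(stored, expected_checksum):
    """Return the stored data only if it still matches its checksum."""
    if hashlib.sha256(stored).digest() != expected_checksum:
        raise IOError("checksum mismatch: block is corrupt")
    return stored

def flip_bit(data, bit):
    """Simulate a single-bit corruption somewhere in the I/O path."""
    buf = bytearray(data)
    buf[bit // 8] ^= 1 << (bit % 8)
    return bytes(buf)

if __name__ == "__main__":
    block, checksum = write_block(b"some file contents")
    damaged = flip_bit(block, 3)
    try:
        read_block(damaged, checksum)
    except IOError as err:
        print("detected:", err)
```

A filesystem that does not checksum data blocks has nothing to compare
against at read time, so it would simply return the damaged bytes.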

> Once again, a significant question is whether the checksum errors are 
> accompanied by a lot of CRC transfer errors.  If not, that would strongly 
> suggest that they're not coming from bad transfers (and while they could 
> conceivably be the result of commands corrupted on the wire, so much more 
> data is transferred compared to command bandwidth that you'd really expect to 
> see data CRC errors too if commands were getting mangled).  When you wiggle 
> the cables, other things wiggle as well (I assume you've checked that your 
> RAM is solidly seated).

Sorry, I don't remember offhand whether I got CRC errors when I had
the working controller and drive but bad cabling.  RAM was solid, as
mentioned earlier.

> The extra strength comes more from its additional coverage (commands as well 
> as data).

Ah, that explains it.

Will
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
