"Hard errors" is a generic classification.  fmdump -eV shows the
sense key/asc/ascq, which is generally more useful for diagnosis.
More below...
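
For anyone who has not read these triples before: the sense key is a
small standardized enumeration, while asc/ascq are a finer-grained code
pair that has to be looked up in the SCSI tables. Below is a minimal
sketch of putting a name on the key, and of pulling the triple out of a
raw sense buffer. The function names are mine, the sample bytes are made
up rather than taken from this system, and I am not assuming anything
about how fmdump lays the fields out.

```python
# Sense key names from the SCSI Primary Commands standard (subset).
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0x7: "DATA PROTECT",
    0xB: "ABORTED COMMAND",
}


def describe(key, asc, ascq):
    """Format a sense key/asc/ascq triple given as plain integers."""
    name = SENSE_KEYS.get(key, "key 0x%X" % key)
    return "%s (asc=0x%02x, ascq=0x%02x)" % (name, asc, ascq)


def decode_sense(buf):
    """Extract (key, asc, ascq) from raw SCSI sense bytes."""
    code = buf[0] & 0x7F
    if code in (0x70, 0x71):
        # Fixed format: key in byte 2 (low nibble), asc/ascq in bytes 12/13.
        if len(buf) < 14:
            raise ValueError("fixed-format sense data too short")
        return buf[2] & 0x0F, buf[12], buf[13]
    if code in (0x72, 0x73):
        # Descriptor format: key in byte 1 (low nibble), asc/ascq in bytes 2/3.
        if len(buf) < 4:
            raise ValueError("descriptor-format sense data too short")
        return buf[1] & 0x0F, buf[2], buf[3]
    raise ValueError("unknown sense response code 0x%02x" % code)


if __name__ == "__main__":
    # Illustrative buffer only, not captured from the system in this thread.
    sample = bytes.fromhex("70000b000000000a000000004703")
    print(describe(*decode_sense(sample)))
```

The asc/ascq pair is deliberately left numeric here; the full table is
long and is better looked up than copied.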


On Jan 1, 2011, at 7:50 AM, Benji wrote:

> Hi,
> 
> I recently noticed that iostat is reporting a lot of Hard Errors on multiple 
> drives. Also, dmesg reports various messages from the mpt driver.
> 
> My config is:
> MB: SUPERMICRO X8SIL-F
> HBA: AOC-USAS-L8i (LSI 1068)
> RAM: 4GB ECC
> SunOS SAN 5.11 snv_134 i86pc i386 i86pc Solaris
> 
> My configuration is a striped mirrored vdev of 13 drives (one mirror had an 
> error on a drive, which I cleared. But just to be safe I added another drive 
> to the mirror):
> 
>        NAME         STATE     READ WRITE CKSUM
>        zpool        ONLINE       0     0     0
>          mirror-0   ONLINE       0     0     0
>            c4t13d0  ONLINE       0     0     0
>            c4t19d0  ONLINE       0     0     0
>          mirror-1   ONLINE       0     0     0
>            c4t25d0  ONLINE       0     0     0
>            c4t31d0  ONLINE       0     0     0
>          mirror-2   ONLINE       0     0     0
>            c4t12d0  ONLINE       0     0     0
>            c4t18d0  ONLINE       0     0     0
>          mirror-3   ONLINE       0     0     0
>            c4t24d0  ONLINE       0     0     0
>            c4t30d0  ONLINE       0     0     0
>          mirror-4   ONLINE       0     0     0
>            c4t11d0  ONLINE       0     0     0
>            c4t17d0  ONLINE       0     0     0
>            c4t10d0  ONLINE       0     0     0
>          mirror-5   ONLINE       0     0     0
>            c4t23d0  ONLINE       0     0     0
>            c4t29d0  ONLINE       0     0     0
> 
> 
> Here's the output from iostat -En:
> 
> c6d1             Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
> Model: WDC WD3200BEKT- Revision:  Serial No:      WD-WXR1A30 Size: 320.07GB 
> <320070352896 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> Illegal Request: 0
> c7d1             Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
> Model: WDC WD3200BEKT- Revision:  Serial No:      WD-WXR1A30 Size: 320.07GB 
> <320070352896 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> Illegal Request: 0
> c4t12d0          Soft Errors: 0 Hard Errors: 252 Transport Errors: 0
> Vendor: ATA      Product: SAMSUNG HD203WI  Revision: 0003 Serial No:
> Size: 2000.40GB <2000398934016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> Illegal Request: 0 Predictive Failure Analysis: 0
> c4t13d0          Soft Errors: 0 Hard Errors: 252 Transport Errors: 0
> Vendor: ATA      Product: SAMSUNG HD203WI  Revision: 0002 Serial No:
> Size: 2000.40GB <2000398934016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> Illegal Request: 0 Predictive Failure Analysis: 0
> c4t18d0          Soft Errors: 0 Hard Errors: 252 Transport Errors: 0
> Vendor: ATA      Product: SAMSUNG HD203WI  Revision: 0003 Serial No:
> Size: 2000.40GB <2000398934016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> Illegal Request: 0 Predictive Failure Analysis: 0
> c4t19d0          Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
> Vendor: ATA      Product: SAMSUNG HD203WI  Revision: 0002 Serial No:
> Size: 2000.40GB <2000398934016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> Illegal Request: 0 Predictive Failure Analysis: 0
> c4t24d0          Soft Errors: 0 Hard Errors: 252 Transport Errors: 0
> Vendor: ATA      Product: SAMSUNG HD203WI  Revision: 0003 Serial No:
> Size: 2000.40GB <2000398934016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> Illegal Request: 0 Predictive Failure Analysis: 0
> c4t25d0          Soft Errors: 0 Hard Errors: 252 Transport Errors: 0
> Vendor: ATA      Product: SAMSUNG HD203WI  Revision: 0002 Serial No:
> Size: 2000.40GB <2000398934016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> Illegal Request: 0 Predictive Failure Analysis: 0
> c4t30d0          Soft Errors: 0 Hard Errors: 252 Transport Errors: 0
> Vendor: ATA      Product: SAMSUNG HD203WI  Revision: 0003 Serial No:
> Size: 2000.40GB <2000398934016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> Illegal Request: 0 Predictive Failure Analysis: 0
> c4t31d0          Soft Errors: 0 Hard Errors: 252 Transport Errors: 0
> Vendor: ATA      Product: SAMSUNG HD203WI  Revision: 0002 Serial No:
> Size: 2000.40GB <2000398934016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> Illegal Request: 0 Predictive Failure Analysis: 0
> c4t17d0          Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
> Vendor: ATA      Product: WDC WD20EADS-32S Revision: 0A01 Serial No:
> Size: 2000.40GB <2000398934016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> Illegal Request: 0 Predictive Failure Analysis: 0
> c4t11d0          Soft Errors: 0 Hard Errors: 17 Transport Errors: 116
> Vendor: ATA      Product: WDC WD20EADS-32S Revision: 5G04 Serial No:
> Size: 2000.40GB <2000398934016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 17 Recoverable: 0
> Illegal Request: 0 Predictive Failure Analysis: 0
> c4t23d0          Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
> Vendor: ATA      Product: ST31500341AS     Revision: CC1H Serial No:
> Size: 1500.30GB <1500301910016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> Illegal Request: 0 Predictive Failure Analysis: 0
> c4t29d0          Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
> Vendor: ATA      Product: ST31500341AS     Revision: CC1H Serial No:
> Size: 1500.30GB <1500301910016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> Illegal Request: 0 Predictive Failure Analysis: 0
> c4t10d0          Soft Errors: 0 Hard Errors: 252 Transport Errors: 0
> Vendor: ATA      Product: SAMSUNG HD204UI  Revision: 0001 Serial No:
> Size: 2000.40GB <2000398934016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> Illegal Request: 0 Predictive Failure Analysis: 0
> 
> And a sample from dmesg:
> 
> Jan  1 10:26:28 SAN     Log info 0x31123000 received for target 11.
> Jan  1 10:26:28 SAN     scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
> Jan  1 10:26:28 SAN scsi: [ID 365881 kern.info] 
> /pci@0,0/pci8086,d138@3/pci15d9,a580@0 (mpt0):
> Jan  1 10:26:28 SAN     Log info 0x31123000 received for target 11.
> Jan  1 10:26:28 SAN     scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
> Jan  1 10:26:28 SAN scsi: [ID 365881 kern.info] 
> /pci@0,0/pci8086,d138@3/pci15d9,a580@0 (mpt0):
> Jan  1 10:26:28 SAN     Log info 0x31123000 received for target 11.
> Jan  1 10:26:28 SAN     scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc

This is the unit explaining that it aborted a command.  This can be
due to a bus reset, which is, by default, part of the recovery process.
The default bus reset behavior can be changed, as documented in the
sd(7D) man page.
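
If you want to pick the numbers in those mpt messages apart, here is a
rough sketch. The field layout is how I remember it from the LSI MPI
headers, so treat it as an assumption: the top bit of ioc_status is a
"log info available" flag, and the SAS log info word splits into bus
type, originator, code and subcode. The function names are mine. What
each code means still has to be looked up in the vendor headers.

```python
# Assumed layout, recalled from the LSI MPI headers (not verified here).
LOG_INFO_AVAILABLE = 0x8000   # flag bit in ioc_status
IOC_STATUS_MASK = 0x7FFF      # remaining bits are the status code itself


def split_ioc_status(value):
    """Return (log_info_available, status_code) for an ioc_status word."""
    return bool(value & LOG_INFO_AVAILABLE), value & IOC_STATUS_MASK


def split_sas_loginfo(value):
    """Split a 32-bit SAS log info word into its four fields."""
    return {
        "bus_type": (value >> 28) & 0xF,
        "originator": (value >> 24) & 0xF,
        "code": (value >> 16) & 0xFF,
        "subcode": value & 0xFFFF,
    }


if __name__ == "__main__":
    # Values taken from the dmesg sample quoted above.
    flag, status = split_ioc_status(0x804B)
    print("log info available: %s, status: 0x%04x" % (flag, status))
    for name, field in split_sas_loginfo(0x31123000).items():
        print("%-10s 0x%x" % (name, field))
```

With the values from the log this gives status 0x004b with the flag set,
and bus type 3, originator 1, code 0x12, subcode 0x3000.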

> What do they mean? It can't be that most of my SAMSUNG drives are failing? 
> They almost all have the same number of errors, which is weird. Could this be 
> caused by the fact that these SAMSUNG drives have 4K sectors? 'zpool status' 
> reports no errors, although it did report a checksum error a while back on a 
> drive, which I cleared.

In my experience, this looks like a set of devices sitting behind an
expander. I have seen one bad disk take out all disks sitting behind
an expander.  I have also seen bad disk firmware take out all disks
behind an expander.  I once saw a bad cable take out everything.
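
To make a pattern like "almost all have the same number of errors" easy
to see, a short script can group the devices by their Hard Errors count.
This is only a sketch: it assumes the summary line format shown in the
iostat -En output above, and the function names are mine.

```python
import re
from collections import defaultdict

# Matches the per-device summary line of `iostat -En` as quoted above.
HEADER = re.compile(
    r"^(?P<dev>\S+)\s+Soft Errors:\s*(?P<soft>\d+)\s+"
    r"Hard Errors:\s*(?P<hard>\d+)\s+"
    r"Transport Errors:\s*(?P<transport>\d+)"
)


def parse_iostat_en(text):
    """Map device name -> {'soft', 'hard', 'transport'} error counters."""
    devices = {}
    for line in text.splitlines():
        m = HEADER.match(line)
        if m:
            devices[m.group("dev")] = {
                k: int(m.group(k)) for k in ("soft", "hard", "transport")
            }
    return devices


def group_by_hard_errors(devices):
    """Map hard-error count -> list of devices reporting that count."""
    groups = defaultdict(list)
    for dev, counters in devices.items():
        groups[counters["hard"]].append(dev)
    return dict(groups)


if __name__ == "__main__":
    # A few summary lines copied from the output in this thread.
    sample = """\
c4t12d0          Soft Errors: 0 Hard Errors: 252 Transport Errors: 0
c4t13d0          Soft Errors: 0 Hard Errors: 252 Transport Errors: 0
c4t19d0          Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
c4t11d0          Soft Errors: 0 Hard Errors: 17 Transport Errors: 116
"""
    for count, devs in sorted(group_by_hard_errors(parse_iostat_en(sample)).items()):
        print("%4d hard errors: %s" % (count, ", ".join(devs)))
```

Many devices sharing one identical count, next to a single device with
transport errors of its own, is the kind of picture I would expect from
a shared-path problem rather than from many disks failing at once.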
 -- richard

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
