"hard errors" are a generic classification. fmdump -eV shows the sense/asc/ascq, which is generally more useful for diagnosis. More below...
On Jan 1, 2011, at 7:50 AM, Benji wrote:

> Hi,
>
> I recently noticed that there are a lot of Hard Errors on multiple drives
> that's being reported by iostat. Also, dmesg reports various messages from
> the mpt driver.
>
> My config is:
> MB: SUPERMICRO X8SIL-F
> HBA: AOC-USAS-L8i (LSI 1068)
> RAM: 4GB ECC
> SunOS SAN 5.11 snv_134 i86pc i386 i86pc Solaris
>
> My configuration is a striped mirrored vdev of 13 drives (one mirror had an
> error on a drive, which I cleared. But just to be safe I added another drive
> to the mirror):
>
> NAME          STATE   READ WRITE CKSUM
> zpool         ONLINE     0     0     0
>   mirror-0    ONLINE     0     0     0
>     c4t13d0   ONLINE     0     0     0
>     c4t19d0   ONLINE     0     0     0
>   mirror-1    ONLINE     0     0     0
>     c4t25d0   ONLINE     0     0     0
>     c4t31d0   ONLINE     0     0     0
>   mirror-2    ONLINE     0     0     0
>     c4t12d0   ONLINE     0     0     0
>     c4t18d0   ONLINE     0     0     0
>   mirror-3    ONLINE     0     0     0
>     c4t24d0   ONLINE     0     0     0
>     c4t30d0   ONLINE     0     0     0
>   mirror-4    ONLINE     0     0     0
>     c4t11d0   ONLINE     0     0     0
>     c4t17d0   ONLINE     0     0     0
>     c4t10d0   ONLINE     0     0     0
>   mirror-5    ONLINE     0     0     0
>     c4t23d0   ONLINE     0     0     0
>     c4t29d0   ONLINE     0     0     0
>
> Here's the output from iostat -En:
>
> c6d1 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
> Model: WDC WD3200BEKT- Revision: Serial No: WD-WXR1A30 Size: 320.07GB <320070352896 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> Illegal Request: 0
> c7d1 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
> Model: WDC WD3200BEKT- Revision: Serial No: WD-WXR1A30 Size: 320.07GB <320070352896 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> Illegal Request: 0
> c4t12d0 Soft Errors: 0 Hard Errors: 252 Transport Errors: 0
> Vendor: ATA Product: SAMSUNG HD203WI Revision: 0003 Serial No:
> Size: 2000.40GB <2000398934016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> Illegal Request: 0 Predictive Failure Analysis: 0
> c4t13d0 Soft Errors: 0 Hard Errors: 252 Transport Errors: 0
> Vendor: ATA Product: SAMSUNG HD203WI Revision: 0002 Serial No:
> Size: 2000.40GB <2000398934016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> Illegal Request: 0 Predictive Failure Analysis: 0
> c4t18d0 Soft Errors: 0 Hard Errors: 252 Transport Errors: 0
> Vendor: ATA Product: SAMSUNG HD203WI Revision: 0003 Serial No:
> Size: 2000.40GB <2000398934016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> Illegal Request: 0 Predictive Failure Analysis: 0
> c4t19d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
> Vendor: ATA Product: SAMSUNG HD203WI Revision: 0002 Serial No:
> Size: 2000.40GB <2000398934016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> Illegal Request: 0 Predictive Failure Analysis: 0
> c4t24d0 Soft Errors: 0 Hard Errors: 252 Transport Errors: 0
> Vendor: ATA Product: SAMSUNG HD203WI Revision: 0003 Serial No:
> Size: 2000.40GB <2000398934016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> Illegal Request: 0 Predictive Failure Analysis: 0
> c4t25d0 Soft Errors: 0 Hard Errors: 252 Transport Errors: 0
> Vendor: ATA Product: SAMSUNG HD203WI Revision: 0002 Serial No:
> Size: 2000.40GB <2000398934016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> Illegal Request: 0 Predictive Failure Analysis: 0
> c4t30d0 Soft Errors: 0 Hard Errors: 252 Transport Errors: 0
> Vendor: ATA Product: SAMSUNG HD203WI Revision: 0003 Serial No:
> Size: 2000.40GB <2000398934016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> Illegal Request: 0 Predictive Failure Analysis: 0
> c4t31d0 Soft Errors: 0 Hard Errors: 252 Transport Errors: 0
> Vendor: ATA Product: SAMSUNG HD203WI Revision: 0002 Serial No:
> Size: 2000.40GB <2000398934016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> Illegal Request: 0 Predictive Failure Analysis: 0
> c4t17d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
> Vendor: ATA Product: WDC WD20EADS-32S Revision: 0A01 Serial No:
> Size: 2000.40GB <2000398934016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> Illegal Request: 0 Predictive Failure Analysis: 0
> c4t11d0 Soft Errors: 0 Hard Errors: 17 Transport Errors: 116
> Vendor: ATA Product: WDC WD20EADS-32S Revision: 5G04 Serial No:
> Size: 2000.40GB <2000398934016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 17 Recoverable: 0
> Illegal Request: 0 Predictive Failure Analysis: 0
> c4t23d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
> Vendor: ATA Product: ST31500341AS Revision: CC1H Serial No:
> Size: 1500.30GB <1500301910016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> Illegal Request: 0 Predictive Failure Analysis: 0
> c4t29d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
> Vendor: ATA Product: ST31500341AS Revision: CC1H Serial No:
> Size: 1500.30GB <1500301910016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> Illegal Request: 0 Predictive Failure Analysis: 0
> c4t10d0 Soft Errors: 0 Hard Errors: 252 Transport Errors: 0
> Vendor: ATA Product: SAMSUNG HD204UI Revision: 0001 Serial No:
> Size: 2000.40GB <2000398934016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> Illegal Request: 0 Predictive Failure Analysis: 0
>
> And a sample from dmesg:
>
> Jan 1 10:26:28 SAN Log info 0x31123000 received for target 11.
> Jan 1 10:26:28 SAN scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
> Jan 1 10:26:28 SAN scsi: [ID 365881 kern.info]
> /pci@0,0/pci8086,d138@3/pci15d9,a580@0 (mpt0):
> Jan 1 10:26:28 SAN Log info 0x31123000 received for target 11.
> Jan 1 10:26:28 SAN scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
> Jan 1 10:26:28 SAN scsi: [ID 365881 kern.info]
> /pci@0,0/pci8086,d138@3/pci15d9,a580@0 (mpt0):
> Jan 1 10:26:28 SAN Log info 0x31123000 received for target 11.
> Jan 1 10:26:28 SAN scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc

This is the unit explaining that it aborted a command.
This can be due to a bus reset, which is, by default, part of the recovery
process. The default bus reset can be changed, as documented in the sd man
page.

> What do they mean? It can't be that most of my SAMSUNG drives are failing?
> They almost all have the same number of errors, which is weird. Could this be
> caused by the fact that these SAMSUNG drives have 4K sectors? 'zpool status'
> reports no errors, although it did report a checksum error a while back on a
> drive, which I cleared.

In my experience, this looks like a set of devices sitting behind an expander.
I have seen one bad disk take out all disks sitting behind an expander. I have
also seen bad disk firmware take out all disks behind an expander. I once saw
a bad cable take out everything.

 -- richard

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
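
P.S. The "same number of errors on many drives" observation is easy to check mechanically. Below is a rough sketch, not a polished tool, that groups devices by their Hard Errors count from iostat -En text. The regular expression, the function name, and the four-line sample (copied from the output quoted above) are my own assumptions for illustration; real output has more lines per device, which the sketch simply skips.

```python
import re
from collections import defaultdict

# Matches the first line of each device record in `iostat -En` output.
HEADER = re.compile(
    r"^(?P<dev>\S+)\s+Soft Errors:\s*(?P<soft>\d+)\s+"
    r"Hard Errors:\s*(?P<hard>\d+)\s+"
    r"Transport Errors:\s*(?P<transport>\d+)"
)


def group_by_hard_errors(text):
    """Hypothetical helper: map each hard-error count to its devices."""
    groups = defaultdict(list)
    for line in text.splitlines():
        match = HEADER.match(line.strip())
        if match:  # detail lines (Vendor:, Size:, ...) do not match
            groups[int(match.group("hard"))].append(match.group("dev"))
    return dict(groups)


# Header lines taken from the iostat output quoted in this thread.
sample = """\
c4t12d0 Soft Errors: 0 Hard Errors: 252 Transport Errors: 0
c4t13d0 Soft Errors: 0 Hard Errors: 252 Transport Errors: 0
c4t19d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
c4t11d0 Soft Errors: 0 Hard Errors: 17 Transport Errors: 116
"""

for count, devices in sorted(group_by_hard_errors(sample).items()):
    print(count, devices)
```

Many devices landing in one non-zero bucket points toward a shared event such as a reset reaching every disk at once, rather than each disk failing on its own schedule. The device in a bucket of its own is the first one I would look at.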