Re: [zfs-discuss] Hard Errors on HDDs
Richard Elling writes:
> In my experience, this looks like a set of devices sitting behind an
> expander. I have seen one bad disk take out all disks sitting behind
> an expander. I have also seen bad disk firmware take out all disks
> behind an expander. I once saw a bad cable take out everything.
> -- richard

In my experience, I've seen the same problems: a lot of SATA disks (Seagate Barracuda ES.2 and others), all behind expanders (Supermicro SC847 chassis). The issues were solved after we removed all SATA disks from behind our expander and replaced them with enterprise SAS disks. Thereafter we only faced these problems when a connected SATA SSD died, so we also moved our SATA SSDs away from this backplane and connected them directly to the 1068-based controller.

The problem arose after we moved an identical server to an expander backplane (to get more drives connected). Before that, these disks had been running for months without any problems while *directly* attached.

Regards,
Daniel
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Hard Errors on HDDs
"hard errors" are a generic classification. fmdump -eV shows the sense/asc/ascq, which is generally more useful for diagnosis. More below... On Jan 1, 2011, at 7:50 AM, Benji wrote: > Hi, > > I recently noticed that there are a lot of Hard Errors on multiple drives > that's being reported by iostat. Also, dmesg reports various messages from > the mpt driver. > > My config is: > MB: SUPERMICRO X8SIL-F > HBA: AOC-USAS-L8i (LSI 1068) > RAM: 4GB ECC > SunOS SAN 5.11 snv_134 i86pc i386 i86pc Solaris > > My configuration is a striped mirrored vdev of 13 drives (one mirror had an > error on a drive, which I cleared. But just to be safe I added another drive > to the mirror): > > NAME STATE READ WRITE CKSUM >zpoolONLINE 0 0 0 > mirror-0 ONLINE 0 0 0 >c4t13d0 ONLINE 0 0 0 >c4t19d0 ONLINE 0 0 0 > mirror-1 ONLINE 0 0 0 >c4t25d0 ONLINE 0 0 0 >c4t31d0 ONLINE 0 0 0 > mirror-2 ONLINE 0 0 0 >c4t12d0 ONLINE 0 0 0 >c4t18d0 ONLINE 0 0 0 > mirror-3 ONLINE 0 0 0 >c4t24d0 ONLINE 0 0 0 >c4t30d0 ONLINE 0 0 0 > mirror-4 ONLINE 0 0 0 >c4t11d0 ONLINE 0 0 0 >c4t17d0 ONLINE 0 0 0 >c4t10d0 ONLINE 0 0 0 > mirror-5 ONLINE 0 0 0 >c4t23d0 ONLINE 0 0 0 >c4t29d0 ONLINE 0 0 0 > > > Here's the output from iostat -En: > > c6d1 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 > Model: WDC WD3200BEKT- Revision: Serial No: WD-WXR1A30 Size: 320.07GB > <320070352896 bytes> > Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 > Illegal Request: 0 > c7d1 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 > Model: WDC WD3200BEKT- Revision: Serial No: WD-WXR1A30 Size: 320.07GB > <320070352896 bytes> > Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 > Illegal Request: 0 > c4t12d0 Soft Errors: 0 Hard Errors: 252 Transport Errors: 0 > Vendor: ATA Product: SAMSUNG HD203WI Revision: 0003 Serial No: > Size: 2000.40GB <2000398934016 bytes> > Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 > Illegal Request: 0 Predictive Failure Analysis: 0 > c4t13d0 Soft Errors: 0 Hard Errors: 
252 Transport Errors: 0 > Vendor: ATA Product: SAMSUNG HD203WI Revision: 0002 Serial No: > Size: 2000.40GB <2000398934016 bytes> > Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 > Illegal Request: 0 Predictive Failure Analysis: 0 > c4t18d0 Soft Errors: 0 Hard Errors: 252 Transport Errors: 0 > Vendor: ATA Product: SAMSUNG HD203WI Revision: 0003 Serial No: > Size: 2000.40GB <2000398934016 bytes> > Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 > Illegal Request: 0 Predictive Failure Analysis: 0 > c4t19d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 > Vendor: ATA Product: SAMSUNG HD203WI Revision: 0002 Serial No: > Size: 2000.40GB <2000398934016 bytes> > Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 > Illegal Request: 0 Predictive Failure Analysis: 0 > c4t24d0 Soft Errors: 0 Hard Errors: 252 Transport Errors: 0 > Vendor: ATA Product: SAMSUNG HD203WI Revision: 0003 Serial No: > Size: 2000.40GB <2000398934016 bytes> > Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 > Illegal Request: 0 Predictive Failure Analysis: 0 > c4t25d0 Soft Errors: 0 Hard Errors: 252 Transport Errors: 0 > Vendor: ATA Product: SAMSUNG HD203WI Revision: 0002 Serial No: > Size: 2000.40GB <2000398934016 bytes> > Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 > Illegal Request: 0 Predictive Failure Analysis: 0 > c4t30d0 Soft Errors: 0 Hard Errors: 252 Transport Errors: 0 > Vendor: ATA Product: SAMSUNG HD203WI Revision: 0003 Serial No: > Size: 2000.40GB <2000398934016 bytes> > Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 > Illegal Request: 0 Predictive Failure Analysis: 0 > c4t31d0 Soft Errors: 0 Hard Errors: 252 Transport Errors: 0 > Vendor: ATA Product: SAMSUNG HD203WI Revision: 0002 Serial No: > Size: 2000.40GB <2000398934016 bytes> > Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 > Illegal Request: 0 Predictive Failure Analysis: 0 > c4t17d0 Soft Errors: 0 Hard Errors: 0 Transport 
Errors: 0 > Vendor: ATA Product: WDC WD20EADS-32S Revision: 0A01 Serial No: > Size: 2000.40GB <2000398934016 bytes> > Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 > Illegal Request: 0 Predictive Failure Analysis: 0 > c4t11d0 Soft Errors: 0 Har
Re: [zfs-discuss] Hard Errors on HDDs
For anyone who is interested, here's a progress report.

I created a new pool with only one mirror vdev of 2 disks, namely the new SAMSUNG HD204UI. These drives, along with the older HD203WI, use Advanced Format Technology (i.e. 4K sectors). Only these drives had hard errors in my pool, as opposed to the old Seagates and WDs.

To create the new pool, I recompiled the zpool command to use an ashift value of 12, so that the new pool is aligned to 4K instead of 512 bytes (see here: http://www.solarismen.de/archives/5-Solaris-and-the-new-4K-Sector-Disks-e.g.-WDxxEARS-Part-2.html).

I filled this new 4K-aligned pool with 1.5TB of data, scrubbed it, and got no errors. I checked the log and found no hard errors either. Usually after a scrub I get some hard errors. Maybe the pool needs to have more vdevs in it to really stress the HBA and produce hard errors, but it's a strange coincidence nonetheless that only the 4K drives had errors and that, once they were used in a 4K-aligned pool, the errors stopped.

I'll probably re-create my original pool with only 4K drives in a 4K-aligned pool and see what happens.

--
This message posted from opensolaris.org
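For readers unfamiliar with ashift: it is the base-2 logarithm of the smallest block size ZFS will issue to a vdev, so 9 corresponds to 512 bytes and 12 to 4096 bytes. The arithmetic can be sketched in Python as below. The function names are made up for illustration and are not part of any ZFS tool.

```python
def ashift_for(sector_bytes: int) -> int:
    """Return log2(sector_bytes); the sector size must be a power of two."""
    if sector_bytes <= 0 or sector_bytes & (sector_bytes - 1):
        raise ValueError("sector size must be a positive power of two")
    return sector_bytes.bit_length() - 1


def is_aligned(offset_bytes: int, ashift: int) -> bool:
    """True if a byte offset falls on a 2**ashift boundary."""
    return offset_bytes % (1 << ashift) == 0


if __name__ == "__main__":
    print(ashift_for(512))           # 9
    print(ashift_for(4096))          # 12
    # An I/O that starts on the third 512-byte sector is not 4K-aligned,
    # so a 4K-sector drive would have to read-modify-write around it.
    print(is_aligned(3 * 512, 12))   # False
    print(is_aligned(3 * 4096, 12))  # True
```

With ashift 9 the pool may start writes on any 512-byte boundary, which a drive with 4K physical sectors can only service by rewriting a whole physical sector. Whether that misalignment is what produced the hard errors here is not established by this sketch.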
Re: [zfs-discuss] Hard Errors on HDDs
Thanks for the input!

I am using an iPass-to-iPass cable that connects my HBA to my backplane. It was firmly locked into both connectors. I offlined 2 supposedly faulty SAMSUNG drives and scanned their whole surface using estools, which did not report any errors.

I'm starting to think that it may be an issue with the mpt driver and the HBA card. Is anyone else using an LSI 1068E-based HBA card and having issues?

Thanks
Re: [zfs-discuss] Hard Errors on HDDs
Maybe a cable is loose? Try reseating all the cables on all the drives, and on the controller card as well. Yes, ZFS detects such problems.
[zfs-discuss] Hard Errors on HDDs
Hi,

I recently noticed that iostat is reporting a lot of Hard Errors on multiple drives. Also, dmesg reports various messages from the mpt driver.

My config is:
MB: SUPERMICRO X8SIL-F
HBA: AOC-USAS-L8i (LSI 1068)
RAM: 4GB ECC
SunOS SAN 5.11 snv_134 i86pc i386 i86pc Solaris

My configuration is a pool of 13 drives striped across mirror vdevs (one mirror had an error on a drive, which I cleared, but just to be safe I added another drive to that mirror):

        NAME         STATE     READ WRITE CKSUM
        zpool        ONLINE       0     0     0
          mirror-0   ONLINE       0     0     0
            c4t13d0  ONLINE       0     0     0
            c4t19d0  ONLINE       0     0     0
          mirror-1   ONLINE       0     0     0
            c4t25d0  ONLINE       0     0     0
            c4t31d0  ONLINE       0     0     0
          mirror-2   ONLINE       0     0     0
            c4t12d0  ONLINE       0     0     0
            c4t18d0  ONLINE       0     0     0
          mirror-3   ONLINE       0     0     0
            c4t24d0  ONLINE       0     0     0
            c4t30d0  ONLINE       0     0     0
          mirror-4   ONLINE       0     0     0
            c4t11d0  ONLINE       0     0     0
            c4t17d0  ONLINE       0     0     0
            c4t10d0  ONLINE       0     0     0
          mirror-5   ONLINE       0     0     0
            c4t23d0  ONLINE       0     0     0
            c4t29d0  ONLINE       0     0     0

Here's the output from iostat -En:

c6d1 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Model: WDC WD3200BEKT- Revision: Serial No: WD-WXR1A30 Size: 320.07GB <320070352896 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0
c7d1 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Model: WDC WD3200BEKT- Revision: Serial No: WD-WXR1A30 Size: 320.07GB <320070352896 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0
c4t12d0 Soft Errors: 0 Hard Errors: 252 Transport Errors: 0
Vendor: ATA Product: SAMSUNG HD203WI Revision: 0003 Serial No:
Size: 2000.40GB <2000398934016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c4t13d0 Soft Errors: 0 Hard Errors: 252 Transport Errors: 0
Vendor: ATA Product: SAMSUNG HD203WI Revision: 0002 Serial No:
Size: 2000.40GB <2000398934016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c4t18d0 Soft Errors: 0 Hard Errors: 252 Transport Errors: 0
Vendor: ATA Product: SAMSUNG HD203WI Revision: 0003 Serial No:
Size: 2000.40GB <2000398934016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c4t19d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: SAMSUNG HD203WI Revision: 0002 Serial No:
Size: 2000.40GB <2000398934016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c4t24d0 Soft Errors: 0 Hard Errors: 252 Transport Errors: 0
Vendor: ATA Product: SAMSUNG HD203WI Revision: 0003 Serial No:
Size: 2000.40GB <2000398934016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c4t25d0 Soft Errors: 0 Hard Errors: 252 Transport Errors: 0
Vendor: ATA Product: SAMSUNG HD203WI Revision: 0002 Serial No:
Size: 2000.40GB <2000398934016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c4t30d0 Soft Errors: 0 Hard Errors: 252 Transport Errors: 0
Vendor: ATA Product: SAMSUNG HD203WI Revision: 0003 Serial No:
Size: 2000.40GB <2000398934016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c4t31d0 Soft Errors: 0 Hard Errors: 252 Transport Errors: 0
Vendor: ATA Product: SAMSUNG HD203WI Revision: 0002 Serial No:
Size: 2000.40GB <2000398934016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c4t17d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: WDC WD20EADS-32S Revision: 0A01 Serial No:
Size: 2000.40GB <2000398934016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c4t11d0 Soft Errors: 0 Hard Errors: 17 Transport Errors: 116
Vendor: ATA Product: WDC WD20EADS-32S Revision: 5G04 Serial No:
Size: 2000.40GB <2000398934016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 17 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c4t23d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: ST31500341AS
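
With this many devices it can help to tabulate the counters rather than read them by eye. The following is a minimal Python sketch that assumes `iostat -En` output in the layout shown in this post, where each device name is immediately followed by its Soft/Hard/Transport error counts. The function names and the three-record sample are illustrative only; the sample values are copied from the listing above.

```python
import re

# Device name followed by the three summary counters, as in the post.
RECORD = re.compile(
    r"(\S+)\s+Soft Errors:\s*(\d+)\s+Hard Errors:\s*(\d+)"
    r"\s+Transport Errors:\s*(\d+)"
)


def error_counts(iostat_text):
    """Map each device to a (soft, hard, transport) tuple of ints."""
    return {
        m.group(1): tuple(int(n) for n in m.group(2, 3, 4))
        for m in RECORD.finditer(iostat_text)
    }


def devices_with_hard_errors(iostat_text):
    """Sorted list of devices whose Hard Errors counter is non-zero."""
    counts = error_counts(iostat_text)
    return sorted(dev for dev, (_, hard, _) in counts.items() if hard > 0)


# Three records taken from the iostat -En listing in this thread.
SAMPLE = """\
c4t12d0 Soft Errors: 0 Hard Errors: 252 Transport Errors: 0
Vendor: ATA Product: SAMSUNG HD203WI Revision: 0003 Serial No:
c4t19d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: SAMSUNG HD203WI Revision: 0002 Serial No:
c4t11d0 Soft Errors: 0 Hard Errors: 17 Transport Errors: 116
Vendor: ATA Product: WDC WD20EADS-32S Revision: 5G04 Serial No:
"""

if __name__ == "__main__":
    for dev, (soft, hard, transport) in sorted(error_counts(SAMPLE).items()):
        print(f"{dev}: soft={soft} hard={hard} transport={transport}")
    print(devices_with_hard_errors(SAMPLE))
```

In practice the text would come from running `iostat -En` on the host and passing its output to `error_counts`. A summary like this makes it easier to see patterns such as several drives reporting the same hard error count.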