Re: [zfs-discuss] Hard Errors on HDDs

2011-01-14 Thread solaris
Richard Elling  writes:

> In my experience, this looks like a set of devices sitting behind an
> expander. I have seen one bad disk take out all disks sitting behind
> an expander.  I have also seen bad disk firmware take out all disks
> behind an expander.  I once saw a bad cable take out everything.
>  -- richard

In my experience, I've also seen the same problems: a lot of SATA disks
(Seagate Barracuda ES.2 and others), all behind expanders (Supermicro SC847
chassis).

The issues were solved after we removed all SATA disks behind our expander
and replaced them with enterprise SAS disks.

After that, we only saw these problems when a connected SATA SSD died, so we
also moved our SATA SSDs off this backplane and connected them directly to the
1068-based controller.

The problem arose after we moved an identical server to an expander
backplane (to connect more drives). Before that, the disks had been running
for months without any problems while *directly* attached.

Regards,
Daniel

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Hard Errors on HDDs

2011-01-13 Thread Richard Elling
"Hard errors" is a generic classification. fmdump -eV shows the SCSI sense
key, ASC, and ASCQ (additional sense code and its qualifier), which is
generally more useful for diagnosis. More below...
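As an illustration of why the sense data says more than the iostat counter, here is a minimal Python sketch that turns a sense key, ASC, and ASCQ (read by hand from an fmdump -eV ereport) into a readable line. The sense-key names follow the SCSI Primary Commands (SPC) standard; the helper name describe_sense is hypothetical, and decoding of ASC/ASCQ pairs is deliberately left out because that table is far larger.

```python
# Sense-key names as defined in the SCSI Primary Commands (SPC) standard.
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0x7: "DATA PROTECT",
    0x8: "BLANK CHECK",
    0x9: "VENDOR SPECIFIC",
    0xA: "COPY ABORTED",
    0xB: "ABORTED COMMAND",
    0xD: "VOLUME OVERFLOW",
    0xE: "MISCOMPARE",
}


def describe_sense(key: int, asc: int, ascq: int) -> str:
    """Hypothetical helper: summarize one sense key / ASC / ASCQ triple."""
    name = SENSE_KEYS.get(key, "RESERVED/UNKNOWN")
    return f"sense key 0x{key:X} ({name}), asc 0x{asc:02X}, ascq 0x{ascq:02X}"


if __name__ == "__main__":
    # Example values only (ASC 0x11/ASCQ 0x00 is an unrecovered read error);
    # they are not taken from the poster's system.
    print(describe_sense(0x3, 0x11, 0x00))
```

A MEDIUM ERROR key points at the disk itself, while a key such as ABORTED COMMAND can also come from trouble along the path (cable, expander, HBA); the single "hard errors" count hides that distinction.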


On Jan 1, 2011, at 7:50 AM, Benji wrote:

> Hi,
> 
> I recently noticed that iostat is reporting a lot of hard errors on multiple
> drives. Also, dmesg reports various messages from the mpt driver.
> 
> My config is:
> MB: SUPERMICRO X8SIL-F
> HBA: AOC-USAS-L8i (LSI 1068)
> RAM: 4GB ECC
> SunOS SAN 5.11 snv_134 i86pc i386 i86pc Solaris
> 
> My configuration is a stripe of mirrored vdevs with 13 drives in total (one
> mirror had an error on a drive, which I cleared, but just to be safe I added
> another drive to that mirror):
> 
> NAME           STATE     READ WRITE CKSUM
> zpool          ONLINE       0     0     0
>   mirror-0     ONLINE       0     0     0
>     c4t13d0    ONLINE       0     0     0
>     c4t19d0    ONLINE       0     0     0
>   mirror-1     ONLINE       0     0     0
>     c4t25d0    ONLINE       0     0     0
>     c4t31d0    ONLINE       0     0     0
>   mirror-2     ONLINE       0     0     0
>     c4t12d0    ONLINE       0     0     0
>     c4t18d0    ONLINE       0     0     0
>   mirror-3     ONLINE       0     0     0
>     c4t24d0    ONLINE       0     0     0
>     c4t30d0    ONLINE       0     0     0
>   mirror-4     ONLINE       0     0     0
>     c4t11d0    ONLINE       0     0     0
>     c4t17d0    ONLINE       0     0     0
>     c4t10d0    ONLINE       0     0     0
>   mirror-5     ONLINE       0     0     0
>     c4t23d0    ONLINE       0     0     0
>     c4t29d0    ONLINE       0     0     0
> 
> 
> Here's the output from iostat -En:
> 
> c6d1 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
> Model: WDC WD3200BEKT- Revision:  Serial No:  WD-WXR1A30 Size: 320.07GB 
> <320070352896 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> Illegal Request: 0
> c7d1 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
> Model: WDC WD3200BEKT- Revision:  Serial No:  WD-WXR1A30 Size: 320.07GB 
> <320070352896 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> Illegal Request: 0
> c4t12d0  Soft Errors: 0 Hard Errors: 252 Transport Errors: 0
> Vendor: ATA  Product: SAMSUNG HD203WI  Revision: 0003 Serial No:
> Size: 2000.40GB <2000398934016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> Illegal Request: 0 Predictive Failure Analysis: 0
> c4t13d0  Soft Errors: 0 Hard Errors: 252 Transport Errors: 0
> Vendor: ATA  Product: SAMSUNG HD203WI  Revision: 0002 Serial No:
> Size: 2000.40GB <2000398934016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> Illegal Request: 0 Predictive Failure Analysis: 0
> c4t18d0  Soft Errors: 0 Hard Errors: 252 Transport Errors: 0
> Vendor: ATA  Product: SAMSUNG HD203WI  Revision: 0003 Serial No:
> Size: 2000.40GB <2000398934016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> Illegal Request: 0 Predictive Failure Analysis: 0
> c4t19d0  Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
> Vendor: ATA  Product: SAMSUNG HD203WI  Revision: 0002 Serial No:
> Size: 2000.40GB <2000398934016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> Illegal Request: 0 Predictive Failure Analysis: 0
> c4t24d0  Soft Errors: 0 Hard Errors: 252 Transport Errors: 0
> Vendor: ATA  Product: SAMSUNG HD203WI  Revision: 0003 Serial No:
> Size: 2000.40GB <2000398934016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> Illegal Request: 0 Predictive Failure Analysis: 0
> c4t25d0  Soft Errors: 0 Hard Errors: 252 Transport Errors: 0
> Vendor: ATA  Product: SAMSUNG HD203WI  Revision: 0002 Serial No:
> Size: 2000.40GB <2000398934016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> Illegal Request: 0 Predictive Failure Analysis: 0
> c4t30d0  Soft Errors: 0 Hard Errors: 252 Transport Errors: 0
> Vendor: ATA  Product: SAMSUNG HD203WI  Revision: 0003 Serial No:
> Size: 2000.40GB <2000398934016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> Illegal Request: 0 Predictive Failure Analysis: 0
> c4t31d0  Soft Errors: 0 Hard Errors: 252 Transport Errors: 0
> Vendor: ATA  Product: SAMSUNG HD203WI  Revision: 0002 Serial No:
> Size: 2000.40GB <2000398934016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> Illegal Request: 0 Predictive Failure Analysis: 0
> c4t17d0  Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
> Vendor: ATA  Product: WDC WD20EADS-32S Revision: 0A01 Serial No:
> Size: 2000.40GB <2000398934016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> Illegal Request: 0 Predictive Failure Analysis: 0
> c4t11d0  Soft Errors: 0 Har
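With a dozen drives, the iostat -En counters are easier to compare with a small script. This is a minimal Python sketch, assuming the output has been saved as plain text in the format quoted above, where each device record starts with a line of the form "<device> Soft Errors: N Hard Errors: N Transport Errors: N"; error_summary and suspicious are hypothetical helper names.

```python
import re

# Matches the first line of each device record in `iostat -En` output, e.g.
# "c4t12d0  Soft Errors: 0 Hard Errors: 252 Transport Errors: 0"
RECORD = re.compile(
    r"^(?P<dev>\S+)\s+Soft Errors:\s*(?P<soft>\d+)\s+"
    r"Hard Errors:\s*(?P<hard>\d+)\s+Transport Errors:\s*(?P<transport>\d+)",
    re.MULTILINE,
)


def error_summary(text: str) -> dict[str, dict[str, int]]:
    """Map each device name to its soft/hard/transport error counters."""
    return {
        m["dev"]: {k: int(m[k]) for k in ("soft", "hard", "transport")}
        for m in RECORD.finditer(text)
    }


def suspicious(summary: dict[str, dict[str, int]]) -> list[str]:
    """Devices reporting any hard or transport errors, sorted by name."""
    return sorted(d for d, c in summary.items() if c["hard"] or c["transport"])


if __name__ == "__main__":
    # Two records copied from the listing above; the "Vendor:" lines are ignored.
    sample = (
        "c4t12d0  Soft Errors: 0 Hard Errors: 252 Transport Errors: 0\n"
        "Vendor: ATA  Product: SAMSUNG HD203WI  Revision: 0003 Serial No:\n"
        "c4t19d0  Soft Errors: 0 Hard Errors: 0 Transport Errors: 0\n"
        "Vendor: ATA  Product: SAMSUNG HD203WI  Revision: 0002 Serial No:\n"
    )
    print(suspicious(error_summary(sample)))
```

Run against the full listing, this would flag the HD203WI drives reporting 252 hard errors each, as well as c4t11d0, which also shows transport errors in the original post.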

Re: [zfs-discuss] Hard Errors on HDDs

2011-01-06 Thread Benji
For anyone who is interested, here's a progress report.

I created a new pool with only one mirror vdev of 2 disks, using the new
SAMSUNG HD204UI. These drives, along with the older HD203WI, use Advanced
Format technology (i.e. 4K sectors). Only these drives had hard errors in my
pool, as opposed to the old Seagates and WDs.

To create the new pool, I recompiled the zpool command to use an ashift value
of 12, so that the new pool has 4K alignment instead of 512 bytes (see here:
http://www.solarismen.de/archives/5-Solaris-and-the-new-4K-Sector-Disks-e.g.-WDxxEARS-Part-2.html).
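For reference, ashift is the base-2 logarithm of the smallest block ZFS will allocate on a vdev: ashift 9 means 2^9 = 512 bytes and ashift 12 means 2^12 = 4096 bytes. With ashift 9, ZFS may start a block on any 512-byte boundary, which on a 4K drive emulating 512-byte sectors can force a read-modify-write of the physical sector. A minimal Python sketch of that arithmetic follows; the helper names are hypothetical and it models alignment only, not ZFS's allocator.

```python
# Physical sector size of an Advanced Format ("4K") drive, in bytes.
PHYSICAL_SECTOR = 4096


def min_block_size(ashift: int) -> int:
    """Smallest allocation unit for a vdev with the given ashift (2**ashift bytes)."""
    return 1 << ashift


def needs_read_modify_write(offset: int, length: int,
                            sector: int = PHYSICAL_SECTOR) -> bool:
    """True if a write of `length` bytes at byte `offset` does not both start
    and end on a physical-sector boundary, so a drive emulating 512-byte
    sectors would have to read, patch, and rewrite at least one 4K sector."""
    return offset % sector != 0 or (offset + length) % sector != 0


if __name__ == "__main__":
    for ashift in (9, 12):
        size = min_block_size(ashift)
        # Hypothetical example: one minimum-size block in the second allocation slot.
        offset = size
        print(f"ashift={ashift}: block={size} bytes, RMW needed: "
              f"{needs_read_modify_write(offset, size)}")
```

The example reports a read-modify-write for ashift 9 (a 512-byte block at offset 512) and none for ashift 12 (a 4096-byte block at offset 4096).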

So I filled this new 4K-aligned pool with 1.5TB of data and scrubbed it: no
errors. I checked the log and found no hard errors either, whereas I usually
get some hard errors after a scrub.

Maybe the pool needs more vdevs to really stress the HBA and produce hard
errors, but it's a strange coincidence nonetheless that only the 4K drives had
errors, and that they produced none once used in a 4K-aligned pool.

I'll probably re-create my original pool with only 4K drives in a 4K-aligned
pool and see what happens.
-- 
This message posted from opensolaris.org


Re: [zfs-discuss] Hard Errors on HDDs

2011-01-03 Thread Benji
Thanks for the input!

I am using an iPass-to-iPass cable to connect my HBA to my backplane. It was
firmly locked into both connectors.

I offlined two supposedly faulty SAMSUNG drives and scanned their whole
surface using estools; it did not report any errors.

I'm starting to think that it may be an issue with the mpt driver and the HBA
card. Is anyone else using an LSI 1068E-based HBA card and having issues?

Thanks


Re: [zfs-discuss] Hard Errors on HDDs

2011-01-03 Thread Orvar Korvar
Maybe a cable is loose? Have you tried reseating all the cables on all the
drives, and the controller card as well?

Yes, ZFS detects such problems.
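ZFS catches this kind of fault because it stores each block's checksum in the parent block pointer and re-verifies it on every read and scrub, so data corrupted anywhere between the platter and memory shows up as a CKSUM error instead of passing silently. Below is a simplified Python sketch of the Fletcher-4 algorithm that ZFS offers as a checksum (four 64-bit running sums over 32-bit words, little-endian as on x86). It is an illustration only, not ZFS's code, and the function names are hypothetical.

```python
import struct

MASK64 = (1 << 64) - 1  # emulate unsigned 64-bit wraparound


def fletcher4(data: bytes) -> tuple[int, int, int, int]:
    """Fletcher-4 style checksum: four 64-bit running sums over 32-bit
    little-endian words. Length must be a multiple of 4 bytes."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4 bytes")
    a = b = c = d = 0
    for (word,) in struct.iter_unpack("<I", data):
        a = (a + word) & MASK64
        b = (b + a) & MASK64
        c = (c + b) & MASK64
        d = (d + c) & MASK64
    return a, b, c, d


def verify(block: bytes, stored: tuple[int, int, int, int]) -> bool:
    """Recompute the checksum on read and compare it with the stored value."""
    return fletcher4(block) == stored


if __name__ == "__main__":
    block = bytes(range(256)) * 16      # a 4 KiB test block
    stored = fletcher4(block)           # checksum recorded at write time
    damaged = bytearray(block)
    damaged[100] ^= 0x01                # flip one bit, as a bad cable might
    print(verify(block, stored), verify(bytes(damaged), stored))
```

The intact block verifies and the block with a single flipped bit does not, which is what surfaces in the CKSUM column of zpool status.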


[zfs-discuss] Hard Errors on HDDs

2011-01-03 Thread Benji
Hi,

I recently noticed that iostat is reporting a lot of hard errors on multiple
drives. Also, dmesg reports various messages from the mpt driver.

My config is:
MB: SUPERMICRO X8SIL-F
HBA: AOC-USAS-L8i (LSI 1068)
RAM: 4GB ECC
SunOS SAN 5.11 snv_134 i86pc i386 i86pc Solaris

My configuration is a stripe of mirrored vdevs with 13 drives in total (one
mirror had an error on a drive, which I cleared, but just to be safe I added
another drive to that mirror):

  NAME           STATE     READ WRITE CKSUM
  zpool          ONLINE       0     0     0
    mirror-0     ONLINE       0     0     0
      c4t13d0    ONLINE       0     0     0
      c4t19d0    ONLINE       0     0     0
    mirror-1     ONLINE       0     0     0
      c4t25d0    ONLINE       0     0     0
      c4t31d0    ONLINE       0     0     0
    mirror-2     ONLINE       0     0     0
      c4t12d0    ONLINE       0     0     0
      c4t18d0    ONLINE       0     0     0
    mirror-3     ONLINE       0     0     0
      c4t24d0    ONLINE       0     0     0
      c4t30d0    ONLINE       0     0     0
    mirror-4     ONLINE       0     0     0
      c4t11d0    ONLINE       0     0     0
      c4t17d0    ONLINE       0     0     0
      c4t10d0    ONLINE       0     0     0
    mirror-5     ONLINE       0     0     0
      c4t23d0    ONLINE       0     0     0
      c4t29d0    ONLINE       0     0     0


Here's the output from iostat -En:

c6d1 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Model: WDC WD3200BEKT- Revision:  Serial No:  WD-WXR1A30 Size: 320.07GB 
<320070352896 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0
c7d1 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Model: WDC WD3200BEKT- Revision:  Serial No:  WD-WXR1A30 Size: 320.07GB 
<320070352896 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0
c4t12d0  Soft Errors: 0 Hard Errors: 252 Transport Errors: 0
Vendor: ATA  Product: SAMSUNG HD203WI  Revision: 0003 Serial No:
Size: 2000.40GB <2000398934016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c4t13d0  Soft Errors: 0 Hard Errors: 252 Transport Errors: 0
Vendor: ATA  Product: SAMSUNG HD203WI  Revision: 0002 Serial No:
Size: 2000.40GB <2000398934016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c4t18d0  Soft Errors: 0 Hard Errors: 252 Transport Errors: 0
Vendor: ATA  Product: SAMSUNG HD203WI  Revision: 0003 Serial No:
Size: 2000.40GB <2000398934016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c4t19d0  Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA  Product: SAMSUNG HD203WI  Revision: 0002 Serial No:
Size: 2000.40GB <2000398934016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c4t24d0  Soft Errors: 0 Hard Errors: 252 Transport Errors: 0
Vendor: ATA  Product: SAMSUNG HD203WI  Revision: 0003 Serial No:
Size: 2000.40GB <2000398934016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c4t25d0  Soft Errors: 0 Hard Errors: 252 Transport Errors: 0
Vendor: ATA  Product: SAMSUNG HD203WI  Revision: 0002 Serial No:
Size: 2000.40GB <2000398934016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c4t30d0  Soft Errors: 0 Hard Errors: 252 Transport Errors: 0
Vendor: ATA  Product: SAMSUNG HD203WI  Revision: 0003 Serial No:
Size: 2000.40GB <2000398934016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c4t31d0  Soft Errors: 0 Hard Errors: 252 Transport Errors: 0
Vendor: ATA  Product: SAMSUNG HD203WI  Revision: 0002 Serial No:
Size: 2000.40GB <2000398934016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c4t17d0  Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA  Product: WDC WD20EADS-32S Revision: 0A01 Serial No:
Size: 2000.40GB <2000398934016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c4t11d0  Soft Errors: 0 Hard Errors: 17 Transport Errors: 116
Vendor: ATA  Product: WDC WD20EADS-32S Revision: 5G04 Serial No:
Size: 2000.40GB <2000398934016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 17 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c4t23d0  Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA  Product: ST31500341AS