Re: [zfs-discuss] strange 'too many errors' msg

2009-02-12 Thread Blake
I think you could try clearing the pool - however, consulting the
fault management tools (fmdump and its kin) might be smart first.
It's possible this is an error in the controller.

The output of 'cfgadm' might be of use also.
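Roughly, something like this (a sketch only - 'tank' is a placeholder for the actual pool name, and exact output varies by Solaris release):

```shell
# Inspect the fault manager's view before clearing anything.
fmdump                   # faults diagnosed by fmd (should show the ZFS-8000-GH event)
fmdump -eV | tail -60    # raw error reports (ereports) with full detail
cfgadm -al               # attachment-point state for the controllers/disks

# If the errors look spurious, clear the counters on the device
# ('tank' stands in for the real pool name) and re-check.
zpool clear tank c6t6d0
zpool status -x
```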



On Wed, Feb 11, 2009 at 7:12 PM, Jens Elkner
 wrote:
> Hi,
>
> just found on a X4500 with S10u6:
>
> fmd: [ID 441519 daemon.error] SUNW-MSG-ID: ZFS-8000-GH, TYPE: Fault, VER: 1, 
> SEVERITY: Major
> EVENT-TIME: Wed Feb 11 16:03:26 CET 2009
> PLATFORM: Sun Fire X4500, CSN: 00:14:4F:20:E0:2C , HOSTNAME: peng
> SOURCE: zfs-diagnosis, REV: 1.0
> EVENT-ID: 74e6f0ec-b1e7-e49b-8d71-dc1c9b68ad2b
> DESC: The number of checksum errors associated with a ZFS device exceeded 
> acceptable levels.  Refer to http://sun.com/msg/ZFS-8000-GH for more 
> information.
> AUTO-RESPONSE: The device has been marked as degraded.  An attempt will be 
> made to activate a hot spare if available.
> IMPACT: Fault tolerance of the pool may be compromised.
> REC-ACTION: Run 'zpool status -x' and replace the bad device.
>
> zpool status -x
> ...
>   mirror      DEGRADED     0     0     0
>     spare     DEGRADED     0     0     0
>       c6t6d0  DEGRADED     0     0     0  too many errors
>       c4t0d0  ONLINE       0     0     0
>     c7t6d0    ONLINE       0     0     0
> ...
>   spares
>     c4t0d0    INUSE     currently in use
>     c4t4d0    AVAIL
>
> Strange thing is that for more than 3 months not a single error was
> logged for any drive. IIRC, before u4 I occasionally saw a bad
> checksum error message, but that was obviously the result of the
> well-known race condition in the marvell driver when heavy writes took place.
>
> So I tend to interpret it as a false alarm and am thinking about
> 'zpool clear ... c6t6d0'.
>
> What do you think? Is this a good idea?
>
> Regards,
> jel.
>
> BTW: the 'zpool status -x' message refers to http://www.sun.com/msg/ZFS-8000-9P,
> while the event refers to http://sun.com/msg/ZFS-8000-GH - a little
> inconsistent, I think.
> --
> Otto-von-Guericke University http://www.cs.uni-magdeburg.de/
> Department of Computer Science   Geb. 29 R 027, Universitaetsplatz 2
> 39106 Magdeburg, Germany Tel: +49 391 67 12768
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] strange 'too many errors' msg

2009-02-11 Thread Jens Elkner
Hi,

just found on a X4500 with S10u6:

fmd: [ID 441519 daemon.error] SUNW-MSG-ID: ZFS-8000-GH, TYPE: Fault, VER: 1, 
SEVERITY: Major
EVENT-TIME: Wed Feb 11 16:03:26 CET 2009
PLATFORM: Sun Fire X4500, CSN: 00:14:4F:20:E0:2C , HOSTNAME: peng
SOURCE: zfs-diagnosis, REV: 1.0
EVENT-ID: 74e6f0ec-b1e7-e49b-8d71-dc1c9b68ad2b
DESC: The number of checksum errors associated with a ZFS device exceeded 
acceptable levels.  Refer to http://sun.com/msg/ZFS-8000-GH for more 
information.
AUTO-RESPONSE: The device has been marked as degraded.  An attempt will be made 
to activate a hot spare if available.
IMPACT: Fault tolerance of the pool may be compromised.
REC-ACTION: Run 'zpool status -x' and replace the bad device.

zpool status -x 
...
  mirror      DEGRADED     0     0     0
    spare     DEGRADED     0     0     0
      c6t6d0  DEGRADED     0     0     0  too many errors
      c4t0d0  ONLINE       0     0     0
    c7t6d0    ONLINE       0     0     0
...
  spares
    c4t0d0    INUSE     currently in use
    c4t4d0    AVAIL

Strange thing is that for more than 3 months not a single error was
logged for any drive. IIRC, before u4 I occasionally saw a bad
checksum error message, but that was obviously the result of the
well-known race condition in the marvell driver when heavy writes took place.

So I tend to interpret it as a false alarm and am thinking about
'zpool clear ... c6t6d0'.
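I.e., roughly the following (a sketch - 'tank' stands in for the real pool name):

```shell
# Reset the error counters on the degraded device ('tank' is a
# placeholder for the actual pool name).
zpool clear tank c6t6d0

# Then scrub to force a full read of the pool; fresh checksum errors
# would indicate a real fault rather than a driver race.
zpool scrub tank
zpool status -v tank
```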

What do you think? Is this a good idea?

Regards,
jel. 

BTW: the 'zpool status -x' message refers to http://www.sun.com/msg/ZFS-8000-9P,
 while the event refers to http://sun.com/msg/ZFS-8000-GH - a little
 inconsistent, I think.
-- 
Otto-von-Guericke University http://www.cs.uni-magdeburg.de/
Department of Computer Science   Geb. 29 R 027, Universitaetsplatz 2
39106 Magdeburg, Germany Tel: +49 391 67 12768