Hi all,

Today I noticed that one of the ZFS-based servers at my company is
complaining about disk errors, and I would like to know whether this is a
real physical error or something like a transport error.
The server in question runs snv_134 and is attached to 2 J4400 JBODs. The
head node has 2 HBAs, and I've enabled multipath support. The disks are 1TB
SATA enterprise-class disks.

The messages seen on the system are:

Jul 15 12:30:48 storage01 fmd: [ID 377184 daemon.error] SUNW-MSG-ID:
DISK-8000-4Q, TYPE: Fault, VER: 1, SEVERITY: Critical
Jul 15 12:30:48 storage01 EVENT-TIME: Thu Jul 15 12:30:48 CEST 2010
Jul 15 12:30:48 storage01 PLATFORM: PowerEdge-R710, CSN: HR9SG9J,
HOSTNAME: storage01
Jul 15 12:30:48 storage01 SOURCE: eft, REV: 1.16
Jul 15 12:30:48 storage01 EVENT-ID: 859b9d9c-1214-4302-8089-b9447619a2a1
Jul 15 12:30:48 storage01 DESC: The command was terminated with a
non-recovered error condition that may have been caused by a flaw in the
media or an error in the recorded data.
Jul 15 12:30:48 storage01   Refer to http://sun.com/msg/DISK-8000-4Q for
more information.
Jul 15 12:30:48 storage01 AUTO-RESPONSE: The device may be offlined or
degraded.
Jul 15 12:30:48 storage01 IMPACT: It is likely that continued operation
will result in data corruption, which may eventually cause the loss of
service or the service degradation.
Jul 15 12:30:48 storage01 REC-ACTION: Schedule a repair procedure to
replace the affected device. Use 'fmadm faulty' to find the affected disk.
Jul 15 12:30:48 storage01 genunix: [ID 846333 kern.warning] WARNING:
constraints forbid retire: /scsi_vhci/d...@g5000c50019f03af6

/usr/local/sbin/smartctl -xa -d scsi /dev/rdsk/c0t5000C50019F03AF6d0
smartctl 5.39.1 2010-01-28 r3054 [i386-pc-solaris2.11] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

Serial number: 9QJ60QFG
Device type: disk
Local Time is: Sat Jul 17 11:13:00 2010 CEST
Device supports SMART and is Enabled
Temperature Warning Disabled or Not Supported
SMART Health Status: OK

Current Drive Temperature:     28 C

Error Counter logging not supported

[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
No self-tests have been logged
Long (extended) Self Test duration: 13800 seconds [230.0 minutes]
Device does not support Background scan results logging
scsiPrintSasPhy Log Sense Failed [unsupported field in scsi command]

iostat -En | grep c0t5000C50019F03AF6
c0t5000C50019F03AF6d0 Soft Errors: 0 Hard Errors: 50 Transport Errors: 0


iostat -en | grep c0t5000C50019F03AF6
  ---- errors ---
  s/w h/w trn tot device
    0  50   0  50 c0t5000C50019F03AF6d0
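
To check every disk at once instead of grepping for a single device, here is a minimal Python sketch that parses the summary line of `iostat -En` (in the format shown above) and lists devices with non-zero hard or transport error counters. The function names `parse_iostat_errors` and `suspect_devices` are illustrative, and the second device in the sample text is hypothetical, added only to show a clean disk.

```python
import re

# Matches the first line of each `iostat -En` device record, e.g.
# "c0t5000C50019F03AF6d0 Soft Errors: 0 Hard Errors: 50 Transport Errors: 0"
SUMMARY_RE = re.compile(
    r"^(?P<device>\S+)\s+Soft Errors:\s*(?P<soft>\d+)\s+"
    r"Hard Errors:\s*(?P<hard>\d+)\s+Transport Errors:\s*(?P<transport>\d+)"
)


def parse_iostat_errors(text):
    """Return {device: {"soft": n, "hard": n, "transport": n}}."""
    result = {}
    for line in text.splitlines():
        m = SUMMARY_RE.match(line.strip())
        if m:
            result[m.group("device")] = {
                key: int(m.group(key)) for key in ("soft", "hard", "transport")
            }
    return result


def suspect_devices(counters):
    """Devices with at least one hard or transport error, sorted by name."""
    return sorted(
        dev for dev, c in counters.items() if c["hard"] or c["transport"]
    )


# First line is taken from the output above; the second device is hypothetical.
sample = """\
c0t5000C50019F03AF6d0 Soft Errors: 0 Hard Errors: 50 Transport Errors: 0
c0t5000C50019F03AF7d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
"""

counters = parse_iostat_errors(sample)
print(suspect_devices(counters))
```

On the live system the input would be the captured output of `iostat -En` rather than the sample string.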


So I'm confused: S.M.A.R.T. reports no errors, yet iostat reports
50 hard errors.
Given this, should I already start replacing the disk in the affected
pool and then request an RMA from the disk vendor?

Thanks for all your time,
Bruno


_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
