Hi all,

Today I noticed that one of the ZFS-based servers at my company is complaining about disk errors, and I would like to know whether this is a real physical error or something like a transport error. The server in question runs snv_134 and is attached to two J4400 JBODs; the head node has two HBAs, and I've enabled multipath support. The server uses 1 TB enterprise-class SATA disks.

The messages seen on the system are:

Jul 15 12:30:48 storage01 fmd: [ID 377184 daemon.error] SUNW-MSG-ID: DISK-8000-4Q, TYPE: Fault, VER: 1, SEVERITY: Critical
Jul 15 12:30:48 storage01 EVENT-TIME: Thu Jul 15 12:30:48 CEST 2010
Jul 15 12:30:48 storage01 PLATFORM: PowerEdge-R710, CSN: HR9SG9J, HOSTNAME: storage01
Jul 15 12:30:48 storage01 SOURCE: eft, REV: 1.16
Jul 15 12:30:48 storage01 EVENT-ID: 859b9d9c-1214-4302-8089-b9447619a2a1
Jul 15 12:30:48 storage01 DESC: The command was terminated with a non-recovered error condition that may have been caused by a flaw in the media or an error in the recorded data.
Jul 15 12:30:48 storage01   Refer to http://sun.com/msg/DISK-8000-4Q for more information.
Jul 15 12:30:48 storage01 AUTO-RESPONSE: The device may be offlined or degraded.
Jul 15 12:30:48 storage01 IMPACT: It is likely that continued operation will result in data corruption, which may eventually cause the loss of service or the service degradation.
Jul 15 12:30:48 storage01 REC-ACTION: Schedule a repair procedure to replace the affected device. Use 'fmadm faulty' to find the affected disk.
Jul 15 12:30:48 storage01 genunix: [ID 846333 kern.warning] WARNING: constraints forbid retire: /scsi_vhci/d...@g5000c50019f03af6

/usr/local/sbin/smartctl -xa -d scsi /dev/rdsk/c0t5000C50019F03AF6d0
smartctl 5.39.1 2010-01-28 r3054 [i386-pc-solaris2.11] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

Serial number: 9QJ60QFG
Device type: disk
Local Time is: Sat Jul 17 11:13:00 2010 CEST
Device supports SMART and is Enabled
Temperature Warning Disabled or Not Supported
SMART Health Status: OK

Current Drive Temperature: 28 C

Error Counter logging not supported

[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
No self-tests have been logged
Long (extended) Self Test duration: 13800 seconds [230.0 minutes]
Device does not support Background scan results logging
scsiPrintSasPhy Log Sense Failed [unsupported field in scsi command]

iostat -En | grep c0t5000C50019F03AF6
c0t5000C50019F03AF6d0 Soft Errors: 0 Hard Errors: 50 Transport Errors: 0

iostat -en | grep c0t5000C50019F03AF6
  ---- errors ---
  s/w h/w trn tot device
    0  50   0  50 c0t5000C50019F03AF6d0

So I'm confused: S.M.A.R.T. reports no errors, yet iostat reports 50 hard errors. Given this, should I already start replacing the disk in the pool in question and then request an RMA from the disk vendor?

Thanks for all your time,
Bruno

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss