I've got ZFS running on Solaris s10x_u3wos_10 X86 on a v40z, which has 
two PCI SCSI controllers, each connected to its own external HP disk 
array (MSA30) with 7 disks plus a hot spare.

Both controllers are:
  LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI

The disks are a mix of:
  COMPAQ-BD3008A4C6-HPB4-279.40GB
  COMPAQ-BD30089BBA-HPB1-279.40GB
  COMPAQ-BD3008856C-HPB2-279.40GB

For the past few months, we've seen behavior from ZFS that we wouldn't 
expect. In previous incidents, a particular disk's service time went 
through the roof (while other disks in the same pool were idle) and we 
had to reboot because the pool locked up.
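
(For anyone curious how we spot the offender: we watch per-disk service 
times and look for one device whose asvc_t dwarfs its mirror partner's. 
Roughly what we run, with an arbitrary 5-second interval and a grep 
pattern that's just a convenience to limit output to the two arrays:

# iostat -xn 5 | egrep 'device|c[23]t'
)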

The most recent issue happened today: the pool locked up and we 
couldn't do anything about it besides reboot the system. We couldn't 
run zpool status or df -k; every command that touched I/O just seemed 
to hang. When the system came back up, ZFS showed one of the disks 
as UNAVAIL.
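
If it would help with a diagnosis, I can also post what the FMA error 
log and syslog said about the disk; this is roughly what I'd capture 
(standard Solaris 10 locations):

# fmdump -eV | tail -50           # verbose FMA error-log events
# tail -100 /var/adm/messages     # recent syslog entries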


# zpool status
   pool: dbzpool
  state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
    see: http://www.sun.com/msg/ZFS-8000-D3
  scrub: resilver completed with 0 errors on Mon Feb  4 17:16:39 2008
config:

         NAME        STATE     READ WRITE CKSUM
         dbzpool     DEGRADED     0     0     0
           mirror    ONLINE       0     0     0
             c2t0d0  ONLINE       0     0     0
             c3t0d0  ONLINE       0     0     0
           mirror    ONLINE       0     0     0
             c2t1d0  ONLINE       0     0     0
             c3t1d0  ONLINE       0     0     0
           mirror    ONLINE       0     0     0
             c2t2d0  ONLINE       0     0     0
             c3t2d0  ONLINE       0     0     0
           mirror    ONLINE       0     0     0
             c2t3d0  ONLINE       0     0     0
             c3t3d0  ONLINE       0     0     0
           mirror    ONLINE       0     0     0
             c2t4d0  ONLINE       0     0     0
             c3t4d0  ONLINE       0     0     0
           mirror    ONLINE       0     0     0
             c2t5d0  ONLINE       0     0     0
             c3t5d0  ONLINE       0     0     0
           mirror    DEGRADED     0     0     0
             c2t8d0  ONLINE       0     0     0
             c3t8d0  UNAVAIL      0     0     0  cannot open
         spares
           c2t15d0   AVAIL
           c3t15d0   AVAIL

errors: No known data errors


I've tried:
# zpool offline dbzpool c3t8d0
cannot offline c3t8d0: no valid replicas
# zpool replace dbzpool c3t8d0
cannot replace c3t8d0 with c3t8d0: c3t8d0 is busy
# zpool online dbzpool c3t8d0
Bringing device c3t8d0 online

Note that even though the last command seems fruitful, the disk's 
status remains UNAVAIL.
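
One thing I haven't tried yet is re-probing the device tree to confirm 
the OS still has a sane path to the disk; a rough sketch of what I have 
in mind (all stock Solaris 10 commands):

# cfgadm -al | grep c3            # attachment-point state for that controller
# devfsadm -Cv                    # clean up stale /dev links and rebuild
# zpool online dbzpool c3t8d0

I'd welcome opinions on whether that's sane or risks making things worse.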

I've also tried writing to the disk directly, both before and after the 
above zpool commands.
# dd if=/dev/zero of=/dev/rdsk/c3t8d0s0 bs=1024 count=1048576
1048576+0 records in
1048576+0 records out
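
(Since dd succeeds, the path to the disk clearly works. To rule out a 
damaged label I could also read the VTOC back, e.g.:

# prtvtoc /dev/rdsk/c3t8d0s0

though I haven't done that yet.)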


# smartctl -H /dev/rdsk/c3t8d0s0
smartctl version 5.37 [i386-pc-solaris2.10] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

SMART Health Status: OK

# iostat -nx 5 2 | grep c3t8
     0.5  203.7    6.0 1648.2  0.0  0.0    0.0    0.1   0   3 c3t8d0
     0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c3t8d0

(The first line is iostat's since-boot average; the second interval 
shows the disk currently idle.)


With all of that data in hand, my questions are as follows:
1. What do I have to do (short of replacing the seemingly good disk) to 
get c3t8d0 back online?

2. Is there an alternative to the seemingly necessary reboot when the 
ZFS pool locks up?

3. Could the pool locking be due to a problem in u3 that is addressed 
in u4 and beyond?

-- 

Jeremy Kister
http://jeremy.kister.net./
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
