I had similar problems replacing a drive myself; it's not intuitive exactly
which ZFS commands you need to issue to recover from a drive failure.

I think your problems stemmed from using -f.  Generally, if you have to use
it, there's a step or option you've missed somewhere.

However, I'm not 100% sure which command you should have used instead.  Things
I've tried in the past include:
# zpool replace test c2t2d0 c2t2d0
# zpool online test c2t2d0
# zpool replace test c2t2d0

I did a lot of testing of various options to work out how to replace a drive
in a test machine.  I'm checking whether I still have any iSCSI notes, but
from memory, when I tested iSCSI I was also testing ZFS on a cluster, so my
solution was simply to get the iSCSI devices working on the offline node and
then fail ZFS over to it.

It only took 2-3 seconds to fail ZFS over to the other node, and I suspect I
used that solution because I couldn't work out how to get ZFS to correctly
bring faulted iSCSI devices back online.

However, in case it helps, I do have the whole process for physical disks on a
Sun x4500 documented.  First, take the disk offline in the pool:

# zpool offline splash c5t7d0

Now find the attachment point (controller and port) in use for this device:
# cfgadm | grep c5t7d0
sata3/7::dsk/c5t7d0            disk         connected    configured   ok
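
If you want to script this step, the attachment point can be pulled out of the
cfgadm output instead of being read by eye.  A minimal sketch, assuming the
Ap_Id column has the "sata3/7::dsk/c5t7d0" form shown above (other controllers
may format it differently); ap_for_disk is a hypothetical helper name, and it
reads cfgadm output on stdin so it can be tried without touching the hardware:

```shell
#!/bin/sh
# ap_for_disk DISK
# Reads `cfgadm` output on stdin and prints the attachment point (Ap_Id)
# whose occupant is dsk/DISK, e.g. "sata3/7" for c5t7d0.
# Assumption: configured disks appear as "<ap>::dsk/<disk>" in column 1,
# as in the x4500 output above.
ap_for_disk() {
    awk -v disk="$1" '
        {
            # Split "sata3/7::dsk/c5t7d0" into "sata3/7" and "dsk/c5t7d0".
            n = split($1, parts, "::")
            # Header lines and empty slots have no "::" and never match.
            if (n == 2 && parts[2] == "dsk/" disk) print parts[1]
        }
    '
}

# Usage on the live system (hypothetical):
#   cfgadm | ap_for_disk c5t7d0
```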

Then unconfigure it with:
# cfgadm -c unconfigure sata3/7

Verify that it is now offline with:
# cfgadm | grep sata3/7
sata3/7                        disk         connected    unconfigured ok
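
The same check can be made from a script before anyone pulls the disk.  This
is a sketch under the assumption that cfgadm prints its usual Ap_Id, Type,
Receptacle, Occupant and Condition columns, so the fourth field is the
configured/unconfigured state; ap_occupant is a hypothetical helper that reads
cfgadm output on stdin:

```shell
#!/bin/sh
# ap_occupant AP
# Reads `cfgadm` output on stdin and prints the Occupant column
# ("configured" or "unconfigured") for attachment point AP.
# A configured disk's Ap_Id carries a "::dsk/..." suffix, so compare
# only the part before "::".
ap_occupant() {
    awk -v ap="$1" '
        {
            split($1, parts, "::")
            if (parts[1] == ap) print $4
        }
    '
}

# Usage on the live system (hypothetical): stop unless the slot is free.
#   state=$(cfgadm | ap_occupant sata3/7)
#   [ "$state" = "unconfigured" ] || echo "sata3/7 is still $state" >&2
```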

Now remove and replace the disk.

Bring the disk online and check its status with:
# cfgadm -c configure sata3/7 
# cfgadm | grep sata3/7 
sata3/7::dsk/c5t7d0            disk         connected    configured   ok

Bring the disk back into the ZFS pool.  You will get a warning:
# zpool online splash c5t7d0
warning: device 'c5t7d0' onlined, but remains in faulted state
use 'zpool replace' to replace devices that are no longer present

As the warning suggests, finish with zpool replace:
# zpool replace splash c5t7d0
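
For reference, the whole sequence up to this point can be written down as a
dry-run script that only prints the commands, which makes it easier to
double-check pool, disk and slot names before running anything.  This is a
sketch, not a tested tool: replace_disk_plan is a hypothetical name, the
caller supplies the pool, disk and attachment point, and the physical swap
stays a manual step:

```shell
#!/bin/sh
# replace_disk_plan POOL DISK AP
# Prints (does not run) the x4500 disk replacement sequence described above.
replace_disk_plan() {
    pool=$1 disk=$2 ap=$3
    printf '%s\n' \
        "zpool offline $pool $disk" \
        "cfgadm -c unconfigure $ap" \
        "# physically remove and replace the disk now" \
        "cfgadm -c configure $ap" \
        "zpool online $pool $disk" \
        "zpool replace $pool $disk"
}

# Example: replace_disk_plan splash c5t7d0 sata3/7
```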

You will now see zpool status report that a resilver is in progress, with
detail like this:
          raidz2            DEGRADED     0     0     0
            spare           DEGRADED     0     0     0
              replacing     DEGRADED     0     0     0
                c5t7d0s0/o  UNAVAIL      0     0     0  corrupted data
                c5t7d0      ONLINE       0     0     0

Once the resilver finishes, run zpool status again and it should appear fine.

Note:   I sometimes had to run zpool status twice to get an up-to-date status
of the devices.
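
Waiting for the resilver can also be scripted by polling zpool status.  A
minimal sketch, assuming the status text contains the phrase "resilver in
progress" while the resilver runs; the exact wording differs between ZFS
releases, so compare it with your own zpool status output first.
resilver_running and wait_for_resilver are hypothetical helper names:

```shell
#!/bin/sh
# resilver_running
# Reads `zpool status` output on stdin; exits 0 while a resilver is
# still reported. Assumption: the phrase "resilver in progress" appears.
resilver_running() {
    grep -q 'resilver in progress'
}

# wait_for_resilver POOL [SECONDS]
# Polls every SECONDS (default 60) until no resilver is reported, then
# prints the status. zpool status is read twice at the end because it
# sometimes lags behind the real device state.
wait_for_resilver() {
    pool=$1 interval=${2:-60}
    while zpool status "$pool" | resilver_running; do
        sleep "$interval"
    done
    zpool status "$pool" > /dev/null
    zpool status "$pool"
}

# Usage on the live system (hypothetical): wait_for_resilver splash 30
```
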
This message posted from opensolaris.org
zfs-discuss mailing list
