On Thu, 22 Jan 2009, Scott L. Burson wrote:

> This is in snv_86.  I have a four-drive raidz pool.  One of the drives 
> died.  I replaced it, but wasn't careful to put the new drive on the 
> same controller port; one of the existing drives wound up on the port 
> that had previously been used by the failed drive, and the new drive 
> wound up on the port previously used by that drive.
>
> I powered up and booted, and ZFS started a resilver automatically, but 
> the pool status was confused.  It looked like this, even after the 
> resilver completed (indentation is being discarded here):
>
>        NAME        STATE     READ WRITE CKSUM
>        pool0       DEGRADED     0     0     0
>          raidz1    DEGRADED     0     0     0
>            c5t1d0  FAULTED      0     0     0  too many errors
>            c5t1d0  ONLINE       0     0     0
>            c5t3d0  ONLINE       0     0     0
>            c5t0d0  ONLINE       0     0     0
>
> Doing 'zpool clear' just changed the "too many errors" to "corrupted 
> data".
>
> I then tried 'zpool replace pool0 c5t1d0 c5t2d0' to see if that would 
> straighten things out (hoping it wouldn't screw things up any further!). 
> It started another resilver, during which the status looked like this:
>
>        NAME           STATE     READ WRITE CKSUM
>        pool0          DEGRADED     0     0     0
>          raidz1       DEGRADED     0     0     0
>            replacing  DEGRADED     0     0     0
>              c5t1d0   FAULTED      0     0     0  corrupted data
>              c5t2d0   ONLINE       0     0     0
>            c5t1d0     ONLINE       0     0     0
>            c5t3d0     ONLINE       0     0     0
>            c5t0d0     ONLINE       0     0     0
>
> Maybe this will work, but -- doesn't ZFS put unique IDs on the drives so 
> it can track them in case they wind up on different ports?  If so, seems 
> like it needs to back-map that information to the device names when 
> mounting.  Or something :)
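
(On the unique-ID question: yes, every vdev gets a guid written into its 
on-disk label, which is what normally lets ZFS find drives that have moved 
to different ports. You can dump a label with zdb; the device path below 
is taken from your status output -- adjust the slice if needed -- and the 
output is heavily trimmed with the actual values elided, so treat it 
purely as an illustration.)

        # zdb -l /dev/dsk/c5t1d0s0
        --------------------------------------------
        LABEL 0
        --------------------------------------------
            name='pool0'
            state=0
            pool_guid=...
            guid=...               <- this vdev's own guid
            vdev_tree
                ...                <- includes the guid of every child
                                      in the raidz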

Did your resilver complete successfully? I hit a similar problem: the 
array showed thousands of write errors against the missing drive, and the 
resilver of the new drive would essentially restart every few minutes. 
You also can't cancel a replacement of a nonexistent drive in that state. 
I ended up having to create a sparse device labeled with the proper guid; 
only then could I remove one of the devices and initiate a proper 
replacement. Similar to bug id #6782540...
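
For reference, the workaround went roughly like this. The path and size 
below are placeholders, and the label-writing step is only sketched, 
since nothing in the stock CLI will write an arbitrary guid -- that part 
needed a small one-off hack -- so take this as an outline rather than 
exact commands:

        (get the guid of the phantom vdev from a healthy disk's label,
         as in the zdb -l dump above)

        # mkfile -n 500g /var/tmp/ghost
          (sparse file the same size as the real disks; 500g and the
           path are just placeholders)

        (write a ZFS label carrying the phantom vdev's guid onto
         /var/tmp/ghost -- this is the hacked-up part)

        (once the pool can open something with the expected guid, the
         stuck entry can be removed and a normal replacement started,
         e.g. with the names from your status output:)

        # zpool replace pool0 c5t1d0 c5t2d0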

