The endless resilver problem still persists on OI b147: the resilver
restarts just when it should complete.

I see no other solution than to copy the data to safety and recreate the
array. Any hints would be appreciated, as that takes days unless I can stop
or pause the resilvering.
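
To at least pin down when each restart happens, the progress line from
'zpool status' can be sampled and compared over time. Below is a minimal
sketch in Python, assuming the "scrub:" line keeps the format shown in the
quoted output further down. The function names are my own invention, not
part of any ZFS tooling.

```python
import re

# Matches e.g. "scrub: resilver in progress for 5h33m, 22.47% done, 19h10m to go"
# (format taken from the quoted status output; other releases may differ).
PROGRESS_RE = re.compile(
    r"scrub: resilver in progress for (?:(\d+)h)?(\d+)m, ([\d.]+)% done"
)


def parse_resilver(status_text):
    """Return (elapsed_minutes, percent_done), or None if no resilver line is found."""
    m = PROGRESS_RE.search(status_text)
    if m is None:
        return None
    hours = int(m.group(1) or 0)
    minutes = int(m.group(2))
    return hours * 60 + minutes, float(m.group(3))


def restarted(previous, current):
    """True if elapsed time or percent done went backwards between two samples."""
    if previous is None or current is None:
        return False
    return current[0] < previous[0] or current[1] < previous[1]
```

Feeding this the text of 'zpool status tank' every few minutes and logging
the wall-clock time whenever restarted() returns True would give timestamps
to compare against whatever the system logs show around those moments.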

On Mon, Sep 27, 2010 at 1:13 PM, Tuomas Leikola <tuomas.leik...@gmail.com> wrote:

> Hi!
>
> My home server had some disk outages due to flaky cabling and whatnot, and
> started resilvering to a spare disk. During this another disk or two
> dropped, and were reinserted into the array. So no devices were actually
> lost, they just were intermittently away for a while each.
>
> The situation is currently as follows:
>   pool: tank
>  state: ONLINE
> status: One or more devices has experienced an unrecoverable error.  An
>         attempt was made to correct the error.  Applications are unaffected.
> action: Determine if the device needs to be replaced, and clear the errors
>         using 'zpool clear' or replace the device with 'zpool replace'.
>    see: http://www.sun.com/msg/ZFS-8000-9P
>  scrub: resilver in progress for 5h33m, 22.47% done, 19h10m to go
> config:
>
>         NAME                       STATE     READ WRITE CKSUM
>         tank                       ONLINE       0     0     0
>           raidz1-0                 ONLINE       0     0     0
>             c11t1d0p0              ONLINE       0     0     0
>             c11t2d0                ONLINE       0     0     5
>             c11t6d0p0              ONLINE       0     0     0
>             spare-3                ONLINE       0     0     0
>               c11t3d0p0            ONLINE       0     0     0  106M resilvered
>               c9d1                 ONLINE       0     0     0  104G resilvered
>             c11t4d0p0              ONLINE       0     0     0
>             c11t0d0p0              ONLINE       0     0     0
>             c11t5d0p0              ONLINE       0     0     0
>             c11t7d0p0              ONLINE       0     0     0  93.6G resilvered
>           raidz1-2                 ONLINE       0     0     0
>             c6t2d0                 ONLINE       0     0     0
>             c6t3d0                 ONLINE       0     0     0
>             c6t4d0                 ONLINE       0     0     0  2.50K resilvered
>             c6t5d0                 ONLINE       0     0     0
>             c6t6d0                 ONLINE       0     0     0
>             c6t7d0                 ONLINE       0     0     0
>             c6t1d0                 ONLINE       0     0     1
>         logs
>           /dev/zvol/dsk/rpool/log  ONLINE       0     0     0
>         cache
>           c6t0d0p0                 ONLINE       0     0     0
>         spares
>           c9d1                     INUSE     currently in use
>
> errors: No known data errors
>
> And this has been going on for a week now, always restarting when it should
> complete.
>
> The questions in my mind atm:
>
> 1. How can I determine the cause for each resilver? Is there a log?
>
> 2. Why does it resilver the same data over and over, and not just the
> changed bits?
>
> 3. Can I force remove c9d1, as it is no longer needed, so that c11t3 can be
> resilvered instead?
>
> I'm running opensolaris 134, but the event originally happened on 111b. I
> upgraded and tried quiescing snapshots and IO, none of which helped.
>
> I've already ordered some new hardware to recreate this entire array as
> raidz2 among other things, but there's about a week of time when I can run
> debuggers and traces if instructed to.
>
> - Tuomas
>
>
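
For comparing successive status dumps like the one quoted above, it may help
to pull out only the devices with non-zero error counters. A rough sketch in
Python follows, assuming device rows look like "NAME STATE READ WRITE CKSUM"
with plain integer counters. The helper name is hypothetical, and very large
counters may be abbreviated by zpool in a way this does not handle.

```python
import re

# A device row: optional mail-quote prefix, name, state, then the three counters.
# Rows such as the "spares" entry ("INUSE currently in use") do not match.
ROW_RE = re.compile(r"^[>\s]*(\S+)\s+[A-Z]+\s+(\d+)\s+(\d+)\s+(\d+)")


def devices_with_errors(status_text):
    """Return (name, read, write, cksum) for every row with a non-zero counter."""
    found = []
    for line in status_text.splitlines():
        m = ROW_RE.match(line)
        if m is None:
            continue
        read, write, cksum = (int(m.group(i)) for i in (2, 3, 4))
        if read or write or cksum:
            found.append((m.group(1), read, write, cksum))
    return found
```

Run against the output quoted above, this would list only c11t2d0 and
c6t1d0, the two devices showing checksum errors.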
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
