comments far below...

On May 22, 2012, at 1:42 AM, Jim Klimov wrote:

> 2012-05-22 7:30, Daniel Carosone wrote:
>> On Mon, May 21, 2012 at 09:18:03PM -0500, Bob Friesenhahn wrote:
>>> On Mon, 21 May 2012, Jim Klimov wrote:
>>>> This is so far a relatively raw idea and I've probably missed
>>>> something. Do you think it is worth pursuing and asking some
>>>> zfs developers to make a POC? ;)
>>> 
>>> I did read all of your text. :-)
>>> 
>>> This is an interesting idea and could be of some use but it would be
>>> wise to test it first a few times before suggesting it as a general
>>> course.
>> 
>> I've done basically this kind of thing before: dd a disk and then
>> scrub rather than replace, treating errors as expected.
> 
> I got into similar situation last night on that Thumper -
> it is now migrating a flaky source disk in the array from
> an original old 250GB disk into a same-sized partition on
> the new 3TB drive (as I outlined as IDEA7 in another thread).
> The source disk itself had about 300 CKSUM errors during
> the process, and for reasons beyond my current understanding,
> the resilver never completed.
> 
> In zpool status it said that the process was done several
> hours before the time I looked at it, but the TLVDEV still
> had a "spare" component device comprised of the old disk
> and new partition, and the (same) hotspare device in the
> pool was "INUSE".
> 
> After a while we just detached the old disk from the pool
> and ran scrub, which first found some 178 CKSUM errors on
> the new partition right away, and degraded the TLVDEV and
> pool.
> 
> We cleared the errors, and ran the script below to log
> the detected errors and clear them, so the disk is fixed
> and not kicked out of the pool due to mismatches.
> Overall 1277 errors were logged and apparently fixed, and
> the pool is now on its second full scrub run - no bugs so
> far (knocking wood; certainly none this early in the scrub
> as we had last time).
> 
> So in effect, this methodology works for two of us :)
> 
> Since you did similar stuff already, I have a few questions:
> 1) How/what did you DD? The whole slice with the zfs vdev?

dd, or similar dumb block copiers, should work fine. However, they
are inefficient and operationally difficult to manage, which is why they
tend to fall into the prefer-to-use-something-else category.
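
For reference, a minimal sketch of that kind of raw copy, using the device
names from this thread but with hypothetical s0 slices (not a recipe, just
the shape of it):

-----
# Quiesce the pool so the on-disk state stops changing during the copy.
zpool export pond

# Raw copy of the vdev slice. conv=noerror,sync keeps dd going past
# unreadable sectors and pads them, leaving those blocks for scrub to fix.
dd if=/dev/rdsk/c1t2d0s0 of=/dev/rdsk/c5t6d0s0 bs=1024k conv=noerror,sync

# Pull or unconfigure the old disk BEFORE importing, so that only one
# copy of the labels is ever visible to ZFS.

zpool import pond     # the vdev is found by the GUID in its label
zpool scrub pond      # scrub repairs whatever did not copy cleanly
-----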

>   Did the system complain (much) about the renaming of the
>   device compared to paths embedded in pool/vdev headers?

It shouldn't, unless you did something to confuse it, such as having both
the original and the dd copy online at the same time. In that case you end
up with two independent copies of a device that carry the same identity.
That is an operational mistake, hence my comment above.
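
If in doubt, the labels can be inspected directly before the import; a
hedged illustration with the same hypothetical slice names:

-----
# zdb -l dumps the four vdev labels on a device. Comparing the guid,
# pool_guid and path fields shows whether two attached devices would
# present the same identity to ZFS.
zdb -l /dev/rdsk/c1t2d0s0
zdb -l /dev/rdsk/c5t6d0s0
-----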

>   Did you do anything manually to remedy that (forcing
>   import, DDing some handcrafted uberblocks, anything?)

Not needed.

> 
> 2) How did you "treat errors as expected" during scrub?
>   As I've discovered, there were hoops to jump through.
>   Is there a switch to disable "degrading" of pools and
>   TLVDEVs based on only the CKSUM counts?

DEGRADED is the status. You clear degraded states by fixing the problem
and running zpool clear. DEGRADED, in and of itself, is not a problem.
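
In command form (pool and device names as used elsewhere in this thread;
output elided):

-----
zpool status -v pond          # see which device is affected and why
# ...address the underlying fault...
zpool clear pond              # reset error counters for the whole pool
zpool clear pond c1t2d0s1     # or for a single device only
-----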

> 
> 
> My raw hoop-jumping script:
> -----
> 
> #!/bin/bash
> 
> # /root/scrubwatch.sh
> # Watches 'pond' scrub and resets errors to avoid auto-degrading
> # the device, but logs the detected error counts however.
> # See also "fmstat|grep zfs-diag" for precise counts.
> # See also https://blogs.oracle.com/bobn/entry/zfs_and_fma_two_great
> #          for details on FMA and fmstat with zfs hotspares
> 
> while true; do
>    zpool status pond | gegrep -A4 -B3 'resilv|error|c1t2d|c5t6d|%'
>    date
>    echo ""
> 
>    C1="`zpool status pond | grep c1t2d`"
>    C2="`echo "$C1" | grep 'c1t2d0s1  ONLINE       0     0     0'`"
>    if [ x"$C2" = x ]; then
>        echo "`date`: $C1" >> /var/tmp/zpool-clear_pond.log
>        zpool clear pond
>        zpool status pond | gegrep -A4 -B3 'resilv|error|c1t2d|c5t6d|%'
>        date
>    fi
>    echo ""
> 
>    sleep 60
> done

I would never allow such a script at my site. It is important to track the
progress and state changes, and this script resets those counters for no
good reason.
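
If all that is wanted is periodic visibility, a watcher that only records
the counters, without clearing them, preserves that history. A rough
sketch (log path is arbitrary), not an endorsement of the overall approach:

-----
#!/bin/bash
# Log the per-device error counters every minute WITHOUT resetting them,
# so the history of state changes stays intact.
LOG=/var/tmp/zpool-watch_pond.log
while true; do
    {
        date
        zpool status pond | grep c1t2d
        fmstat | grep zfs-diag
        echo ""
    } >> "$LOG"
    sleep 60
done
-----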

I post this comment in the hope that future searches will not encourage 
people to try such things.
 -- richard

--
ZFS Performance and Training
richard.ell...@richardelling.com
+1-760-896-4422