more below...

MP wrote:
> On 03/10/2007, Richard Elling <[EMAIL PROTECTED]> wrote:
> 
>     Yes.  From the fine manual on zpool:
>           zpool replace [-f] pool old_device [new_device]
> 
>               Replaces old_device with new_device. This is  equivalent
>               to attaching new_device, waiting for it to resilver, and
>               then detaching old_device.
>     ...
>               If  new_device  is  not  specified,   it   defaults   to
>               old_device.  This form of replacement is useful after an
>               existing  disk  has  failed  and  has  been   physically
>               replaced.  In  this case, the new disk may have the same
>               /dev/dsk path as the old device, even though it is actu-
>               ally a different disk. ZFS recognizes this.
> 
>     For a stripe, you don't have redundancy, so you cannot replace the
>     disk with itself. 
> 
> 
> I don't see how a stripe makes a difference. It's just 2 drives joined 
> together logically to make a
> new device. It can be used by the system just like a normal hard drive.  Just 
> like a normal hard
> drive it too has no redundancy?

Correct.  It would be redundant if it were a mirror, raidz, or raidz2 vdev.
A stripe of mirror, raidz, or raidz2 vdevs is also redundant; a stripe of
plain disks is not.
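
To make the distinction concrete, a quick sketch using md-style device names
like those in your transcript:

# zpool create tank raidz md0 md1 md2               (one raidz1 vdev: redundant)
# zpool create tank mirror md0 md1 mirror md2 md3   (stripe of mirrors: redundant)
# zpool create tank md0 md1                         (plain two-disk stripe: no redundancy)

Only in the first two layouts can ZFS rebuild a lost member from the
surviving devices.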

>     You would have to specify the [new_device]
>     I've submitted CR6612596 for a better error message and CR6612605
>     to mention this in the man page.
> 
> 
> Perhaps I was a little unclear. Zfs did a few things during this whole 
> escapade which seemed wrong.
> 
> # mdconfig -a -tswap -s64m
> md0
> # mdconfig -a -tswap -s64m
> md1
> # mdconfig -a -tswap -s64m
> md2

I presume you're not running Solaris, so please excuse me if I take a
Solaris view of this problem.

> # zpool create tank raidz md0 md1 md2
> # zpool status -v tank
>   pool: tank
>  state: ONLINE
>  scrub: none requested
> config:
> 
>         NAME        STATE     READ WRITE CKSUM
>         tank        ONLINE       0     0     0
>           raidz1    ONLINE       0     0     0
>             md0     ONLINE       0     0     0
>             md1     ONLINE       0     0     0
>             md2     ONLINE       0     0     0
> 
> errors: No known data errors
> # zpool offline tank md0
> Bringing device md0 offline
> # dd if=/dev/zero of=/dev/md0 bs=1m
> dd: /dev/md0: end of device
> 65+0 records in
> 64+0 records out
> 67108864 bytes transferred in 0.044925 secs (1493798602 bytes/sec)
> # zpool status -v tank
>   pool: tank
>  state: DEGRADED
> status: One or more devices has been taken offline by the administrator.
>         Sufficient replicas exist for the pool to continue functioning in a
>         degraded state.
> action: Online the device using 'zpool online' or replace the device with
>         'zpool replace'.
>  scrub: none requested
> config:
> 
>         NAME        STATE     READ WRITE CKSUM
>         tank        DEGRADED     0     0     0
>           raidz1    DEGRADED     0     0     0
>             md0     OFFLINE      0     0     0
>             md1     ONLINE       0     0     0
>             md2     ONLINE       0     0     0
> 
> errors: No known data errors
> 
> --------------------
> At this point where the drive is offline a 'zpool replace tank md0' will 
> fix the array.

Correct.  The pool is redundant.
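
For the archives, a minimal sketch of that working path, following your own
description (same md devices):

# zpool offline tank md0
# dd if=/dev/zero of=/dev/md0 bs=1m
# zpool replace tank md0          (issued while md0 is still OFFLINE)
# zpool status -v tank            (md0 resilvers and the pool returns to ONLINE)

The replace happens before the wiped device is ever brought back online.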

> However, if instead the other advice given; 'zpool online tank md0' is 
> used then problems start to occur:
> --------------------
> 
> # zpool online tank md0
> # zpool status -v tank
>   pool: tank
>  state: ONLINE
> status: One or more devices could not be used because the label is 
> missing or
>         invalid.  Sufficient replicas exist for the pool to continue
>         functioning in a degraded state.
> action: Replace the device using 'zpool replace'.
>    see: http://www.sun.com/msg/ZFS-8000-4J
>  scrub: resilver completed with 0 errors on Wed Oct  3 18:44:22 2007
> config:
> 
>         NAME        STATE     READ WRITE CKSUM
>         tank        ONLINE       0     0     0
>           raidz1    ONLINE       0     0     0
>             md0     UNAVAIL      0     0     0  corrupted data
>             md1     ONLINE       0     0     0
>             md2     ONLINE       0     0     0
> 
> errors: No known data errors
> 
> -------------
> ^^^^^^^
> Surely this is wrong? Zpool shows the pool as 'ONLINE'  and not 
> degraded. Whereas the status explanation
> says that it is degraded and 'zpool replace' is required. That's just 
> confusing.

I agree; I would expect the STATE to be DEGRADED.

> -------------
> 
> # zpool scrub tank
> # zpool status -v tank
>   pool: tank
>  state: ONLINE
> status: One or more devices could not be used because the label is 
> missing or
>         invalid.  Sufficient replicas exist for the pool to continue
>         functioning in a degraded state.
> action: Replace the device using 'zpool replace'.
>    see: http://www.sun.com/msg/ZFS-8000-4J
>  scrub: resilver completed with 0 errors on Wed Oct  3 18:45:06 2007
> config:
> 
>         NAME        STATE     READ WRITE CKSUM
>         tank        ONLINE       0     0     0
>           raidz1    ONLINE       0     0     0
>             md0     UNAVAIL      0     0     0  corrupted data
>             md1     ONLINE       0     0     0
>             md2     ONLINE       0     0     0
> 
> errors: No known data errors
> # zpool replace tank md0
> invalid vdev specification
> use '-f' to override the following errors:
> md0 is in use (r1w1e1)
> # zpool replace -f tank md0
> invalid vdev specification
> the following errors must be manually repaired:
> md0 is in use (r1w1e1)
> 
> -----------------
> Well the advice of 'zpool replace' doesn't work. At this point the user 
> is now stuck. There seems to
> be just no way to now use the existing device md0.

In Solaris NV b72, this works as you expect.
# zpool replace zwimming /dev/ramdisk/rd1
# zpool status -v zwimming
   pool: zwimming
  state: DEGRADED
  scrub: resilver completed with 0 errors on Wed Oct  3 11:55:36 2007
config:

         NAME                        STATE     READ WRITE CKSUM
         zwimming                    DEGRADED     0     0     0
           raidz1                    DEGRADED     0     0     0
             replacing               DEGRADED     0     0     0
               /dev/ramdisk/rd1/old  FAULTED      0     0     0  corrupted data
               /dev/ramdisk/rd1      ONLINE       0     0     0
             /dev/ramdisk/rd2        ONLINE       0     0     0
             /dev/ramdisk/rd3        ONLINE       0     0     0

errors: No known data errors

And a moment later, once the replacing vdev has been cleaned up:
# zpool status -v zwimming
   pool: zwimming
  state: ONLINE
  scrub: resilver completed with 0 errors on Wed Oct  3 11:55:36 2007
config:

         NAME                  STATE     READ WRITE CKSUM
         zwimming              ONLINE       0     0     0
           raidz1              ONLINE       0     0     0
             /dev/ramdisk/rd1  ONLINE       0     0     0
             /dev/ramdisk/rd2  ONLINE       0     0     0
             /dev/ramdisk/rd3  ONLINE       0     0     0

errors: No known data errors


> -----------------
> # mdconfig -a -tswap -s64m
> md3
> # zpool replace -f tank md0 md3
> # zpool status -v tank
>   pool: tank
>  state: ONLINE
>  scrub: resilver completed with 0 errors on Wed Oct  3 18:45:52 2007
> config:
> 
>         NAME           STATE     READ WRITE CKSUM
>         tank           ONLINE       0     0     0
>           raidz1       ONLINE       0     0     0
>             replacing  ONLINE       0     0     0
>               md0      UNAVAIL      0     0     0  corrupted data
>               md3      ONLINE       0     0     0
>             md1        ONLINE       0     0     0
>             md2        ONLINE       0     0     0
> 
> errors: No known data errors
> # zpool status -v tank
>   pool: tank
>  state: ONLINE
>  scrub: resilver completed with 0 errors on Wed Oct  3 18:45:52 2007
> config:
> 
>         NAME        STATE     READ WRITE CKSUM
>         tank        ONLINE       0     0     0
>           raidz1    ONLINE       0     0     0
>             md3     ONLINE       0     0     0
>             md1     ONLINE       0     0     0
>             md2     ONLINE       0     0     0
> 
> errors: No known data errors
> 
> --------------------
> 
> Only changing the device name of the failed component can get zfs to 
> rebuild the array. That seems
> wrong to me.
> 
> 1. Why does zpool status say 'ONLINE' when the pool is obviously degraded?

IMHO, that is a bug.

> 2. Why is the 1st advice given 'zpool online', which does not work?

In Solaris I see:
# zpool online zwimming /dev/ramdisk/rd1
warning: device '/dev/ramdisk/rd1' onlined, but remains in faulted state
use 'zpool replace' to replace devices that are no longer present

> 3. Why is the 2nd advice given 'zpool replace', when that doesn't work 
> after the 1st advice has been performed?

Works in Solaris.  Hopefully it is in the pipeline for *BSD.

> 4. Why do I have to use a device with a different name to get this to 
> work? Surely
>     what I did above mimics exactly what happens when a drive fails, and 
> the manual
>     says that 'zpool replace <pool> <failed-device>' will fix it?

In such cases I would not try this while the device is online; I would have
offlined it before attempting the replace.  But I see your point, it is
confusing.  Given that Solaris seems to handle this differently, I think it
is just a matter of your release catching up.
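
In command form, the ordering I have in mind keeps the wiped device offline
the whole time (a sketch reusing your md names; md3 stands in for whatever
spare you attach):

# zpool offline tank md0
# zpool replace tank md0 md3      (no 'zpool online' of the wiped md0 first)
# zpool status -v tank

Whether the same-name form, 'zpool replace tank md0', also succeeds on your
release I can't say, but it does on recent Solaris bits.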

> 5. If zfs can access all the necessary devices in the pool, then why 
> doesn't scrub fix the array?

You destroyed all of the data on the device, including the uberblocks.
AFAIK, scrub does not attempt to recreate uberblocks, which is why the
replace command exists.  I think you've identified a user interface
problem that could be handled more automatically.  What do others think?
Should a scrub perform a replace if the uberblocks are nonexistent?
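
Incidentally, you can see what the dd did with zdb, assuming your port ships
it (device path from your transcript):

# zdb -l /dev/md0

On a fully overwritten device that should fail to find any of the four
labels, which is why ZFS no longer recognizes md0 as a member of the pool.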
  -- richard