Re: [zfs-discuss] replacing a device with itself doesn't work
On Wed, Oct 03, 2007 at 10:02:03PM +0200, Pawel Jakub Dawidek wrote:

On Wed, Oct 03, 2007 at 12:10:19PM -0700, Richard Elling wrote:

# zpool scrub tank
# zpool status -v tank
  pool: tank
 state: ONLINE
status: One or more devices could not be used because the label is
        missing or invalid.  Sufficient replicas exist for the pool to
        continue functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-4J
 scrub: resilver completed with 0 errors on Wed Oct  3 18:45:06 2007
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            md0     UNAVAIL      0     0     0  corrupted data
            md1     ONLINE       0     0     0
            md2     ONLINE       0     0     0

errors: No known data errors

# zpool replace tank md0
invalid vdev specification
use '-f' to override the following errors:
md0 is in use (r1w1e1)
# zpool replace -f tank md0
invalid vdev specification
the following errors must be manually repaired:
md0 is in use (r1w1e1)

Well, the advice of 'zpool replace' doesn't work. At this point the
user is now stuck. There seems to be just no way to now use the
existing device md0.

In Solaris NV b72, this works as you expect.

# zpool replace zwimming /dev/ramdisk/rd1
# zpool status -v zwimming
  pool: zwimming
 state: DEGRADED
 scrub: resilver completed with 0 errors on Wed Oct  3 11:55:36 2007
config:

        NAME                       STATE     READ WRITE CKSUM
        zwimming                   DEGRADED     0     0     0
          raidz1                   DEGRADED     0     0     0
            replacing              DEGRADED     0     0     0
              /dev/ramdisk/rd1/old FAULTED      0     0     0  corrupted data
              /dev/ramdisk/rd1     ONLINE       0     0     0
            /dev/ramdisk/rd2       ONLINE       0     0     0
            /dev/ramdisk/rd3       ONLINE       0     0     0

errors: No known data errors

# zpool status -v zwimming
  pool: zwimming
 state: ONLINE
 scrub: resilver completed with 0 errors on Wed Oct  3 11:55:36 2007
config:

        NAME                  STATE     READ WRITE CKSUM
        zwimming              ONLINE       0     0     0
          raidz1              ONLINE       0     0     0
            /dev/ramdisk/rd1  ONLINE       0     0     0
            /dev/ramdisk/rd2  ONLINE       0     0     0
            /dev/ramdisk/rd3  ONLINE       0     0     0

errors: No known data errors

Good to know, but I think it's still partly ZFS's fault.
The error message 'md0 is in use (r1w1e1)' means that something (I'm
quite sure it's ZFS) keeps the device open. Why does it keep it open
when it doesn't recognize it? Or maybe it tries to open it twice for
write (exclusively) when replacing, which is not allowed by GEOM in
FreeBSD. I can take a look at whether it is the former or the latter,
but it should be fixed in ZFS itself, IMHO.

Ok, it seems that it was fixed in ZFS itself already:

	/*
	 * If we are setting the vdev state to anything but an open state, then
	 * always close the underlying device. Otherwise, we keep accessible
	 * but invalid devices open forever. We don't call vdev_close() itself,
	 * because that implies some extra checks (offline, etc) that we don't
	 * want here. This is limited to leaf devices, because otherwise
	 * closing the device will affect other children.
	 */
	if (vdev_is_dead(vd) && vd->vdev_ops->vdev_op_leaf)
		vd->vdev_ops->vdev_op_close(vd);

The ZFS version from FreeBSD-CURRENT doesn't have this code yet; it's
only in my Perforce branch for now. I'll verify later today whether it
really fixes the problem, and I'll report back if it doesn't.

--
Pawel Jakub Dawidek                       http://www.wheel.pl
[EMAIL PROTECTED]                         http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
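[Editor's note: the "(r1w1e1)" in the error is GEOM's read/write/exclusive access-count triple for the provider. A toy model may help readers unfamiliar with GEOM; this is an illustration only, not the real FreeBSD g_access() implementation, and all names in it are made up:]

```c
#include <errno.h>

/* Toy model of GEOM-style access accounting (illustrative only): a
 * provider tracks aggregate read, write, and exclusive counts -- the
 * "r1w1e1" shown in the error message. */
struct provider {
    int acr, acw, ace;  /* aggregate read, write, exclusive counts */
};

/* Request dr/dw/de additional read/write/exclusive references.
 * Returns 0 on success, EPERM if the request conflicts with current
 * holders: an exclusive holder blocks new writers, and a writer
 * blocks new exclusive requests.  While ZFS keeps md0 open r1w1e1,
 * a second open-for-write by 'zpool replace' is refused. */
int access_request(struct provider *pp, int dr, int dw, int de)
{
    if ((pp->ace > 0 && dw > 0) || (pp->acw > 0 && de > 0))
        return EPERM;
    pp->acr += dr;
    pp->acw += dw;
    pp->ace += de;
    return 0;
}
```

In this simplified picture, ZFS's first open succeeds and pins the counts at r1w1e1; the replace path's second exclusive open then fails, which matches the "must be manually repaired" symptom until the vdev is closed on error as in the fix above.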
Re: [zfs-discuss] replacing a device with itself doesn't work
Pawel,
Is this a problem with ZFS trying to open the device twice?

Richard,
Yes, a scrub should fix the device. One of ZFS's features is ease of
administration. It seems to defy logic that a scrub does not fix all
devices, if possible. Why make it any harder for the admin? Cheers.

This message posted from opensolaris.org
[zfs-discuss] replacing a device with itself doesn't work
Hi, I hope someone can help, because at the moment ZFS's logic seems a
little askew. I just swapped a failing 200 GB drive that was one half
of a 400 GB gstripe device, which I was using as one of the devices in
a 3-device raidz1. When the OS came back up after the drive had been
changed, the necessary metadata was of course not on the new drive, so
the stripe didn't exist. ZFS understandably complained it couldn't open
the stripe; however, it did not show the array as degraded. I didn't
save the output, but it was just like described in this thread:
http://www.nabble.com/Shooting-yourself-in-the-foot-with-ZFS:-is-quite-easy-t4512790.html

I recreated the gstripe device under the same name stripe/str1 and
assumed I could just:

# zpool replace pool stripe/str1
invalid vdev specification
stripe/str1 is in use (r1w1e1)

It also told me to try -f, which I did, but was greeted with the same
error. Why can I not replace a device with itself? As the man page
describes just this procedure, I'm a little confused. Try as I might
(online, offline, scrub) I could not get the array to rebuild, just
like the guy described in that thread above. I eventually resorted to
recreating the stripe with a different name, stripe/str2. I could then
perform a:

# zpool replace pool stripe/str1 stripe/str2

Is there a reason I have to jump through these seemingly pointless
hoops to replace a device with itself? Many thanks.
Re: [zfs-discuss] replacing a device with itself doesn't work
MP wrote:

Hi, I hope someone can help, because at the moment ZFS's logic seems a
little askew. I just swapped a failing 200 GB drive that was one half
of a 400 GB gstripe device, which I was using as one of the devices in
a 3-device raidz1. When the OS came back up after the drive had been
changed, the necessary metadata was of course not on the new drive, so
the stripe didn't exist. ZFS understandably complained it couldn't open
the stripe; however, it did not show the array as degraded. I didn't
save the output, but it was just like described in this thread:
http://www.nabble.com/Shooting-yourself-in-the-foot-with-ZFS:-is-quite-easy-t4512790.html

I recreated the gstripe device under the same name stripe/str1 and
assumed I could just:

# zpool replace pool stripe/str1
invalid vdev specification
stripe/str1 is in use (r1w1e1)

It also told me to try -f, which I did, but was greeted with the same
error. Why can I not replace a device with itself? As the man page
describes just this procedure, I'm a little confused. Try as I might
(online, offline, scrub) I could not get the array to rebuild, just
like the guy described in that thread above. I eventually resorted to
recreating the stripe with a different name, stripe/str2. I could then
perform a:

# zpool replace pool stripe/str1 stripe/str2

Is there a reason I have to jump through these seemingly pointless
hoops to replace a device with itself? Many thanks.

Yes. From the fine manual on zpool:

zpool replace [-f] pool old_device [new_device]

    Replaces old_device with new_device. This is equivalent to
    attaching new_device, waiting for it to resilver, and then
    detaching old_device.
    ...
    If new_device is not specified, it defaults to old_device. This
    form of replacement is useful after an existing disk has failed
    and has been physically replaced. In this case, the new disk may
    have the same /dev/dsk path as the old device, even though it is
    actually a different disk. ZFS recognizes this.
For a stripe, you don't have redundancy, so you cannot replace the disk
with itself. You would have to specify the [new_device]. I've submitted
CR6612596 for a better error message and CR6612605 to mention this in
the man page.
-- richard
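[Editor's note: Richard's rule of thumb can be sketched as a toy predicate (hypothetical helper, not actual ZFS code): whether "replace with itself" can resilver depends on whether the surviving children of the vdev can reconstruct the failed device's data:]

```c
/* Toy predicate (not real ZFS code): can a vdev resilver a replaced
 * child from its surviving children?  A plain stripe/concat has no
 * redundancy, so self-replacement can never reconstruct the data;
 * mirror and raidz vdevs tolerate a bounded number of failures. */
typedef enum { VDEV_STRIPE, VDEV_MIRROR, VDEV_RAIDZ1, VDEV_RAIDZ2 } vdev_kind;

int can_self_replace(vdev_kind kind, int nchildren, int nfailed)
{
    switch (kind) {
    case VDEV_STRIPE:
        return 0;                    /* no replicas to rebuild from */
    case VDEV_MIRROR:
        return nfailed < nchildren;  /* need at least one intact copy */
    case VDEV_RAIDZ1:
        return nfailed <= 1;         /* single parity */
    case VDEV_RAIDZ2:
        return nfailed <= 2;         /* double parity */
    }
    return 0;
}
```

Note that in MP's case the gstripe was itself one child of a raidz1, so the pool as a whole could reconstruct the data, which is why the thread goes on to question the "no redundancy" answer.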
Re: [zfs-discuss] replacing a device with itself doesn't work
more below...

MP wrote:

On 03/10/2007, Richard Elling [EMAIL PROTECTED] wrote:

Yes. From the fine manual on zpool:

zpool replace [-f] pool old_device [new_device]

    Replaces old_device with new_device. This is equivalent to
    attaching new_device, waiting for it to resilver, and then
    detaching old_device.
    ...
    If new_device is not specified, it defaults to old_device. This
    form of replacement is useful after an existing disk has failed
    and has been physically replaced. In this case, the new disk may
    have the same /dev/dsk path as the old device, even though it is
    actually a different disk. ZFS recognizes this.

For a stripe, you don't have redundancy, so you cannot replace the
disk with itself.

I don't see how a stripe makes a difference. It's just 2 drives joined
together logically to make a new device. It can be used by the system
just like a normal hard drive. Just like a normal hard drive, it too
has no redundancy?

Correct. It would be redundant if it were a mirror, raidz, or raidz2.
Stripes of mirror, raidz, or raidz2 vdevs are redundant.

You would have to specify the [new_device]. I've submitted CR6612596
for a better error message and CR6612605 to mention this in the man
page.

Perhaps I was a little unclear. ZFS did a few things during this whole
escapade which seemed wrong.

# mdconfig -a -tswap -s64m
md0
# mdconfig -a -tswap -s64m
md1
# mdconfig -a -tswap -s64m
md2

I presume you're not running Solaris, so please excuse me if I take a
Solaris view to this problem.
# zpool create tank raidz md0 md1 md2
# zpool status -v tank
  pool: tank
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            md0     ONLINE       0     0     0
            md1     ONLINE       0     0     0
            md2     ONLINE       0     0     0

errors: No known data errors

# zpool offline tank md0
Bringing device md0 offline
# dd if=/dev/zero of=/dev/md0 bs=1m
dd: /dev/md0: end of device
65+0 records in
64+0 records out
67108864 bytes transferred in 0.044925 secs (1493798602 bytes/sec)
# zpool status -v tank
  pool: tank
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning
        in a degraded state.
action: Online the device using 'zpool online' or replace the device
        with 'zpool replace'.
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        DEGRADED     0     0     0
          raidz1    DEGRADED     0     0     0
            md0     OFFLINE      0     0     0
            md1     ONLINE       0     0     0
            md2     ONLINE       0     0     0

errors: No known data errors

At this point, where the drive is offline, a 'zpool replace tank md0'
will fix the array.

Correct. The pool is redundant.

However, if instead the other advice given, 'zpool online tank md0', is
used, then problems start to occur:

# zpool online tank md0
# zpool status -v tank
  pool: tank
 state: ONLINE
status: One or more devices could not be used because the label is
        missing or invalid.  Sufficient replicas exist for the pool to
        continue functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-4J
 scrub: resilver completed with 0 errors on Wed Oct  3 18:44:22 2007
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            md0     UNAVAIL      0     0     0  corrupted data
            md1     ONLINE       0     0     0
            md2     ONLINE       0     0     0

errors: No known data errors

^^^ Surely this is wrong? zpool shows the pool as 'ONLINE' and not
degraded, whereas the status explanation says that it is degraded and
'zpool replace' is required. That's just confusing.

I agree, I would expect the STATE to be DEGRADED.
# zpool scrub tank
# zpool status -v tank
  pool: tank
 state: ONLINE
status: One or more devices could not be used because the label is
        missing or invalid.  Sufficient replicas exist for the pool to
        continue functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-4J
 scrub: resilver completed with 0 errors on Wed Oct  3 18:45:06 2007
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            md0     UNAVAIL      0     0     0  corrupted data
            md1     ONLINE       0     0     0
            md2     ONLINE       0     0     0

errors: No known data errors

# zpool replace tank md0
invalid vdev specification
use '-f' to override the following errors:
md0 is in use (r1w1e1)
# zpool replace -f tank md0
invalid vdev specification
the following errors must be manually repaired:
md0 is in use (r1w1e1)

Well, the advice of 'zpool replace' doesn't work. At this point the
user is now stuck. There seems to be just no way to now use the
existing device md0.
Re: [zfs-discuss] replacing a device with itself doesn't work
On Wed, Oct 03, 2007 at 12:10:19PM -0700, Richard Elling wrote:

# zpool scrub tank
# zpool status -v tank
  pool: tank
 state: ONLINE
status: One or more devices could not be used because the label is
        missing or invalid.  Sufficient replicas exist for the pool to
        continue functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-4J
 scrub: resilver completed with 0 errors on Wed Oct  3 18:45:06 2007
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            md0     UNAVAIL      0     0     0  corrupted data
            md1     ONLINE       0     0     0
            md2     ONLINE       0     0     0

errors: No known data errors

# zpool replace tank md0
invalid vdev specification
use '-f' to override the following errors:
md0 is in use (r1w1e1)
# zpool replace -f tank md0
invalid vdev specification
the following errors must be manually repaired:
md0 is in use (r1w1e1)

Well, the advice of 'zpool replace' doesn't work. At this point the
user is now stuck. There seems to be just no way to now use the
existing device md0.

In Solaris NV b72, this works as you expect.

# zpool replace zwimming /dev/ramdisk/rd1
# zpool status -v zwimming
  pool: zwimming
 state: DEGRADED
 scrub: resilver completed with 0 errors on Wed Oct  3 11:55:36 2007
config:

        NAME                       STATE     READ WRITE CKSUM
        zwimming                   DEGRADED     0     0     0
          raidz1                   DEGRADED     0     0     0
            replacing              DEGRADED     0     0     0
              /dev/ramdisk/rd1/old FAULTED      0     0     0  corrupted data
              /dev/ramdisk/rd1     ONLINE       0     0     0
            /dev/ramdisk/rd2       ONLINE       0     0     0
            /dev/ramdisk/rd3       ONLINE       0     0     0

errors: No known data errors

# zpool status -v zwimming
  pool: zwimming
 state: ONLINE
 scrub: resilver completed with 0 errors on Wed Oct  3 11:55:36 2007
config:

        NAME                  STATE     READ WRITE CKSUM
        zwimming              ONLINE       0     0     0
          raidz1              ONLINE       0     0     0
            /dev/ramdisk/rd1  ONLINE       0     0     0
            /dev/ramdisk/rd2  ONLINE       0     0     0
            /dev/ramdisk/rd3  ONLINE       0     0     0

errors: No known data errors

Good to know, but I think it's still partly ZFS's fault. The error
message 'md0 is in use (r1w1e1)' means that something (I'm quite sure
it's ZFS) keeps the device open.
Why does it keep it open when it doesn't recognize it? Or maybe it
tries to open it twice for write (exclusively) when replacing, which is
not allowed by GEOM in FreeBSD. I can take a look at whether it is the
former or the latter, but it should be fixed in ZFS itself, IMHO.

--
Pawel Jakub Dawidek                       http://www.wheel.pl
[EMAIL PROTECTED]                         http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
Re: [zfs-discuss] replacing a device with itself doesn't work
I think I might have run into the same problem. At the time I assumed I
was doing something wrong, but...

I made a b72 raidz out of three new 1 GB virtual disks in VMware. I
shut the VM off and replaced one of the disks with a new 1.5 GB virtual
disk. No matter what command I tried, I couldn't get the new disk into
the array. The docs said that replacing the vdev with itself would
work, but it didn't. Nor did setting the 'automatic replace' feature on
the pool and plugging a new device in. I recall most of the errors
being "device in use". Maybe I wasn't the problem after all? 0_o