Re: [zfs-discuss] replacing a drive in a raidz vdev
On Dec 2, 2006, at 1:32 PM, Bill Sommerfeld wrote:

> On Sat, 2006-12-02 at 00:08 -0500, Theo Schlossnagle wrote:
>> I had a disk malfunction in a raidz pool today. I had an extra one in
>> the enclosure and performed a: zpool replace pool old new
>> Several unexpected behaviors have transpired: the zpool replace
>> command hung for 52 minutes, during which no zpool commands could be
>> executed (like status, iostat, or list).
>
> So, I've observed that zfs will continue to attempt to do I/O to the
> outgoing drive while a replacement is in progress. (Seems
> counterintuitive - I'd expect that you'd want to touch the outgoing
> drive as little as possible, perhaps only attempting to read from it
> in the event that a block wasn't recoverable from the healthy drives.)
>
>> When it finally returned, the drive was marked as replacing, as I
>> expected from reading the man page. However, its progress counter has
>> not been monotonically increasing. It started at 1%, then went to 5%,
>> then back to 2%, etc.
>
> Do you have any cron jobs set up to do periodic snapshots? If so, I
> think you're seeing:
>
>   6343667 scrub/resilver has to start over when a snapshot is taken
>
> I ran into this myself this week - replaced a drive, and the resilver
> made it to 95% before a snapshot cron job fired and set things back
> to 0%.

Yesterday, a snapshot was taken to assist in backups -- that could be it.

// Theo Schlossnagle
// CTO -- http://www.omniti.com/~jesus/
// OmniTI Computer Consulting, Inc. -- http://www.omniti.com/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
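Until 6343667 is fixed, one workaround is to have the snapshot cron job skip its run while a resilver is underway. A minimal sketch (the `resilver_in_progress` helper and the idea of wrapping your snapshot script are my own assumptions, not anything zfs ships; the helper is fed a sample string here so the snippet stands alone, but in a cron job you'd feed it `zpool status <pool>`):

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given `zpool status` output
# reports a resilver in progress (matching the format in this thread).
resilver_in_progress() {
    echo "$1" | grep -q 'resilver in progress'
}

# Sample status line, as seen in the output quoted below.
sample='scrub: resilver in progress, 0.45% done, 27h27m to go'

# In a snapshot cron wrapper you would bail out here instead of
# running `zfs snapshot`, so the resilver isn't reset to 0%.
if resilver_in_progress "$sample"; then
    echo "resilver running; skipping snapshot"
fi
```

This prints "resilver running; skipping snapshot" for the sample above; on a healthy pool (`scrub: none requested`) the check fails and the snapshot would proceed.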
[zfs-discuss] replacing a drive in a raidz vdev
I had a disk malfunction in a raidz pool today. I had an extra one in the enclosure and performed a:

  zpool replace pool old new

Several unexpected behaviors have transpired: the zpool replace command hung for 52 minutes, during which no zpool commands could be executed (like status, iostat, or list). When it finally returned, the drive was marked as replacing, as I expected from reading the man page. However, its progress counter has not been monotonically increasing. It started at 1%, then went to 5%, then back to 2%, etc.

I just logged in to see if it was done, ran zpool status, and received:

  pool: xsr_slow_2
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress, 100.00% done, 0h0m to go
config:

        NAME                       STATE     READ WRITE CKSUM
        xsr_slow_2                 ONLINE       0     0     0
          raidz                    ONLINE       0     0     0
            c4t600039316A1Fd0s2    ONLINE       0     0     0
            c4t600039316A1Fd1s2    ONLINE       0     0     0
            c4t600039316A1Fd2s2    ONLINE       0     0     0
            c4t600039316A1Fd3s2    ONLINE       0     0     0
            replacing              ONLINE       0     0     0
              c4t600039316A1Fd4s2  ONLINE   2.87K   251     0
              c4t600039316A1Fd6    ONLINE       0     0     0
            c4t600039316A1Fd5s2    ONLINE       0     0     0

I thought to myself, if it is 100% done, why is it still replacing? I waited about 15 seconds and ran the command again to find something rather disconcerting:

  pool: xsr_slow_2
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress, 0.45% done, 27h27m to go
config:

        NAME                       STATE     READ WRITE CKSUM
        xsr_slow_2                 ONLINE       0     0     0
          raidz                    ONLINE       0     0     0
            c4t600039316A1Fd0s2    ONLINE       0     0     0
            c4t600039316A1Fd1s2    ONLINE       0     0     0
            c4t600039316A1Fd2s2    ONLINE       0     0     0
            c4t600039316A1Fd3s2    ONLINE       0     0     0
            replacing              ONLINE       0     0     0
              c4t600039316A1Fd4s2  ONLINE   2.87K   251     0
              c4t600039316A1Fd6    ONLINE       0     0     0
            c4t600039316A1Fd5s2    ONLINE       0     0     0

WTF?!
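If you want to watch the counter from a script rather than eyeballing zpool status, the percentage can be pulled out of the "scrub:" line. A sketch (the sed pattern is an assumption based on the exact output format quoted above; a sample line stands in for a live `zpool status xsr_slow_2` so the snippet is self-contained):

```shell
#!/bin/sh
# Sample "scrub:" line from the status output above; in practice:
#   zpool status xsr_slow_2 | sed -n '...'
sample='scrub: resilver in progress, 0.45% done, 27h27m to go'

# Extract the percent-done figure from the status line.
pct=$(echo "$sample" | sed -n 's/.*resilver in progress, \([0-9.]*\)% done.*/\1/p')

echo "resilver ${pct}% complete"
# -> resilver 0.45% complete
```

Logging this once a minute would have made the 100% -> 0.45% reset (and any snapshot-triggered restarts) obvious in the history.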
Best regards, Theo

// Theo Schlossnagle
// CTO -- http://www.omniti.com/~jesus/
// OmniTI Computer Consulting, Inc. -- http://www.omniti.com/