Re: [zfs-discuss] Why did resilvering restart?
On Tue, Nov 20, 2007 at 11:39:30AM -0600, Albert Chin wrote:
> On Tue, Nov 20, 2007 at 11:10:20AM -0600, [EMAIL PROTECTED] wrote:
> > [EMAIL PROTECTED] wrote on 11/20/2007 10:11:50 AM:
> > > On Tue, Nov 20, 2007 at 10:01:49AM -0600, [EMAIL PROTECTED] wrote:
> > > > Resilver and scrub are broken and restart when a snapshot is
> > > > created -- the current workaround is to disable snaps while
> > > > resilvering, the ZFS team is working on the issue for a long
> > > > term fix.
> > >
> > > But, no snapshot was taken. If so, zpool history would have shown
> > > this. So, in short, _no_ ZFS operations are going on during the
> > > resilvering. Yet, it is restarting.
> >
> > Does 2007-11-20.02:37:13 actually match the expected timestamp of
> > the original zpool replace command before the first zpool status
> > output listed below?
>
> No. We ran some 'zpool status' commands after the last 'zpool
> replace'. The 'zpool status' output in the initial email is from this
> morning. The only ZFS command we've been running is 'zfs list', 'zpool
> list tww', 'zpool status', or 'zpool status -v' after the last 'zpool
> replace'.

I think the 'zpool status' command was resetting the resilvering. We
upgraded to b77 this morning which did not exhibit this problem.
Resilvering is now done.

> Server is on GMT time.
>
> > Is it possible that another zpool replace is further up on your
> > pool history (ie it was rerun by an admin or automatically from some
> > service)?
>
> Yes, but a zpool replace for the same bad disk:
>   2007-11-20.00:57:40 zpool replace tww c0t600A0B8000299966059E4668CBD3d0 c0t600A0B800029996606584741C7C3d0
>   2007-11-20.02:35:22 zpool detach tww c0t600A0B800029996606584741C7C3d0
>   2007-11-20.02:37:13 zpool replace tww c0t600A0B8000299966059E4668CBD3d0 c0t600A0B8000299CCC06734741CD4Ed0
>
> We accidentally removed c0t600A0B800029996606584741C7C3d0 from the
> array, hence the 'zpool detach'.
>
> The last 'zpool replace' has been running for 15h now.
>
> > > > -Wade
> > > >
> > > > [EMAIL PROTECTED] wrote on 11/20/2007 09:58:19 AM:
> > > > > On b66:
> > > > >   # zpool replace tww c0t600A0B8000299966059E4668CBD3d0 \
> > > > >       c0t600A0B8000299CCC06734741CD4Ed0
> > > > >   some hours later
> > > > >   # zpool status tww
> > > > >     pool: tww
> > > > >    state: DEGRADED
> > > > >   status: One or more devices is currently being resilvered. The pool
> > > > >           will continue to function, possibly in a degraded state.
> > > > >   action: Wait for the resilver to complete.
> > > > >    scrub: resilver in progress, 62.90% done, 4h26m to go
> > > > >   some hours later
> > > > >   # zpool status tww
> > > > >     pool: tww
> > > > >    state: DEGRADED
> > > > >   status: One or more devices is currently being resilvered. The pool
> > > > >           will continue to function, possibly in a degraded state.
> > > > >   action: Wait for the resilver to complete.
> > > > >    scrub: resilver in progress, 3.85% done, 18h49m to go
> > > > >
> > > > >   # zpool history tww | tail -1
> > > > >   2007-11-20.02:37:13 zpool replace tww c0t600A0B8000299966059E4668CBD3d0 c0t600A0B8000299CCC06734741CD4Ed0
> > > > >
> > > > > So, why did resilvering restart when no zfs operations occurred? I
> > > > > just ran zpool status again and now I get:
> > > > >   # zpool status tww
> > > > >     pool: tww
> > > > >    state: DEGRADED
> > > > >   status: One or more devices is currently being resilvered. The pool
> > > > >           will continue to function, possibly in a degraded state.
> > > > >   action: Wait for the resilver to complete.
> > > > >    scrub: resilver in progress, 0.00% done, 134h45m to go
> > > > >
> > > > > What's going on?
--
albert chin ([EMAIL PROTECTED])
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
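
The restarts reported in this thread show up as a drop in the percentage on the "scrub:" line between successive 'zpool status' outputs. A minimal sketch of that check follows, assuming the progress line has exactly the wording quoted in the thread. The function names are hypothetical, and the code only reads text that was already captured. It does not run 'zpool status' itself, since the thread suggests that on b66 running the command may have been what reset the resilver.

```python
import re

# Matches the progress wording as it appears in the 'zpool status' output
# quoted in this thread; other builds may word the line differently.
_PROGRESS = re.compile(r"resilver in progress, (\d+(?:\.\d+)?)% done")


def resilver_percent(status_text):
    """Return the resilver percentage found in status_text, or None."""
    match = _PROGRESS.search(status_text)
    return float(match.group(1)) if match else None


def find_restarts(samples):
    """Return indexes of samples whose percentage is lower than the
    previous sample that carried a percentage."""
    restarts = []
    previous = None
    for index, text in enumerate(samples):
        percent = resilver_percent(text)
        if percent is None:
            continue  # no resilver line in this sample
        if previous is not None and percent < previous:
            restarts.append(index)
        previous = percent
    return restarts


# The three progress lines quoted in the original post, in order.
samples = [
    "scrub: resilver in progress, 62.90% done, 4h26m to go",
    "scrub: resilver in progress, 3.85% done, 18h49m to go",
    "scrub: resilver in progress, 0.00% done, 134h45m to go",
]
print(find_restarts(samples))  # prints [1, 2]
```

Both later samples are flagged, which matches the two restarts described in the original post.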
Re: [zfs-discuss] Why did resilvering restart?
Resilver and scrub are broken and restart when a snapshot is created
-- the current workaround is to disable snaps while resilvering, the
ZFS team is working on the issue for a long term fix.

-Wade

[EMAIL PROTECTED] wrote on 11/20/2007 09:58:19 AM:

> On b66:
>   # zpool replace tww c0t600A0B8000299966059E4668CBD3d0 \
>       c0t600A0B8000299CCC06734741CD4Ed0
>   some hours later
>   # zpool status tww
>     pool: tww
>    state: DEGRADED
>   status: One or more devices is currently being resilvered. The pool
>           will continue to function, possibly in a degraded state.
>   action: Wait for the resilver to complete.
>    scrub: resilver in progress, 62.90% done, 4h26m to go
>   some hours later
>   # zpool status tww
>     pool: tww
>    state: DEGRADED
>   status: One or more devices is currently being resilvered. The pool
>           will continue to function, possibly in a degraded state.
>   action: Wait for the resilver to complete.
>    scrub: resilver in progress, 3.85% done, 18h49m to go
>
>   # zpool history tww | tail -1
>   2007-11-20.02:37:13 zpool replace tww c0t600A0B8000299966059E4668CBD3d0 c0t600A0B8000299CCC06734741CD4Ed0
>
> So, why did resilvering restart when no zfs operations occurred? I
> just ran zpool status again and now I get:
>   # zpool status tww
>     pool: tww
>    state: DEGRADED
>   status: One or more devices is currently being resilvered. The pool
>           will continue to function, possibly in a degraded state.
>   action: Wait for the resilver to complete.
>    scrub: resilver in progress, 0.00% done, 134h45m to go
>
> What's going on?
>
> --
> albert chin ([EMAIL PROTECTED])
Re: [zfs-discuss] Why did resilvering restart?
On Tue, Nov 20, 2007 at 10:01:49AM -0600, [EMAIL PROTECTED] wrote:
> Resilver and scrub are broken and restart when a snapshot is created
> -- the current workaround is to disable snaps while resilvering, the
> ZFS team is working on the issue for a long term fix.

But, no snapshot was taken. If so, zpool history would have shown
this. So, in short, _no_ ZFS operations are going on during the
resilvering. Yet, it is restarting.

> -Wade
>
> [EMAIL PROTECTED] wrote on 11/20/2007 09:58:19 AM:
> > On b66:
> >   # zpool replace tww c0t600A0B8000299966059E4668CBD3d0 \
> >       c0t600A0B8000299CCC06734741CD4Ed0
> >   some hours later
> >   # zpool status tww
> >     pool: tww
> >    state: DEGRADED
> >   status: One or more devices is currently being resilvered. The pool
> >           will continue to function, possibly in a degraded state.
> >   action: Wait for the resilver to complete.
> >    scrub: resilver in progress, 62.90% done, 4h26m to go
> >   some hours later
> >   # zpool status tww
> >     pool: tww
> >    state: DEGRADED
> >   status: One or more devices is currently being resilvered. The pool
> >           will continue to function, possibly in a degraded state.
> >   action: Wait for the resilver to complete.
> >    scrub: resilver in progress, 3.85% done, 18h49m to go
> >
> >   # zpool history tww | tail -1
> >   2007-11-20.02:37:13 zpool replace tww c0t600A0B8000299966059E4668CBD3d0 c0t600A0B8000299CCC06734741CD4Ed0
> >
> > So, why did resilvering restart when no zfs operations occurred? I
> > just ran zpool status again and now I get:
> >   # zpool status tww
> >     pool: tww
> >    state: DEGRADED
> >   status: One or more devices is currently being resilvered. The pool
> >           will continue to function, possibly in a degraded state.
> >   action: Wait for the resilver to complete.
> >    scrub: resilver in progress, 0.00% done, 134h45m to go
> >
> > What's going on?

--
albert chin ([EMAIL PROTECTED])
Re: [zfs-discuss] Why did resilvering restart?
[EMAIL PROTECTED] wrote on 11/20/2007 10:11:50 AM:

> On Tue, Nov 20, 2007 at 10:01:49AM -0600, [EMAIL PROTECTED] wrote:
> > Resilver and scrub are broken and restart when a snapshot is created
> > -- the current workaround is to disable snaps while resilvering, the
> > ZFS team is working on the issue for a long term fix.
>
> But, no snapshot was taken. If so, zpool history would have shown
> this. So, in short, _no_ ZFS operations are going on during the
> resilvering. Yet, it is restarting.

Does 2007-11-20.02:37:13 actually match the expected timestamp of
the original zpool replace command before the first zpool status
output listed below? Is it possible that another zpool replace is
further up on your pool history (ie it was rerun by an admin or
automatically from some service)?

> > -Wade
> >
> > [EMAIL PROTECTED] wrote on 11/20/2007 09:58:19 AM:
> > > On b66:
> > >   # zpool replace tww c0t600A0B8000299966059E4668CBD3d0 \
> > >       c0t600A0B8000299CCC06734741CD4Ed0
> > >   some hours later
> > >   # zpool status tww
> > >     pool: tww
> > >    state: DEGRADED
> > >   status: One or more devices is currently being resilvered. The pool
> > >           will continue to function, possibly in a degraded state.
> > >   action: Wait for the resilver to complete.
> > >    scrub: resilver in progress, 62.90% done, 4h26m to go
> > >   some hours later
> > >   # zpool status tww
> > >     pool: tww
> > >    state: DEGRADED
> > >   status: One or more devices is currently being resilvered. The pool
> > >           will continue to function, possibly in a degraded state.
> > >   action: Wait for the resilver to complete.
> > >    scrub: resilver in progress, 3.85% done, 18h49m to go
> > >
> > >   # zpool history tww | tail -1
> > >   2007-11-20.02:37:13 zpool replace tww c0t600A0B8000299966059E4668CBD3d0 c0t600A0B8000299CCC06734741CD4Ed0
> > >
> > > So, why did resilvering restart when no zfs operations occurred? I
> > > just ran zpool status again and now I get:
> > >   # zpool status tww
> > >     pool: tww
> > >    state: DEGRADED
> > >   status: One or more devices is currently being resilvered. The pool
> > >           will continue to function, possibly in a degraded state.
> > >   action: Wait for the resilver to complete.
> > >    scrub: resilver in progress, 0.00% done, 134h45m to go
> > >
> > > What's going on?
>
> --
> albert chin ([EMAIL PROTECTED])
Re: [zfs-discuss] Why did resilvering restart?
On Tue, Nov 20, 2007 at 11:10:20AM -0600, [EMAIL PROTECTED] wrote:
> [EMAIL PROTECTED] wrote on 11/20/2007 10:11:50 AM:
> > On Tue, Nov 20, 2007 at 10:01:49AM -0600, [EMAIL PROTECTED] wrote:
> > > Resilver and scrub are broken and restart when a snapshot is
> > > created -- the current workaround is to disable snaps while
> > > resilvering, the ZFS team is working on the issue for a long
> > > term fix.
> >
> > But, no snapshot was taken. If so, zpool history would have shown
> > this. So, in short, _no_ ZFS operations are going on during the
> > resilvering. Yet, it is restarting.
>
> Does 2007-11-20.02:37:13 actually match the expected timestamp of
> the original zpool replace command before the first zpool status
> output listed below?

No. We ran some 'zpool status' commands after the last 'zpool
replace'. The 'zpool status' output in the initial email is from this
morning. The only ZFS command we've been running is 'zfs list', 'zpool
list tww', 'zpool status', or 'zpool status -v' after the last 'zpool
replace'.

Server is on GMT time.

> Is it possible that another zpool replace is further up on your
> pool history (ie it was rerun by an admin or automatically from some
> service)?

Yes, but a zpool replace for the same bad disk:
  2007-11-20.00:57:40 zpool replace tww c0t600A0B8000299966059E4668CBD3d0 c0t600A0B800029996606584741C7C3d0
  2007-11-20.02:35:22 zpool detach tww c0t600A0B800029996606584741C7C3d0
  2007-11-20.02:37:13 zpool replace tww c0t600A0B8000299966059E4668CBD3d0 c0t600A0B8000299CCC06734741CD4Ed0

We accidentally removed c0t600A0B800029996606584741C7C3d0 from the
array, hence the 'zpool detach'.

The last 'zpool replace' has been running for 15h now.

> > > -Wade
> > >
> > > [EMAIL PROTECTED] wrote on 11/20/2007 09:58:19 AM:
> > > > On b66:
> > > >   # zpool replace tww c0t600A0B8000299966059E4668CBD3d0 \
> > > >       c0t600A0B8000299CCC06734741CD4Ed0
> > > >   some hours later
> > > >   # zpool status tww
> > > >     pool: tww
> > > >    state: DEGRADED
> > > >   status: One or more devices is currently being resilvered. The pool
> > > >           will continue to function, possibly in a degraded state.
> > > >   action: Wait for the resilver to complete.
> > > >    scrub: resilver in progress, 62.90% done, 4h26m to go
> > > >   some hours later
> > > >   # zpool status tww
> > > >     pool: tww
> > > >    state: DEGRADED
> > > >   status: One or more devices is currently being resilvered. The pool
> > > >           will continue to function, possibly in a degraded state.
> > > >   action: Wait for the resilver to complete.
> > > >    scrub: resilver in progress, 3.85% done, 18h49m to go
> > > >
> > > >   # zpool history tww | tail -1
> > > >   2007-11-20.02:37:13 zpool replace tww c0t600A0B8000299966059E4668CBD3d0 c0t600A0B8000299CCC06734741CD4Ed0
> > > >
> > > > So, why did resilvering restart when no zfs operations occurred? I
> > > > just ran zpool status again and now I get:
> > > >   # zpool status tww
> > > >     pool: tww
> > > >    state: DEGRADED
> > > >   status: One or more devices is currently being resilvered. The pool
> > > >           will continue to function, possibly in a degraded state.
> > > >   action: Wait for the resilver to complete.
> > > >    scrub: resilver in progress, 0.00% done, 134h45m to go
> > > >
> > > > What's going on?

--
albert chin ([EMAIL PROTECTED])
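
The question of whether any pool operation ran after the last 'zpool replace' can also be checked mechanically against saved 'zpool history' output. The sketch below assumes each entry has the "YYYY-MM-DD.HH:MM:SS command" layout of the entries quoted in this thread. The helper names and the default list of subcommands are hypothetical choices for illustration, not part of any ZFS tool.

```python
from datetime import datetime


def parse_history(lines):
    """Turn 'zpool history' lines into (timestamp, command) pairs.
    Lines that do not start with a timestamp are skipped."""
    entries = []
    for line in lines:
        stamp, _, command = line.strip().partition(" ")
        try:
            when = datetime.strptime(stamp, "%Y-%m-%d.%H:%M:%S")
        except ValueError:
            continue  # header line, blank line, or other non-entry text
        entries.append((when, command))
    return entries


def commands_after(entries, start,
                   subcommands=("replace", "detach", "snapshot", "scrub")):
    """Return entries strictly later than start whose second word
    (the subcommand) is one of subcommands."""
    hits = []
    for when, command in entries:
        words = command.split()
        if when > start and len(words) >= 2 and words[1] in subcommands:
            hits.append((when, command))
    return hits


# The three history entries quoted in the thread.
history = [
    "2007-11-20.00:57:40 zpool replace tww c0t600A0B8000299966059E4668CBD3d0 c0t600A0B800029996606584741C7C3d0",
    "2007-11-20.02:35:22 zpool detach tww c0t600A0B800029996606584741C7C3d0",
    "2007-11-20.02:37:13 zpool replace tww c0t600A0B8000299966059E4668CBD3d0 c0t600A0B8000299CCC06734741CD4Ed0",
]
entries = parse_history(history)
last_replace = entries[-1][0]
print(commands_after(entries, last_replace))  # prints []
```

For the quoted entries the result is empty, which is consistent with the statement that nothing was logged after the final 'zpool replace'.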