Re: [zfs-discuss] replacing a drive in a raidz vdev

2006-12-02 Thread Theo Schlossnagle


On Dec 2, 2006, at 1:32 PM, Bill Sommerfeld wrote:


On Sat, 2006-12-02 at 00:08 -0500, Theo Schlossnagle wrote:

I had a disk malfunction in a raidz pool today.  I had an extra one in
the enclosure and performed a "zpool replace pool old new", and several
unexpected behaviors have transpired:

the zpool replace command hung for 52 minutes, during which no other
zpool commands (such as status, iostat, or list) could be executed.


So, I've observed that zfs will continue to attempt to do I/O to the
outgoing drive while a replacement is in progress.  (Seems
counterintuitive - I'd expect that you'd want to touch the outgoing
drive as little as possible, perhaps only attempting to read from it in
the event that a block wasn't recoverable from the healthy drives.)


When it finally returned, the drive was marked as replacing, as I
expected from reading the man page.  However, its progress counter
has not been increasing monotonically.  It started at 1%, then went
to 5%, then dropped back to 2%, and so on.


do you have any cron jobs set up to do periodic snapshots?
If so, I think you're seeing:

6343667 scrub/resilver has to start over when a snapshot is taken

I ran into this myself this week - replaced a drive, and the resilver
made it to 95% before a snapshot cron job fired and set things back to
0%.


Yesterday, a snapshot was taken to assist in backups -- that could be it.
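
If that bug (6343667) is what's biting us, one workaround until it is
fixed would be to pause the snapshot cron job while the resilver runs.
A rough sketch, assuming the periodic snapshots come from root's
crontab and the entries mention "zfs snapshot":

    # save the current crontab, then strip the snapshot entries
    crontab -l > /var/tmp/root.cron.bak
    crontab -l | grep -v 'zfs snapshot' | crontab -

    # wait for the resilver to finish; check it with, e.g.:
    zpool status xsr_slow_2 | grep resilver

    # restore the original crontab so snapshots resume
    crontab /var/tmp/root.cron.bak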


// Theo Schlossnagle
// CTO -- http://www.omniti.com/~jesus/
// OmniTI Computer Consulting, Inc. -- http://www.omniti.com/




[zfs-discuss] replacing a drive in a raidz vdev

2006-12-01 Thread Theo Schlossnagle
I had a disk malfunction in a raidz pool today.  I had an extra one in
the enclosure and performed a "zpool replace pool old new", and several
unexpected behaviors have transpired:
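
For reference, the replace invocation takes the pool name, the old
device, and the new device; a sketch with placeholder device names
(not the actual ones from this pool):

    # swap a failed disk for a spare already in the enclosure
    zpool replace xsr_slow_2 <old-disk> <new-disk>

    # the new device should then appear under a "replacing" vdev
    zpool status xsr_slow_2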


the zpool replace command hung for 52 minutes, during which no other
zpool commands (such as status, iostat, or list) could be executed.


When it finally returned, the drive was marked as replacing, as I
expected from reading the man page.  However, its progress counter
has not been increasing monotonically.  It started at 1%, then went
to 5%, then dropped back to 2%, and so on.
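
One way to watch that counter over time, rather than polling by hand
(a rough sketch; any POSIX shell will do):

    # log the resilver progress line once a minute
    while true; do
        zpool status xsr_slow_2 | grep resilver
        sleep 60
    done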


I just logged in to see if it was done and ran zpool status and  
received:


  pool: xsr_slow_2
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress, 100.00% done, 0h0m to go
config:

        NAME                         STATE     READ WRITE CKSUM
        xsr_slow_2                   ONLINE       0     0     0
          raidz                      ONLINE       0     0     0
            c4t600039316A1Fd0s2      ONLINE       0     0     0
            c4t600039316A1Fd1s2      ONLINE       0     0     0
            c4t600039316A1Fd2s2      ONLINE       0     0     0
            c4t600039316A1Fd3s2      ONLINE       0     0     0
            replacing                ONLINE       0     0     0
              c4t600039316A1Fd4s2    ONLINE   2.87K   251     0
              c4t600039316A1Fd6      ONLINE       0     0     0
            c4t600039316A1Fd5s2      ONLINE       0     0     0


I thought to myself: if it is 100% done, why is it still replacing?  I
waited about 15 seconds, ran the command again, and found something
rather disconcerting:


  pool: xsr_slow_2
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress, 0.45% done, 27h27m to go
config:

        NAME                         STATE     READ WRITE CKSUM
        xsr_slow_2                   ONLINE       0     0     0
          raidz                      ONLINE       0     0     0
            c4t600039316A1Fd0s2      ONLINE       0     0     0
            c4t600039316A1Fd1s2      ONLINE       0     0     0
            c4t600039316A1Fd2s2      ONLINE       0     0     0
            c4t600039316A1Fd3s2      ONLINE       0     0     0
            replacing                ONLINE       0     0     0
              c4t600039316A1Fd4s2    ONLINE   2.87K   251     0
              c4t600039316A1Fd6      ONLINE       0     0     0
            c4t600039316A1Fd5s2      ONLINE       0     0     0

WTF?!

Best regards,

Theo

// Theo Schlossnagle
// CTO -- http://www.omniti.com/~jesus/
// OmniTI Computer Consulting, Inc. -- http://www.omniti.com/

