Perhaps someone on this mailing list can shed some light on some odd
ZFS behavior I encountered this weekend.  I have an array of 5
400GB drives in a raidz, running on Nexenta.  One of these drives
reported a SMART error (HARDWARE IMPENDING FAILURE GENERAL HARD DRIVE
FAILURE [asc=5d, ascq=10]).  I preemptively replaced it with "zpool
replace tank c3t0d0 c3t5d0".  The resilver started but quickly hung
at "scrub: resilver in progress, 0.37% done, 132h14m to go".  NFS
stopped working and the system seemed to have responsiveness issues.
I did have automatic hourly/daily/weekly snapshots running on the
filesystem at the time.  I rebooted, but the machine would not come up
in any sane state: it sometimes became pingable, but was never
reachable over ssh or the serial console (as it is configured to
allow).  I tried various live CDs, to no avail.  After much gnashing
of teeth, I eventually got it to boot to a login prompt via Nexenta's
single-user mode, but only after physically removing both drives
involved in the replacement.  The resilver then proceeded normally,
and I watched it complete.  I physically reattached the drives and
rebooted, at which point the pool was online and no longer degraded.
The system now boots normally.
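(For anyone following along at home: one way to keep an eye on a resilver without sitting on `zpool status` is to extract the progress percentage from its output. This is just a sketch; the sample line below is the one from my pool, and in practice you would pipe in live `zpool status tank` output instead.)

```shell
# Sample status line as printed by `zpool status` during the hang;
# in real use, replace this with: status_line=$(zpool status tank)
status_line='scrub: resilver in progress, 0.37% done, 132h14m to go'

# Pull out just the completion percentage (e.g. "0.37")
pct=$(echo "$status_line" | sed -n 's/.*in progress, \([0-9.]*\)% done.*/\1/p')

echo "resilver progress: ${pct}%"
```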

So, after all that, my primary question is: how did the resilver
(which I liken to rebuilding the 5-drive array) take place with
only 4 drives online?  Shouldn't it have been writing data/parity to
the replacement drive?  Is this the normal, expected behavior?

Thanks for any insight!
Thomas
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss