On Wed, Jul 27, 2011 at 08:00:43PM -0500, Bob Friesenhahn wrote:
> On Tue, 26 Jul 2011, Charles Stephens wrote:
>
>> I'm on S11E 150.0.1.9 and I replaced one of the drives, and the pool
>> seems to be stuck in a resilvering loop.  I performed a 'zpool clear'
>> and 'zpool scrub', and it just complains that the drives I didn't
>> replace are degraded because of too many errors.  Oddly, the replaced
>> drive is reported as being fine.  The CKSUM counts get up to about
>> 108 or so by the time the resilver completes.
>
> This sort of problem (failing disks during a recovery) is a good reason 
> not to use raidz1 in modern systems.  Use raidz2 or raidz3.
>
> Assuming that the system is good and the problem really is the disks
> returning bad reads, it seems the only path forward is to wait for
> the resilver to complete, or to see whether recreating the pool from
> a recent backup works out better.

Indeed, but that assumption may be too strong.  If you're getting
errors across all of the pool's members, you likely have some other
systemic problem, such as the following (a couple of quick checks are
sketched after the list):
 * bad ram / cpu / motherboard
 * too-weak power supply
 * faulty disk controller / driver
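
Before blaming the disks outright, it may be worth ruling those out.
Something like the following is where I'd start on S11E (roughly from
memory, so double-check the man pages before trusting the exact
options):

    # Per-device soft/hard/transport error counters seen by the driver:
    iostat -En

    # What the fault manager has logged, and anything it has faulted:
    fmdump -eV | more
    fmadm faulty

If the error counters keep climbing on every disk at once, I'd suspect
the controller, cabling, or power supply long before the drives
themselves.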

Had you been scrubbing the pool regularly before the replacement, and
were those scrubs clean?  If not, one possibility is that the scrubs
are telling you bad data was written in the first place, especially if
the errors repeat on the same files.  If the counts and affected files
change from one scrub to the next, you may instead be seeing corruption
on reads, from the same sort of cause.  Or you may have both.
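
Either way, regular scrubs plus 'zpool status -v' will show you which
files the errors land on, and whether they move around between runs.
A rough sketch ('tank' is just a stand-in for your pool name):

    # Run a scrub, then look at which files, if any, show errors:
    zpool scrub tank
    zpool status -v tank

    # A weekly scrub from root's crontab (Sunday 03:00 here; pick
    # whatever cadence suits the pool):
    0 3 * * 0 /usr/sbin/zpool scrub tank

Note that 'zpool status -v' only lists files with errors ZFS could not
repair, so a clean list right after a scrub is a decent sanity check.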

--
Dan.
