On Dec 10, 2009, at 8:36 AM, Mark Grant wrote:

From what I remember, the problem with the hardware RAID controller is that the long delay before the drive responds causes the drive to be dropped from the RAID. If you then get another error on a different drive while rebuilding the array, that disk is also marked failed and your whole filesystem is gone, even though most of the data is still readable on the disks. Odds are you could have recovered 100% of the data from what is still readable on the complete set of drives, since the bad sectors on the two failed drives probably wouldn't be in the same place. The end result is worse than not using RAID at all, because you lose everything rather than just the files with bad sectors (though if you're using mirroring rather than parity, you could presumably recover most of the data eventually).

Certainly, if the disk were taking that long to respond I'd be replacing it ASAP, but ASAP may not be fast enough if a second drive has bad sectors too. And I have seen a consumer SATA drive repeatedly lock up a system for a minute at a time doing retries, with no indication at all beforehand that the drive had problems.

For the Solaris sd(7d) driver, the default timeout is 60 seconds with 3 or 5 retries, depending on the hardware. Whether you notice this at the application level depends on other factors: reads vs. writes, and so on. You can tune this, of course, and you have access to the source.
 -- richard
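
For illustration, one way to lower that timeout globally is the sd_io_time tunable in /etc/system; this is a sketch only, the 10-second value is an arbitrary example rather than a recommendation from this thread, and a reboot is needed for /etc/system changes to take effect:

    * lower the sd(7d) per-command timeout from the 60-second default
    set sd:sd_io_time = 10

The value in the running kernel can be inspected with mdb, e.g. echo "sd_io_time/D" | mdb -k (assuming the sd module is loaded, which it will be on any system with disks attached). Per-target retry counts are handled inside the driver itself, which is where having access to the source helps.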

