On Thu, Jun 16, 2011 at 06:35:27PM -0400, Roy McMorran wrote:
> I've been reading this thread with interest, while simultaneously
> dealing with an outage on our Oracle (née Sun) 7410C cluster. This
> is the ZFS NAS appliance offering. Ours is fairly small, about 40TB
> total, serving up NFS-only to Solaris, RHEL, and VMware hosts.
> [snip tale of woe]
>
> Granted, I am running a one-year-old version of the system code.
> Support hasn't been able to say whether newer code would have prevented
> any of this.
>
> They are proposing that the first outage was probably due to the
> drive failure, because data would be unavailable while the pool was
> resilvering. Huh? This is "RAID", right? I'm confused. The second
> outage was allegedly due to restarting the daemon. Apparently,
> restarting the management daemon ifconfigs the interfaces down. So
> why would you do that at 3 PM on a Wednesday? And hmm, why did
> the same procedure only affect 'node two'? Still waiting for
> answers.
We ran into the same bug with our 7410 cluster. In version <mumble> (about a year old), rebuilding a disk caused a huge performance impact, to the point of unresponsiveness.

The newer code version has different bugs. The most fun one: when a failed disk finishes rebuilding onto a spare and you then replace the failed drive, the system will fail the new drive. They've given us a bunch of hotfixes for that one, but we are still seeing issues.

John

_______________________________________________
Tech mailing list
[email protected]
https://lists.lopsa.org/cgi-bin/mailman/listinfo/tech
This list provided by the League of Professional System Administrators
http://lopsa.org/
