On Thu, Jun 16, 2011 at 06:35:27PM -0400, Roy McMorran wrote:
> I've been reading this thread with interest, while simultaneously
> dealing with an outage on our Oracle (née Sun) 7410C cluster.  This
> is the ZFS NAS appliance offering.  Ours is fairly small, about 40TB
> total, serving up NFS-only to Solaris, RHEL and VMWare hosts.
>
[snip tail of woe]
> 
> Granted I am running a one year old version of the system code.
> Support hasn't been able to say if newer code would have prevented
> any of this.
> 
> They are proposing that the first outage was probably due to the
> drive failure because data would be unavailable while the pool was
> resilvering.  Huh?  This is "RAID", right?  I'm confused.  The 2nd
> outage was allegedly due to restarting the daemon.  Apparently
> restarting the management daemon ifconfigs the interfaces down.  So
> why would you do that at 3PM on a Wednesday then?  And hmm, why did
> the same procedure only affect 'node two'?  Still waiting for
> answers.

We ran into the same bug with our 7410 cluster.  In version <mumble>
(~1 year old), rebuilding a disk caused huge performance impacts,
to the point of unresponsiveness.

The newer code version has different bugs.  The most fun one: once a
failed disk finishes rebuilding onto a spare and you then replace the
failed drive, the system fails the replacement drive too.  They've
given us a bunch of hotfixes for that one, but we are still seeing
issues.

John
_______________________________________________
Tech mailing list
[email protected]
https://lists.lopsa.org/cgi-bin/mailman/listinfo/tech
This list provided by the League of Professional System Administrators
 http://lopsa.org/