On Sep 30, 2010, at 7:04 AM, Giovanni Tirloni wrote:

>  Recently during an electrical maintenance, we faced a problem with some 
> servers that had redundant PSUs. After the power was shut down on the circuit 
> that serves the first PSU, the second PSU failed to keep some servers up and 
> they rebooted (came back normal and stayed stable after that). Tonight the 
> same procedure was done on the other power circuit and the second PSU failed 
> too (on a smaller number of machines). These are all enterprise-level servers 
> whose vendors will promptly replace failed PSUs, but these PSUs were working 
> fine as far as we can tell. Has anyone had this problem too?

Unless I misunderstood what you wrote, this doesn't sound like the second PSU 
failed outright (as in, stopped providing power to the system). It sounds more 
like what you thought was N+1 redundancy actually needed both PSUs to supply 
enough power to keep everything running. That can happen if you're running the 
servers with more or different add-ons than a single PSU is specified to 
power: additional hard drives, more power-hungry CPUs, and more RAM can all eat 
up your redundancy overhead.  In the network world, Foundry Networks were 
especially good about measuring the power ramp as you added more blades to 
their chassis, and would alert when your N+2 had gone to N+1 and then just N. 
I haven't seen PC hardware that does the same but admittedly haven't kept up -- 
my worldview is more attuned to having a big pool of servers so single-box 
failures don't matter.
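
To make the arithmetic concrete, here is a rough sketch of the headroom check 
I have in mind. The wattage figures are hypothetical (they are not taken from 
the servers in question), and redundancy_margin is a made-up helper name, not 
any vendor's tool. It also assumes you can get a realistic peak draw figure 
for the box, which is the hard part in practice.

```python
import math


def redundancy_margin(psu_watts, psu_count, load_watts):
    """Number of PSUs that can drop out while the rest still carry the load.

    1 means N+1, 0 means plain N (no spare), and a negative value means the
    installed supplies cannot carry the load even when all are healthy.
    """
    if psu_watts <= 0 or psu_count <= 0 or load_watts < 0:
        raise ValueError("PSU wattage and count must be positive, load non-negative")
    # N: the minimum number of supplies needed to carry the load at all.
    needed = max(1, math.ceil(load_watts / psu_watts))
    return psu_count - needed


# Hypothetical figures for illustration only.
stock = redundancy_margin(psu_watts=750, psu_count=2, load_watts=600)
loaded = redundancy_margin(psu_watts=750, psu_count=2, load_watts=900)
print(stock, loaded)
```

With these made-up numbers the stock configuration has a margin of 1 (true 
N+1), while the loaded one has a margin of 0: both supplies are needed, so 
pulling either circuit takes the box down. This ignores derating, inrush and 
PSU efficiency, so treat it as a first-pass sanity check only.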

It could also be that the enterprise hardware is really just enterprise-ish 
and doesn't actually support PSU failover. I'd be curious to hear whether the 
same thing happens with brand new PSUs, or whether failover worked in the past 
but doesn't work now.


 - Eric Sorenson - N37 17.255 W121 55.738  - http://twitter.com/ahpook  -

_______________________________________________
Tech mailing list
Tech@lopsa.org
http://lopsa.org/cgi-bin/mailman/listinfo/tech
This list provided by the League of Professional System Administrators
 http://lopsa.org/
