On 02.05.11 15:51, Dave Love wrote:

I was promised the result of http://www.jisc.org.uk/whatwedo/programmes/greeningict/technical/supercomputers.aspx to look at adapting it for SGE. I should chase it up, but I don't know how comprehensive it is. As it was written for PBS, I suspect it may not do very useful things with resource requests, and I think the OSC systems are rather uniform compared with our rather heterogeneous ones.

[Off-topic: I've never seen an assessment of the effects on reliability of treating HPC systems this way, especially already-unreliable ones like ours, where we're suspicious of thermal effects on motherboard stability. Does anyone know of one?]

If you look at the approved power-down cycles that the vendors publish for state-of-the-art hardware, the numbers may be scary. It is usually on the order of 2000 cycles. Powering down a resource several times a day would certainly cause problems. The better option would be to switch the systems into energy-saving modes, if the system provides something like that. Not all do.

That said, I know of customers who have done power saving with the brute-force shutdown approach. There is a presentation by s+c about it, which was given at the last Grid Engine workshop.

Have the on-line proceedings been archived and made available somewhere?

Cheers, Fritz

Stuart Barkley <[email protected]> writes:

On Fri, 29 Apr 2011 at 13:19 -0000, Chris Dagdigian wrote:

I have absolutely seen this done with very real results. The most important thing is to have the system generate emails to senior management saying things like "... I saved $12,000 in electricity last quarter ..." --

That doesn't always seem to work unless it's switching off PCs or lights, unfortunately...

I can't overstate the importance of making sure that you have the PR stuff covered in addition to the nice tech stuff under the hood.

I think some of the desire is to have a "green" check box.
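To put the power-cycle figure quoted above in perspective, here is a rough back-of-the-envelope sketch. It assumes the "order of 2000 cycles" number is a hard budget and that a node is cycled a fixed number of times per day; both are simplifications, and the function name and the sample rates are made up for illustration.

```python
def years_until_cycle_budget_spent(rated_cycles: int, cycles_per_day: float) -> float:
    """Return how many years a node lasts before its rated power cycles are used up.

    Assumption: the vendor figure is treated as a simple budget that is
    consumed at a constant daily rate.
    """
    if cycles_per_day <= 0:
        raise ValueError("cycles_per_day must be positive")
    days = rated_cycles / cycles_per_day
    return days / 365.0


if __name__ == "__main__":
    # 2000 is the order-of-magnitude figure from the thread; the daily
    # rates below are hypothetical.
    for per_day in (1, 3, 6):
        years = years_until_cycle_budget_spent(2000, per_day)
        print(f"{per_day} cycle(s)/day: about {years:.1f} years")
```

Under these assumptions, powering down a few times a day uses up the budget well within a typical hardware lifetime, which is the concern raised above.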
In theory, the cluster will eventually be so busy there won't be any nodes suitable for powering off.

If it isn't heavily loaded, you can typically win significantly just with Powernow-type features (if you can understand the BIOS parameters etc. sufficiently). If CPU frequency-changing is disabled for Infiniband, for instance, you can flip it in the GE prolog/epilog.

I'm fine either way. I have IPMI working and will use that for power on. I'll probably do a shutdown command through ssh so the nodes go down cleanly. Not a problem to code.

Yes, I don't understand why IPMI isn't scriptable/automatable (modulo the pervasive implementation bugs), but it may be useful to abstract through something like powerman anyhow. Is doing in-band shutdown more reliable than via IPMI/ACPI if you have stateful nodes?

This also starts to interact with other systems which are also trying to manage the nodes. You don't want monitoring system alarms going off because you are saving power.

Yes, you need hooks into Nagios, or whatever, but how does a database help with that?

This is also where I see mission creep starting to happen as the "simple database" gains additional functionality.

Does it need more than you already have if you run dbwriter with appropriate parameters logged?
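As a minimal sketch of the "IPMI for power on, ssh for a clean shutdown" approach described above: the helpers below only build and optionally run the command lines. The host names, the BMC user, and the helper names are hypothetical. It assumes ipmitool is installed with the lanplus interface available and the BMC password supplied via the IPMI_PASSWORD environment variable (the -E option), and that root can ssh to the node non-interactively.

```python
import subprocess
from typing import List


def ipmi_power_on_cmd(bmc_host: str, user: str) -> List[str]:
    """Build an ipmitool command that powers a node on via its BMC.

    -E makes ipmitool read the password from IPMI_PASSWORD.
    """
    return ["ipmitool", "-I", "lanplus", "-H", bmc_host, "-U", user,
            "-E", "chassis", "power", "on"]


def ssh_shutdown_cmd(node: str) -> List[str]:
    """Build an in-band shutdown command so the node goes down cleanly."""
    return ["ssh", "-o", "BatchMode=yes", f"root@{node}",
            "shutdown", "-h", "now"]


def run(cmd: List[str], dry_run: bool = True) -> int:
    """Print the command in dry-run mode, otherwise execute it."""
    if dry_run:
        print(" ".join(cmd))
        return 0
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    # Hypothetical node and BMC names; dry run only prints the commands.
    run(ssh_shutdown_cmd("node042"))
    run(ipmi_power_on_cmd("node042-bmc", "admin"))
```

Keeping the command construction separate from execution makes it easier to swap in something like powerman later, or to add the monitoring hooks discussed above before a node is actually taken down.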
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
Fritz Ferstl | CTO and Business Development, EMEA