On 02.05.11 15:51, Dave Love wrote:

> I was promised the result of http://www.jisc.org.uk/whatwedo/programmes/greeningict/technical/supercomputers.aspx to look at adapting it for SGE. I should chase it up, but I don't know how comprehensive it is. As it was written for PBS, I suspect it may not do very useful things with resource requests, and I think the OSC systems are rather uniform compared with our rather heterogeneous ones.
>
> [Off-topic: I've never seen an assessment of the effects on reliability of treating HPC systems this way, especially already-unreliable ones like ours, where we're suspicious of thermal effects on motherboard stability. Does anyone know of one?]

If you look at the approved power-down cycles which the vendors publish for state-of-the-art HW, that may be scary. It is usually on the order of 2000 cycles. Powering down a resource several times a day would certainly cause problems. The better option would be to switch the systems into energy-saving modes, if the system provides something like that. Not all do.

That said, I know of customer cases where power saving was done with the brute-force shutdown approach. There is a presentation by s+c about it which was held during the last Grid Engine workshop.

Have the on-line proceedings been archived and made available somewhere?

Cheers,

Fritz

> Stuart Barkley <stua...@4gh.net> writes:
>
>> On Fri, 29 Apr 2011 at 13:19 -0000, Chris Dagdigian wrote:
>>
>>> I have absolutely seen this done with very real results. The most important thing is to have the system generate emails to senior management saying things like "... I saved $12,000 in electricity last quarter ..." --
>
> That doesn't always seem to work unless it's switching off PCs or lights, unfortunately...
>
>>> I can't overstate enough the importance of making sure that you have the PR stuff covered in addition to the nice tech stuff under the hood.
>
>> I think some of the desire is to have a "green" check box.
>> In theory, the cluster will eventually be so busy there won't be any nodes suitable for powering off.
>
> If it isn't heavily loaded, you can typically win significantly just with Powernow-type features (if you can understand the BIOS parameters etc. sufficiently). If CPU frequency-changing is disabled for Infiniband, for instance, you can flip it in the GE prolog/epilog.
>
>> I'm fine either way. I have IPMI working and will use that for power on. I'll probably do a shutdown command through ssh so the nodes go down cleanly. Not a problem to code.
>
> Yes, I don't understand why IPMI isn't scriptable/automatable (modulo the pervasive implementation bugs), but it may be useful to abstract through something like powerman anyhow. Is doing in-band shutdown more reliable than via IPMI/ACPI if you have stateful nodes?
>
>> This also starts to interact with other systems which are also trying to manage the nodes. You don't want monitoring system alarms going off because you are saving power.
>
> Yes, you need hooks into Nagios, or whatever, but how does a database help with that?
>
>> This is also where I see mission creep starting to happen as the "simple database" gains additional functionality.
>
> Does it need more than you already have if you run dbwriter with appropriate parameters logged?
>
>>> - monitor the length of the pending ("qw" state jobs) to see when new nodes need to be powered up
>
>> Suggestions for getting information out of SGE? I was thinking of the xml outputs instead of parsing the human readable outputs. I've seen some comments that the xml stuff has been more subject to change over time than the human readable output.
>
> Yes, but even if you have the latest release, the XML can be ill-formed. It's also verbose, and so presumably distinctly less scalable. We do need a decent GE API for such things.
>
>> Is this sufficient?
>> For a basic homogeneous load it should be fine, but I'm worried about the edge cases.
>
> Yes, it's definitely not straightforward generally on a system like ours with heterogeneous nodes and very mixed job types.
>
>>> - script things so that when nodes are powered up they are by default coming up in disabled (state "d") so that they don't take jobs on right away
>
> Why is it better to do that, rather than just not starting execd?
>
>> I'm behind in getting sanity / health checks working. A basic starting example would be very useful to see.
>
> http://code.google.com/p/nodediag/ is one framework you could use at startup, and possibly for later tests with Nagios NRPE, or similar.
>
>>> - Before you shut down a node, put it into state "d" so that you avoid a race condition between a job landing on the node and your shutdown command hitting it
>
>> Yes, and recheck the node for jobs after setting it disabled. Reenable if jobs got in. Races here could play havoc with losing track of node state.
>
> I don't understand this. What's unsafe about submitting a job to do the reboot, assuming it can ensure exclusive access to the node with the exclusive resource or by claiming all the slots? (I soft-stop execd in the job before the reboot command to avoid failed-job mail.)
>
>> We are already gathering power usage information and putting it into ganglia. I need to do something else to better maintain the needed long-term information. rrd files are great. They just are not for holding long-term detailed information.
>
> If you can get it into a host value via a load sensor, presumably it could go in the dbwriter database.
>
> _______________________________________________
> users mailing list
> users@gridengine.org
> https://gridengine.org/mailman/listinfo/users

--
Fritz Ferstl | CTO and Business Development, EMEA
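
For reference, the disable / recheck / shut down sequence discussed in the thread could look roughly like the Python sketch below. It is only an illustration under assumptions, not anyone's actual setup: the helper names (`run_cmd`, `count_jobs`, `power_down`) are made up, the node is assumed to be reachable via ssh with permission to run `shutdown`, and the XML from `qstat -xml` is assumed to list each job as a `job_list` element. Since the XML can be ill-formed, an unparsable reply is treated as "job count unknown" and the node is re-enabled rather than shut down.

```python
import subprocess
import xml.etree.ElementTree as ET


def run_cmd(argv):
    """Run a command and return its stdout as text (raises on non-zero exit)."""
    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout


def count_jobs(qstat_xml):
    """Count job_list elements in qstat XML output.

    Returns None when the XML cannot be parsed, so callers can treat the
    job count as unknown instead of assuming the node is idle.
    """
    try:
        root = ET.fromstring(qstat_xml)
    except ET.ParseError:
        return None
    return sum(1 for _ in root.iter("job_list"))


def power_down(host, run=run_cmd):
    """Disable the host's queue instances, recheck for jobs, then shut down.

    Returns True if the shutdown command was issued, and False if the node
    was re-enabled because a job got in or the job count was unreadable.
    `run` is injectable so the sequence can be exercised without a cluster.
    """
    queue_instances = "*@" + host

    # Step 1: stop new jobs from being scheduled onto the node.
    run(["qmod", "-d", queue_instances])

    # Step 2: recheck, since a job may have landed just before the disable.
    xml_text = run(["qstat", "-u", "*", "-s", "r", "-q", queue_instances, "-xml"])
    jobs = count_jobs(xml_text)
    if jobs != 0:
        # A job slipped in (or the state is unknown): undo the disable.
        run(["qmod", "-e", queue_instances])
        return False

    # Step 3: in-band shutdown through ssh so the node goes down cleanly.
    run(["ssh", host, "shutdown", "-h", "now"])
    return True
```

Calling `power_down("node01")` (a hypothetical host name) would run the real commands. Passing a stub as `run` allows a dry run. Power-on via IPMI and the hooks into monitoring are left out here.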