Thanks.  Good information to know.

More comments embedded...

On Fri, 29 Apr 2011 at 13:19 -0000, Chris Dagdigian wrote:

> I have absolutely seen this done with very real results. The most
> important thing is to have the system generate emails to senior
> management saying things like "... I saved $12,000 in electricity
> last quarter ..." -- I can't overstate enough the importance of
> making sure that you have the PR stuff covered in addition to the
> nice tech stuff under the hood.

I think some of the desire is to have a "green" check box.  In theory,
the cluster will eventually be so busy there won't be any nodes
suitable for powering off.

> The best system I've ever seen was doing this on a 4,000 core
> cluster built from blades. The blade system provided the necessary
> hooks by which some simple scripts could use simple passwordless SSH
> commands to start and stop nodes when needed. This was easier to
> script, manage and operate as opposed to having to do straight IPMI
> or other less scriptable/automatable methods.

I'm fine either way.  I have IPMI working and will use that for power
on.  I'll probably do a shutdown command through ssh so the nodes go
down cleanly.  Not a problem to code.

> The main points:
>
> - track state in a simple database

Yes, I figured this would be necessary to track some of the sub states
between "on" and "off".  My hope would be to just suck a small flat
file into perl and spit it back out at the end of any run.
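
To make the flat-file idea concrete, here is a minimal sketch of the
load/modify/save cycle.  It is in Python rather than perl, and the
state names, the JSON layout and the function names are all my own
assumptions, not anything taken from the system Chris described.

```python
import json
import os
import tempfile

# Hypothetical state names; the thread only speaks of "on", "off" and sub states.
STATES = {"on", "booting", "checking", "draining", "off"}


def load_states(path):
    """Return {node: state}; a missing file means no nodes are tracked yet."""
    if not os.path.exists(path):
        return {}
    with open(path) as fh:
        return json.load(fh)


def save_states(path, states):
    """Validate, write to a temp file, then rename over the old file."""
    bad = {s for s in states.values() if s not in STATES}
    if bad:
        raise ValueError(f"unknown state(s): {sorted(bad)}")
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as fh:
        json.dump(states, fh, indent=1, sort_keys=True)
    # os.replace is atomic on POSIX, so readers never see a half-written file.
    os.replace(tmp, path)
```

The rename step matters more than the file format: a run that dies
halfway through should leave the previous state file intact.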

This also starts to interact with other systems which are also trying
to manage the nodes.  You don't want monitoring system alarms going
off because you are saving power.

This is also where I see mission creep starting to happen as the
"simple database" gains additional functionality.

> - monitor the length of the pending ("qw" state jobs) to see when
> new nodes need to be powered up

Suggestions for getting information out of SGE?  I was thinking of the
xml outputs instead of parsing the human readable outputs.  I've seen
some comments that the xml stuff has been more subject to change over
time than the human readable output.

Is just (length > 0) good enough?

Or actually wait for a few jobs to be waiting?  Counting array job
tasks?  Counting mpi job needs?

Is this sufficient?  For a basic homogeneous load it should be fine,
but I'm worried about the edge cases.
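
As a starting point for the questions above, here is a rough Python
sketch that counts pending demand in slots rather than in jobs, so
that array tasks and multi-slot (mpi) jobs are weighted.  It assumes
the xml comes from something like "qstat -u '*' -s p -xml" and that
each pending job is a job_list element with state, slots and an
optional tasks child.  That layout and the task range syntax are
assumptions to check against your own SGE version; the function names
are made up.

```python
import xml.etree.ElementTree as ET


def count_tasks(spec):
    """Count tasks in a range spec such as '1-10:2' or '3,7-9:1' (assumed syntax)."""
    total = 0
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        rng, _, step = part.partition(":")
        first, _, last = rng.partition("-")
        if last:
            total += (int(last) - int(first)) // int(step or 1) + 1
        else:
            total += 1
    return total


def pending_slots(xml_text):
    """Sum slots * tasks over every job whose state is exactly 'qw'."""
    demand = 0
    for job in ET.fromstring(xml_text).iter("job_list"):
        # Held (hqw) and error (Eqw) jobs are skipped: more nodes will not start them.
        if job.findtext("state", default="").strip() != "qw":
            continue
        slots = int(job.findtext("slots", default="1"))
        tasks = job.findtext("tasks")
        demand += slots * (count_tasks(tasks) if tasks else 1)
    return demand


def nodes_to_wake(demand, slots_per_node, max_nodes):
    """Round demand up to whole nodes, capped at the number of nodes that are off."""
    return min(max_nodes, -(-demand // slots_per_node))
```

This still ignores resource requests, queue restrictions and parallel
environment allocation rules, which is exactly where the edge cases
would come from.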

> - script things so that when nodes are powered up they are by
> default coming up in disabled (state "d") so that they don't take
> jobs on right away

Good point.

> - each node that boots up needs to run a series of sanity tests
> designed to protect against common startup failures (missing NFS
> mounts, etc) that could kill jobs. Running the sanity check script
> remotely via a passwordless SSH command seems to work and lets you
> report state/status back into your tracking database

I'm behind in getting sanity / health checks working.  A basic
starting example would be very useful to see.
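
For what it is worth, this is the sort of minimal check I have in
mind, sketched in Python.  The mount list, the scratch directory and
the free space threshold are placeholders I made up; the only failure
mode taken from Chris's note is the missing NFS mount.

```python
import os
import shutil


def check_node(required_mounts, scratch_dir, min_free_bytes,
               ismount=os.path.ismount, disk_usage=shutil.disk_usage):
    """Return a list of failure messages; an empty list means the node looks sane."""
    failures = []
    for mount in required_mounts:
        # A missing NFS mount usually shows up as a plain empty directory.
        if not ismount(mount):
            failures.append(f"not mounted: {mount}")
    try:
        free = disk_usage(scratch_dir).free
    except OSError as exc:
        failures.append(f"cannot stat {scratch_dir}: {exc}")
    else:
        if free < min_free_bytes:
            failures.append(f"low disk space on {scratch_dir}: {free} bytes free")
    return failures
```

Run over ssh, a wrapper could print the failures and exit non-zero if
the list is not empty, which gives the controlling script something
simple to record in the tracking database.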

> - only after the powered on node passes its sanity check do you
> switch the node away from disabled state "d" so it can start taking
> on work
>
> - Before you shut down a node, put it into state "d" so that you
> avoid a race condition between a job landing on the node and your
> shutdown command hitting it

Yes, and recheck the node for jobs after setting it disabled.
Reenable if jobs got in.  Races here could play havoc with losing
track of node state.
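
The ordering I mean is roughly the following Python sketch.  The four
callables are hypothetical hooks: disable and enable would wrap
something like "qmod -d" and "qmod -e" on the node's queue instances,
running_jobs would ask SGE what is on the host, and shutdown would be
the ssh command.

```python
def safe_power_off(node, disable, enable, running_jobs, shutdown):
    """Disable first, recheck for jobs, and only then shut down.

    Returns the state the caller should record for the node: "on" or "off".
    """
    disable(node)
    # A job may have been dispatched just before the disable took effect.
    if running_jobs(node):
        enable(node)
        return "on"
    shutdown(node)
    return "off"
```

Returning the resulting state, instead of having the function update
the state file itself, keeps the bookkeeping in one place.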

> - Track your up/down actions in enough detail so you can create
> reports showing how much power you have saved. Senior managers love
> this stuff

We are already gathering power usage information and putting it into
ganglia.  I need to do something else to better maintain the long term
information we need.  rrd files are great.  They just are not meant
for holding detailed long term information.
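
If the up/down actions are logged as simple events, the report itself
could be as small as this Python sketch.  The event tuple layout and
the idea of multiplying by a single idle wattage per node are my
assumptions; measured figures would be better where we have them.

```python
def off_seconds(events, now):
    """Total node-seconds spent powered off.

    events: (timestamp, node, action) tuples sorted by time,
            where action is "off" or "on".
    """
    went_off = {}
    total = 0.0
    for ts, node, action in events:
        if action == "off":
            went_off.setdefault(node, ts)
        elif action == "on" and node in went_off:
            total += ts - went_off.pop(node)
    # Nodes that are still off are counted up to the report time.
    total += sum(now - ts for ts in went_off.values())
    return total


def kwh_saved(events, now, idle_watts):
    """Estimate energy saved, assuming each node would otherwise idle at idle_watts."""
    return off_seconds(events, now) / 3600.0 * idle_watts / 1000.0
```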

> I tried to get the people who wrote this system to turn it into a
> product and they were cool with it. The big company they worked for
> was also cool with it but we never went all that far because the
> effort of doing the legal stuff required to allow this code to leave
> the big company was basically "too much work" at the time.

Is it really a lot of code?  Yes, there are a lot of details to handle
to get things right for all the edge cases.

Making the code open source could also be just as hard for the "legal
stuff", but the company wouldn't need to worry about setting up sales
vehicles, supplying support or anything else.

I hope to make anything I come up with available for others to use.
It won't be general purpose and would need someone skilled to adapt it
to their environment.  This is what I'm looking for now, something
good to crib off of.

Thanks,
Stuart