Like the recent question on "cloud", we are looking to "green" our
systems somewhat.

E.g. we would like to power down unneeded nodes and power them back on
when they can be useful for the workload.

I've done this manually to a limited extent, powering down unused
racks of nodes until I notice a need for the additional nodes.

I can create a host group containing the "green" nodes subject to
power control and give these the least restrictive access rules, which
should allow all jobs to use the nodes.

We don't need fine-grained power saving.  I'm thinking of powering
down unused nodes every 15 minutes and checking every 5 minutes
whether nodes need to be powered on.  I would also leave a few nodes
powered on and spin up new nodes as those are used (fill first being
important).
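
As a rough illustration of the "leave a few nodes powered on" idea,
the 5-minute check could boil down to a calculation like the one
below.  This is only a sketch: the function name, the `reserve`
parameter and the notion of counting idle powered-on nodes are
assumptions for illustration, not anything SGE provides.

```python
def reserve_shortfall(idle_on, powered_off, reserve=2):
    """Return how many powered-off nodes to start so that `reserve`
    idle nodes stay available.  All names here are hypothetical."""
    shortfall = max(0, reserve - idle_on)
    # Cannot start more nodes than are currently powered off.
    return min(shortfall, powered_off)
```

Called with the current count of idle powered-on nodes, it returns 0
while the reserve is intact and a small number once jobs have filled
the reserve.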

Has anyone successfully done something like this?

SDM/hedeby claims some support for this, but it looks like a horrid
cancer growth out the side of Grid Engine.  Can this be lopped off?

A while back I looked at SPIRIT <http://www.ciul.ul.pt/~ATP/SPIRIT/>
and it looked like it might function as a useful starting point.

Are there any other similar things I should look at?

It shouldn't be too hard to shut down unused nodes.  There are a couple
of interlocks which would need to occur to ensure GE doesn't try to
start a job just as the node is being shut down (disabling queues on
the host first and then checking again to ensure nothing got started).
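
The disable-then-recheck interlock might look roughly like this.  It
is a minimal sketch under assumptions: `power_off` is a hypothetical
site-specific callable (IPMI, PDU, whatever is in use), and I am
assuming that `qstat -s r -q '*@host'` prints nothing when no running
jobs occupy that host's queue instances, which should be verified on
the local installation.

```python
import subprocess

def sge(*args):
    """Run an SGE command and return its stdout (assumes the SGE
    tools are on PATH)."""
    return subprocess.run(args, check=True, capture_output=True,
                          text=True).stdout

def drain_and_power_off(host, power_off, run=sge):
    """Disable the host's queue instances, re-check for jobs, then
    power off.  Returns True if the node was powered off.

    `power_off` is a hypothetical callable taking the host name.
    """
    pattern = "*@" + host
    # Step 1: stop the scheduler dispatching new jobs to this host.
    run("qmod", "-d", pattern)
    # Step 2: check again; a job may have started just before the
    # disable took effect.
    running = run("qstat", "-u", "*", "-s", "r", "-q", pattern)
    if running.strip():
        # Something slipped in: re-enable the queues, leave node up.
        run("qmod", "-e", pattern)
        return False
    power_off(host)
    return True
```

Passing the command runner in as `run` keeps the logic testable
without a live cluster.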

Figuring out when to bring a node back online looks much harder.

Crudely, I can check if any jobs are waiting to run and just power up
a few nodes and hope they fulfill the need.  Repeat until the jobs
start running or all nodes are powered on.  I believe that this is
what SPIRIT does.
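
The crude approach can be sketched as the loop below.  It is an
illustration only and not a description of how SPIRIT is implemented.
`pending_count` and `power_on` are hypothetical callables that a site
would have to supply, and the loop simplifies "until the jobs start
running" to "until nothing is waiting".

```python
import time

def power_up_until_running(pending_count, powered_off, power_on,
                           batch=2, wait=300, sleep=time.sleep):
    """Power on `batch` nodes at a time while jobs are still waiting.

    pending_count: callable returning the number of waiting jobs
    powered_off:   list of host names that are currently off
    power_on:      hypothetical site-specific callable (IPMI, WOL...)
    Returns the list of hosts that were powered on.
    """
    started = []
    remaining = list(powered_off)
    while remaining and pending_count() > 0:
        for host in remaining[:batch]:
            power_on(host)
            started.append(host)
        remaining = remaining[batch:]
        # Give the nodes time to boot and the scheduler time to
        # dispatch before looking at the queue again.
        sleep(wait)
    return started
```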

This needs to take into account jobs that might need other resources
beyond just compute nodes (software licenses, special hardware).
This isn't a current need for our systems.

This also needs to account for a job which needs more nodes than are
available even if all the nodes were powered on.  This is probably not
currently an issue with our SGE cluster, which mostly runs lots of
array jobs.

It also needs to deal with a user who might have requested a specific
(non-green) node or node group for some reason.  It only helps to
power on a new node if it would actually be used.
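
The three caveats above amount to a filter on the waiting jobs before
deciding to power anything on.  A minimal sketch, assuming a
hypothetical `PendingJob` summary that a script would have to build
from qstat output (it is not an SGE data structure, and a host group
request is simplified here to a single host name):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PendingJob:
    # Hypothetical summary of a waiting job.
    slots: int                            # slots the job needs
    requested_host: Optional[str] = None  # explicit host request
    needs_other_resource: bool = False    # e.g. an exhausted license

def would_use_green_nodes(job, green_hosts, total_slots):
    """Return True if powering on green nodes could help this job.

    total_slots is the slot count with every node powered on.
    """
    if job.needs_other_resource:
        return False  # blocked on something other than compute nodes
    if job.slots > total_slots:
        return False  # cannot run even with every node powered on
    if (job.requested_host is not None
            and job.requested_host not in green_hosts):
        return False  # asked for a specific non-green node
    return True
```

Only jobs for which this returns True would count as a reason to
power on more nodes.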

My biggest concern is doing something simple that turns out to have
pathological edge cases which negate the entire effort.  Having broken
"green" capability can tick some check boxes.  Having working "green"
capability can actually save power and money, and help the environment.

This is actually something that the job scheduler should be able to
help with.  Perhaps there are some hooks in SGE for SDM that could be
used without going down that whole SDM/hedeby/cloud computing route?

Any thoughts or pointers are appreciated,
Stuart
-- 
I've never been lost; I was once bewildered for three days, but never lost!
                                        --  Daniel Boone