Like the recent question on "cloud", we are looking to "green" our systems somewhat.
For example, we would like to power down unneeded nodes and power them back on when they can be useful for the workload. I've done this manually to a limited extent, powering down unused racks of nodes until I notice a need for the additional nodes.

I can create a host group containing the "green" nodes subject to power control and give it the least restrictive access rules, so that all jobs can use those nodes. We don't need fine-grained power saving. I'm thinking of powering down unused nodes every 15 minutes and checking every 5 minutes whether nodes need to be powered on. I would also leave a few nodes powered on and power on additional nodes as those fill up (fill-first scheduling being important).

Has anyone successfully done something like this? SDM/hedeby claims some support for this, but it looks like a horrid cancer growth out the side of Grid Engine. Can this be lopped off? A while back I looked at SPIRIT <http://www.ciul.ul.pt/~ATP/SPIRIT/> and it looked like it might function as a useful starting point. Are there any other similar things I should look at?

It shouldn't be too hard to shut down unused nodes. There are a couple of interlocks which would need to occur to ensure GE doesn't try to start a job just as the node is being shut down (disable the queues on the host first, then check again to ensure nothing got started).

Figuring out when to bring a node back online looks much harder. Crudely, I can check whether any jobs are waiting to run, power up a few nodes, and hope they fulfill the need, repeating until the job starts running or all nodes are powered on. I believe that this is what SPIRIT does.

The power-on logic needs to take into account jobs that might need other resources beyond just compute nodes (software licenses, special hardware). This isn't a current need for our systems. It also needs to account for a job which needs more nodes than are available even if all the nodes were powered on.
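To make the two pieces above concrete, here is a rough sketch of what I have in mind, not something I have running. It assumes `qmod -d`/`qmod -e` on `*@host` to disable and enable all queue instances on a host, and that `qstat -s r` / `qstat -s p` print nothing when no jobs match, which should be verified on the installed version. The function names and the `power_off`/`power_on` callbacks (IPMI, a PDU, whatever) are made up for illustration.

```python
import subprocess
from typing import Callable, Iterable, List

Runner = Callable[[List[str]], str]


def run(cmd: List[str]) -> str:
    """Run a Grid Engine command and return its stdout."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def host_is_idle(host: str, runner: Runner = run) -> bool:
    # Assumption: qstat prints nothing at all when no running job matches.
    out = runner(["qstat", "-s", "r", "-u", "*", "-q", f"*@{host}"])
    return out.strip() == ""


def try_power_down(host: str, power_off: Callable[[str], None],
                   runner: Runner = run) -> bool:
    """Disable the host's queues, re-check for jobs, then power off."""
    if not host_is_idle(host, runner):
        return False
    runner(["qmod", "-d", f"*@{host}"])      # stop new jobs landing here
    if not host_is_idle(host, runner):       # a job slipped in before the disable
        runner(["qmod", "-e", f"*@{host}"])  # back out and leave the node up
        return False
    power_off(host)                          # hypothetical site-specific hook
    return True


def power_up_some(green_off: Iterable[str], power_on: Callable[[str], None],
                  runner: Runner = run, batch: int = 2) -> List[str]:
    """Crude version: if anything is pending, power on up to `batch` nodes."""
    if not runner(["qstat", "-s", "p", "-u", "*"]).strip():
        return []
    chosen = list(green_off)[:batch]
    for host in chosen:
        power_on(host)                       # hypothetical site-specific hook
    return chosen
```

The shutdown half would run from cron every 15 minutes over the green host group and the power-up half every 5 minutes. A node that comes back still has its queues disabled, so something has to run `qmod -e` on it once its execd is up again. The power-up half is exactly the crude version: it cannot tell whether the pending jobs would actually use the nodes it turns on.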
Jobs too large for the whole cluster are probably not an issue at present on our SGE cluster, which mostly runs lots of array jobs. The power-on logic also needs to deal with a user who might have requested a specific (non-green) node or node group for some reason. It only helps to power on a new node if it would actually be used.

My biggest concern is that a simple approach will have pathological edge cases which negate the entire effort. Having broken "green" capability can tick some check boxes. Having working "green" capability can actually save power, money and help the environment.

This is actually something that the job scheduler should be able to help with. Perhaps there are some hooks in SGE for SDM that could be used without going down that whole SDM/hedeby/cloud computing route?

Any thoughts or pointers are appreciated,
Stuart

-- 
I've never been lost; I was once bewildered for three days, but never lost!
    -- Daniel Boone