Stephen Willey <[email protected]> writes:

> You could use a load sensor to do this. We use one to detect if
> people are logged in and suspend/requeue the jobs if someone logs in
> while a job's on their workstation.
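For reference, a Grid Engine load sensor of the kind suggested here is a long-running program that talks to sge_execd over stdin/stdout. Each time it reads a line, it prints "begin", then one "host:complex:value" line per reported value, then "end". It exits when it reads "quit". The Python sketch below is a minimal illustration of that protocol, assuming the figures come from nvidia-smi and that the mean GPU utilisation is the value of interest. The complex name gpu_util and the helper function names are hypothetical: the complex would have to be defined first (qconf -mc). The sketch also assumes that socket.gethostname() returns the same host name Grid Engine uses for the execution host.

```python
#!/usr/bin/env python3
"""Minimal Grid Engine load sensor sketch reporting mean GPU utilisation.

Assumptions (hypothetical, not from the thread): the complex "gpu_util"
exists, nvidia-smi is on PATH, and socket.gethostname() matches the
host name Grid Engine uses.
"""
import socket
import subprocess
import sys


def parse_utilisation(csv_text):
    """Parse nvidia-smi CSV output (one percentage per GPU, no header, no
    units) and return the mean. Non-numeric lines such as "[N/A]" are
    skipped; returns 0.0 if nothing numeric is found."""
    values = []
    for line in csv_text.splitlines():
        try:
            values.append(float(line.strip()))
        except ValueError:
            continue  # blank line or "[N/A]" entry
    return sum(values) / len(values) if values else 0.0


def report(host, complex_name, value):
    """Build one load report: begin, host:complex:value, end.
    If value is None, an empty report (begin/end only) is returned."""
    body = "" if value is None else f"{host}:{complex_name}:{value:.1f}\n"
    return f"begin\n{body}end\n"


def query_gpus():
    """Return mean GPU utilisation, or None if nvidia-smi cannot be run."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=utilization.gpu",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_utilisation(out)


def main():
    host = socket.gethostname()
    # sge_execd writes a line at each load report interval; "quit" means exit.
    for line in sys.stdin:
        if line.strip() == "quit":
            break
        sys.stdout.write(report(host, "gpu_util", query_gpus()))
        sys.stdout.flush()  # execd reads the report as soon as "end" arrives


if __name__ == "__main__":
    main()
```

The script would be set as the host's load_sensor parameter (qconf -mconf), and a queue could then use gpu_util in its suspend_thresholds. Note that such a value describes the host, not any particular job or task.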
I don't understand how that addresses the question (as I understand it).

> http://arc.liv.ac.uk/SGE/howto/loadsensor.html shows you how to make
> one, then you'd set your queue to have a load/suspend threshold set at
> whatever you'd like (configurable per queue instance or
> host/hostgroup).

But that's not specific to a job/task.

> You'd probably use nvidia-smi (assuming you're on Linux) to get the
> card details out and parse them to form the load figure.

What's missing from
<http://arc.liv.ac.uk/repos/darcs/sge/source/dist/util/resources/loadsensors/gpu-loadsensor.c>?
It reports what seemed to be all the useful information that I could
see how to extract from CUDA and OpenCL devices with the then-current
support. Enhancements would be welcome, but I'm not convinced it's
terribly useful.

> There are a
> few more related details here:
> http://serverfault.com/questions/322073/howto-set-up-sge-for-cuda-devices

As an example, I don't think that deals with the sort of usage that's
supposed to be made of GPUs here, with mixed graphics/computation and
shared/exclusive access.

-- 
Community Grid Engine:  http://arc.liv.ac.uk/SGE/
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
