I have a similar issue, especially when users run MPI + multithreaded jobs. Some multithreaded programs by default use all of the cores they find on a node.
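As a minimal illustration of that "use every core it finds" default (not SGE-specific): Python's `multiprocessing.Pool`, like many OpenMP or threaded programs, sizes itself to the node's full core count unless explicitly told otherwise.

```python
import multiprocessing as mp
import os

pool = mp.Pool()  # no size given: defaults to the node's core count
print("workers:", pool._processes, "| cores:", os.cpu_count())
pool.close()
pool.join()
```

On a shared node, a job like this can occupy every core even if it only requested one slot.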
Now I have a script that scans CPU and RAM usage on all nodes and warns me if it finds any overloaded node. I'm not sure SGE has a built-in ability to track the CPU cores each job actually uses, but it should not be difficult to write a script that does this routinely outside of SGE.

On Thu, Jul 30, 2015 at 11:00 AM, Simon Andrews <[email protected]> wrote:

> What is the recommended way of identifying jobs which are consuming more CPU
> than they’ve requested? I have an environment set up where people mostly
> submit SMP jobs through a parallel environment and we can use this
> information to schedule them appropriately. We’ve had several cases though
> where the jobs have used significantly more cores on the machine they’re
> assigned to than they requested, so the nodes become overloaded and go into
> an alarm state.
>
> What options do I have for monitoring the number of cores simultaneously
> used by a job and comparing this to the number which were requested, so I can
> find cases where the actual usage is way above the request and kill them?
>
> Thanks
>
> Simon.
>
> The Babraham Institute, Babraham Research Campus, Cambridge CB22 3AT
> Registered Charity No. 1053902.
>
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users

--
Best,
Feng
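The node-scan approach described above can be sketched with nothing but the Linux load average: flag any node whose 1-minute load exceeds its core count by some factor. This is a minimal sketch under assumptions, not the actual script; the 1.5x threshold is hypothetical, and a real version would fan this out over all nodes (e.g. via ssh, or by parsing `qhost` output for per-host load).

```python
#!/usr/bin/env python3
"""Minimal overload check for one node, assuming Linux-style load averages.

The 1.5x factor and the idea of running this on every node (e.g. over
ssh or driven by qhost output) are assumptions, not SGE built-ins.
"""
import os

OVERLOAD_FACTOR = 1.5  # hypothetical: alarm when load > 1.5x core count

def node_is_overloaded(factor=OVERLOAD_FACTOR):
    """Return (overloaded?, 1-minute load, core count) for this node."""
    load1, _, _ = os.getloadavg()  # 1-, 5-, 15-minute load averages
    cores = os.cpu_count() or 1
    return load1 > factor * cores, load1, cores

if __name__ == "__main__":
    over, load1, cores = node_is_overloaded()
    print(f"load1={load1:.2f} cores={cores} -> "
          f"{'OVERLOADED' if over else 'ok'}")
```

To go from "node is overloaded" to "which job caused it", you would still have to attribute the load to processes, e.g. by summing per-process CPU for each job's process tree and comparing against the slots the job was granted.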
