I have a similar issue too, especially when users run MPI + multithreaded
jobs. Some multithreaded programs will, by default, use every core they
find on a node.

Now I have a script that scans CPU and RAM usage on all nodes and warns
me if it finds any overloaded nodes.
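
The core of that check is quite small. Below is a rough, untested sketch
of the kind of thing I mean, assuming the script runs locally on each
Linux node (so /proc is available) and that the thresholds are just
placeholders you would tune for your site:

#!/usr/bin/env python
# Minimal per-node overload check (sketch only).
# Assumes Linux /proc; thresholds are placeholders.
import os
import multiprocessing

def mem_free_fraction():
    """Fraction of RAM still available, read from /proc/meminfo."""
    info = {}
    with open("/proc/meminfo") as fh:
        for line in fh:
            key, value = line.split(":", 1)
            info[key] = int(value.split()[0])  # values are in kB
    # MemAvailable exists on reasonably recent kernels; fall back to MemFree.
    return info.get("MemAvailable", info["MemFree"]) / float(info["MemTotal"])

def check_node(load_factor=1.5, min_mem_fraction=0.05):
    """Print a warning if load exceeds load_factor * cores or RAM is nearly gone."""
    cores = multiprocessing.cpu_count()
    load1, _, _ = os.getloadavg()
    if load1 > load_factor * cores:
        print("WARNING: 1-minute load %.1f exceeds %d cores" % (load1, cores))
    free = mem_free_fraction()
    if free < min_mem_fraction:
        print("WARNING: only %.0f%% of RAM available" % (100 * free))

if __name__ == "__main__":
    check_node()

I run that kind of check from cron every few minutes and mail myself the
output when it prints anything.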

I'm not sure SGE has a built-in way to track the number of CPU cores
each job actually uses, but it should not be difficult to prepare a
script that does it routinely outside of SGE.
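
A rough sketch of that idea is below. It assumes standard SGE behaviour
of exporting JOB_ID and NSLOTS into each job's environment, a Linux
/proc layout, and that the script runs as root on the execution host
(needed to read other users' /proc/<pid>/environ):

#!/usr/bin/env python
# Sketch: group running processes by the SGE job that started them and
# compare their combined CPU use with the slots the job requested.
import glob
import subprocess
from collections import defaultdict

def job_env(pid):
    """Return (JOB_ID, NSLOTS) for a pid, or None if it is not an SGE job process."""
    try:
        with open("/proc/%s/environ" % pid, "rb") as fh:
            env = dict(item.split(b"=", 1)
                       for item in fh.read().split(b"\0") if b"=" in item)
    except (IOError, OSError):
        return None  # process vanished or not readable
    if b"JOB_ID" in env and b"NSLOTS" in env:
        return env[b"JOB_ID"].decode(), int(env[b"NSLOTS"])
    return None

def check_jobs(tolerance=1.5):
    # ps reports %CPU per process (averaged over its lifetime); summing it
    # per job and dividing by 100 gives an approximate "cores in use".
    out = subprocess.check_output(["ps", "-eo", "pid=,pcpu="]).decode().split()
    pcpu = dict(zip(out[0::2], out[1::2]))
    used, requested = defaultdict(float), {}
    for path in glob.glob("/proc/[0-9]*"):
        pid = path.rsplit("/", 1)[1]
        info = job_env(pid)
        if info and pid in pcpu:
            job, slots = info
            used[job] += float(pcpu[pid]) / 100.0
            requested[job] = slots
    for job, cores in used.items():
        if cores > tolerance * requested[job]:
            print("Job %s is using ~%.1f cores but requested %d slots"
                  % (job, cores, requested[job]))

if __name__ == "__main__":
    check_jobs()

Run from cron on each execution host, the output could be mailed to the
admins, or fed to qdel if you want offending jobs killed automatically.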



On Thu, Jul 30, 2015 at 11:00 AM, Simon Andrews
<[email protected]> wrote:
> What is the recommended way of identifying jobs which are consuming more CPU
> than they’ve requested?  I have an environment set up where people mostly
> submit SMP jobs through a parallel environment and we can use this
> information to schedule them appropriately.  We’ve had several cases though
> where the jobs have used significantly more cores on the machine they’re
> assigned to than they requested, so the nodes become overloaded and go into
> an alarm state.
>
> What options do I have for monitoring the number of cores simultaneously
> used by a job and comparing this to the number which were requested so I can
> find cases where the actual usage is way above the request and kill them?
>
> Thanks
>
> Simon.
>



-- 
Best,

Feng

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
