Here are my notes from when I added this:

I went ahead and added the parameter
-binding linear
to the default settings for jobs in
/.../common/sge_request

So now it will try to "bind" each job's processes to specific cores, which may improve performance.

In qstat -f -j JOBID output you'll now see something like:

usage 1: cpu=00:00:00, mem=0.00000 GB s, io=0.00008 GB, vmem=1.754M, maxvmem=1.754M
binding 1: ScttCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTT

S, C and T stand for socket, core and hardware thread; the lower-case letters mark the ones used by my job.
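The binding string can be decoded mechanically if you want to check many jobs at once. A minimal Python sketch, assuming the encoding S = socket, C = core, T = hardware thread, with lower-case letters marking units in use by the job; `parse_binding` is a hypothetical helper, not part of Grid Engine.

```python
def parse_binding(topology):
    """Summarise a Grid Engine binding string such as 'ScttCTT...'.

    Assumed encoding: S = socket, C = core, T = hardware thread;
    lower-case letters mark units in use by the job.
    """
    sockets = 0
    total_cores = 0
    bound_cores = []        # (socket index, core index) pairs used by the job
    core_in_socket = -1
    for ch in topology:
        if ch in "Ss":
            sockets += 1
            core_in_socket = -1   # core numbering restarts on each socket
        elif ch in "Cc":
            total_cores += 1
            core_in_socket += 1
            if ch == "c":
                bound_cores.append((sockets - 1, core_in_socket))
        # T/t characters (hardware threads) belong to the preceding core
        # and are not counted separately here.
    return {"sockets": sockets, "cores": total_cores, "bound_cores": bound_cores}


if __name__ == "__main__":
    print(parse_binding("ScttCTTCTT" + "S" + "CTTCTTCTT"))
    # {'sockets': 2, 'cores': 6, 'bound_cores': [(0, 0)]}
```

Applied to the binding string shown above, this reports a single bound core (socket 0, core 0), since only the first "ctt" group is lower case.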

You can see the "binding" section in the qsub man page for more details:
http://arc.liv.ac.uk/SGE/htmlman/htmlman1/submit.html



On 07/30/2015 08:55 AM, Simon Andrews wrote:
Thanks, core binding looks like it does what we need.  Do I understand
correctly that if a process spawns more threads than it has slots, binding
will just restrict those threads to the cores it’s been allocated, so
they’ll only slow down their own job, and the job won’t actually get
killed?

I’ll be very careful in testing this :-)

Simon.
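On Linux, binding is typically applied through the process's CPU affinity mask, which new threads and child processes inherit, so extra threads compete for the same allowed cores rather than spreading to others. (With the `env` and `pe` binding instances Grid Engine only passes the core selection on instead of enforcing it.) Whether Grid Engine would ever kill such a job is not something this sketch can show. A minimal Python sketch to run inside a job and see what the binding actually allowed, assuming a Linux execution host; NSLOTS is the slot count Grid Engine exports to jobs, and `binding_report` is a hypothetical helper name.

```python
import os


def binding_report(nslots=None):
    """Compare the CPUs this process may run on with the slots it was granted.

    Linux only: os.sched_getaffinity() is not available on all platforms.
    """
    cpus = sorted(os.sched_getaffinity(0))  # pid 0 means "this process"
    if nslots is None:
        # Grid Engine sets NSLOTS inside a job; assume 1 when run outside one.
        nslots = int(os.environ.get("NSLOTS", "1"))
    return {
        "allowed_cpus": cpus,
        "n_allowed": len(cpus),
        "nslots": nslots,
        # True if the affinity mask already limits us to at most nslots CPUs.
        "within_slots": len(cpus) <= nslots,
    }


if __name__ == "__main__":
    print(binding_report())
```

If `within_slots` is False inside a job, the binding request was probably not applied as a hard affinity restriction on that host.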

From: "MacMullan, Hugh" <[email protected]>
Date: Thursday, 30 July 2015 16:20
To: Simon Andrews <[email protected]>, "[email protected]" <[email protected]>
Subject: RE: Monitoring slot usage

Hi Simon:

We use 'Core Binding' to restrict users to the same number of cores as
slots requested.

http://www.gridengine.eu/grid-engine-internals/87-exploiting-the-grid-engine-core-binding-feature

We use a JSV (job submission verifier) to set the binding value (forcing
compliance) based on the job's other options: single-slot and MPI jobs are
bound to 1 core per slot requested, and OpenMP jobs are bound to the
number of slots requested with the -pe option.

Or you might be able to just put '-binding linear:1' in
$SGE_ROOT/default/common/sge_request, and then have users specify
'-binding linear:#' if they're running an SMP job.

Test carefully! :)

-Hugh
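The JSV rule Hugh describes boils down to a small decision on the parallel environment and slot count. A minimal Python sketch of that decision only, assuming hypothetical PE names ("smp", "openmp"); it does not use the real JSV interface, and how a site turns the result into an actual -binding request (in particular for MPI jobs spread over several hosts) is not shown.

```python
def cores_per_process(pe_name, slots, shared_memory_pes=frozenset({"smp", "openmp"})):
    """Cores to bind each process to, following the rule described above.

    Hypothetical helper: PE names are site-specific, and "smp"/"openmp"
    are placeholders for whatever your shared-memory PEs are called.
    """
    if slots < 1:
        raise ValueError("slots must be at least 1")
    if pe_name in shared_memory_pes:
        # OpenMP/SMP: one multi-threaded process uses all requested slots.
        return slots
    # Serial jobs (no PE) and MPI jobs: one core per slot, i.e. per process.
    return 1


if __name__ == "__main__":
    for pe, slots in [(None, 1), ("mpi", 16), ("smp", 8)]:
        print(pe, slots, "slots ->", cores_per_process(pe, slots), "core(s) per process")
```

For the OpenMP case the result corresponds to the '-binding linear:#' value Hugh suggests users pass.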

From: [email protected] [mailto:[email protected]] On Behalf Of Simon Andrews
Sent: Thursday, July 30, 2015 11:01 AM
To: [email protected]
Subject: [gridengine users] Monitoring slot usage

What is the recommended way of identifying jobs which are consuming more
CPU than they’ve requested?  I have an environment set up where people
mostly submit SMP jobs through a parallel environment and we can use
this information to schedule them appropriately.  We’ve had several
cases though where the jobs have used significantly more cores on the
machine they’re assigned to than they requested, so the nodes become
overloaded and go into an alarm state.

What options do I have for monitoring the number of cores simultaneously
used by a job and comparing this to the number which were requested so I
can find cases where the actual usage is way above the request and kill
them?

Thanks

Simon.
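A different approach from the binding suggested in this thread is to estimate how many cores a job has actually used: total CPU time divided by elapsed wallclock time gives the average number of busy cores. A minimal Python sketch, assuming the cpu= field in qstat -j usage output has the form HH:MM:SS (or D:HH:MM:SS for long jobs) and that elapsed time is obtained separately; the helper names, the example usage line and the 1.2 tolerance are all hypothetical.

```python
import re


def cpu_seconds(usage_line):
    """Return the cpu= value from a qstat -j 'usage' line, in seconds.

    Assumes the field looks like cpu=HH:MM:SS or cpu=D:HH:MM:SS.
    """
    match = re.search(r"cpu=([0-9:]+)", usage_line)
    if match is None:
        raise ValueError("no cpu= field in: %r" % usage_line)
    parts = [int(p) for p in match.group(1).split(":")]
    # Left-pad to [days, hours, minutes, seconds].
    days, hours, minutes, seconds = ([0] * 4 + parts)[-4:]
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def average_cores(cpu_s, wallclock_s):
    """Average number of cores kept busy over the job's lifetime so far."""
    return cpu_s / wallclock_s if wallclock_s > 0 else 0.0


def is_overloading(cpu_s, wallclock_s, slots, tolerance=1.2):
    """Flag jobs whose average core use exceeds slots * tolerance."""
    return average_cores(cpu_s, wallclock_s) > slots * tolerance


if __name__ == "__main__":
    # Made-up usage line and elapsed time, for illustration only.
    line = "usage 1: cpu=02:00:00, mem=0.00000 GB s, io=0.00008 GB"
    cpu = cpu_seconds(line)
    print(average_cores(cpu, 3600), is_overloading(cpu, 3600, slots=1))
```

A lifetime average hides short bursts; sampling cpu= periodically and dividing the difference by the sampling interval would catch jobs that overload a node only briefly.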

The Babraham Institute, Babraham Research Campus, Cambridge CB22 3AT
Registered Charity No. 1053902.

The information transmitted in this email is directed only to the
addressee. If you received this in error, please contact the sender and
delete this email from your system. The contents of this e-mail are the
views of the sender and do not necessarily represent the views of the
Babraham Institute. Full conditions at: www.babraham.ac.uk
<http://www.babraham.ac.uk/terms>




_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users


--
Alex Chekholko [email protected] 347-401-4860
