Hi,
On 01.05.2017 at 21:52, <[email protected]> wrote:

> Hello,
>
> We have a computer cluster at our faculty based on nodes equipped with
> two Intel Xeon(R) E5-2695 v3 processors (i.e. 2x14 = 28 physical = 56
> logical cores/node), where we use SGE, or more precisely OGS/GE
> (OGS/GE 2011.11p1), to run/distribute jobs.
>
> On one of these nodes we would like to create a "high priority queue"
> that provides CPU resources preferentially to jobs submitted through
> it, which should in turn restrict/decrease the CPU resources available
> to jobs that were submitted earlier to this node through the
> "ordinary queue".
>
> Until now we have only experimented with the SGE/OGE queue parameter
> "priority", which can be used to set the "nice" value for a given job.
> First we tested the value -10 (which appeared to be entirely
> sufficient on an ordinary workstation with 12 logical CPU cores,
> tested there without SGE just using the "nice" parameter) and later
> also -19.
>
> In a situation where the given node was nearly fully loaded (i.e.
> 54-55 busy CPU slots out of the 56 available) with jobs submitted
> through the "ordinary queue", we submitted one parallel (24-slot) job
> through the "high priority queue", hoping to achieve a similar effect
> to what we saw on our 12-logical-core workstation, i.e. that the high
> priority job would get nearly 24x100% CPU usage at the expense of the
> running jobs from the "ordinary queue".
>
> We performed this test with a parallel MPI job (pmemd.MPI - Molecular
> Dynamics) and then another test with a GAMESS job (QM), where
> parallelization is accomplished using TCP/IP sockets and System V
> shared memory.

What type of MPI: Open MPI, MPICH, Intel MPI, IBM Spectrum MPI, IBM/Platform MPI?

> Unfortunately, neither test met our expectations.
> SGE successfully assigned the "nice" value -10 and later -19 to the
> job submitted in the "high priority queue", but this was not reflected
> properly in the allocation of CPU resources for the high priority job.
> We obtained a quite different and unsatisfactory situation compared to
> our first preliminary experiments (without SGE, just using the "nice"
> parameter) on the ordinary 12-logical-core workstation.
>
> Please see the relevant screenshots here:
> http://physics.ujep.cz/~mmaly/SCREENS/

How many independent jobs were on the node?

> I would be grateful for any relevant comments/tips that could help us
> solve our problem with the high priority queue.

I would say that these high priority jobs fight for resources with the kernel processes that have the same nice value. The idea of the nice value is to be more "nice" to other jobs, i.e. a higher value means being nicer. Essentially this means: normal jobs should get 19 (yes, plus 19) and high priority jobs a value of 0 (zero). Negative values are reserved for important kernel tasks, and no user process should use them. (A sketch of the corresponding queue setup follows after the side notes.)

Side note A: as long as the number of active processes in the kernel's run queue is lower than the number of cores, the nice value has no effect. I.e. having 8 cores and:

4 x nice 19
2 x nice 10
1 x nice  5
1 x nice  0

all will get 100%. The nice value only comes into play when there are more processes than cores. This also means: 8 x nice 0 is essentially the same as 8 x nice 19, as there is no one to be nice to. (A small test to reproduce this by hand follows below.)

Side note B: Using HT in a cluster is often not advisable, as the runtime of a job can't be predicted; it depends on what else is running on the same physical core.
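On the queue side this could look roughly like the following. This is only a sketch: "ordinary.q" and "highprio.q" are placeholders for your actual queue names, and the same values can of course be set interactively with qconf -mq instead:

    # ordinary queue: be nice to everyone else
    qconf -mattr queue priority 19 ordinary.q

    # high priority queue: plain default priority
    qconf -mattr queue priority 0 highprio.q

    # verify
    qconf -sq ordinary.q | grep priority
    qconf -sq highprio.q | grep priority

The queue "priority" is applied when a job is started, so jobs which are already running keep their old nice value (they could be adjusted by hand with renice if necessary).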
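Side note A can be reproduced by hand on any multi-core machine, outside of SGE (purely illustrative, the "yes" processes are just CPU burners):

    # e.g. on an 8-core box: 8 CPU hogs at nice 19
    for i in $(seq 8); do nice -n 19 yes > /dev/null & done
    top    # all 8 get ~100% of a core, despite nice 19

    # a 9th process at nice 0: it gets a full core for itself,
    # while the 8 nice-19 processes now share the remaining 7 cores
    nice -n 0 yes > /dev/null &
    top

    # clean up
    kill $(jobs -p)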
Regarding HT (side note B): there was some discussion here:

https://www.mail-archive.com/[email protected]//msg30863.html

(the complete thread and all links). Maybe one gets 130% out of the CPU with HT. Especially with MPI jobs this becomes a problem: all processes are doing the same thing at the same time and fight for the same resources inside a CPU. Having 2 independent jobs on a CPU might be more promising.

Side note C: In most cases one MPI job doesn't know anything about the other MPI jobs on a node. If automatic core binding is enabled, each job starts counting at core 0 and binds to the same cores. It might be necessary to disable the automatic core binding and let the kernel scheduler do its best (unless a complete node is dedicated to all tasks belonging to one job, which could of course span several nodes). See the sketch below.
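How to switch the binding off depends on the MPI library, hence my question above about which MPI you use. As a sketch, assuming Open MPI >= 1.8 (which binds processes by default), it would be:

    # let the kernel scheduler place the ranks, no pinning by Open MPI
    mpirun --bind-to none -np 24 pmemd.MPI ...

    # to see what the ranks would otherwise be bound to
    mpirun --report-bindings -np 24 pmemd.MPI ...

The options are named differently in MPICH or Intel MPI. If I remember correctly, GE's own core binding (qsub -binding ...) is only applied when requested, so any pinning you see most likely comes from the MPI library itself.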
--
Reuti