Hi,

I am trying to determine the best way to ensure that an SMP (OpenMP) job
runs exclusively on a single node of the cluster. That is, I want the
entire node dedicated to this one job, even if the job doesn't use all of
the node's cores. Our cluster has 12 physical cores per node. My
predecessor achieved this by requesting the parallel environment with
-pe smp 12 and, in the submitted script, setting OMP_NUM_THREADS to the
number of threads he actually wanted to run. This seemed to work well
until recently, when a job submitted this way was placed on a node where
another job was already running. That shouldn't have been possible, since
the node shouldn't have had 12 free slots.
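
For reference, a submission script of this kind looks roughly like the
sketch below. It is not the exact script we use: the thread count of 4 and
the program name are placeholders I made up for illustration. Only the
-pe smp 12 request and the OMP_NUM_THREADS setting reflect our actual
setup.

```shell
#!/bin/bash
# Sketch of the submission approach described above (not the real script).
#$ -S /bin/bash
#$ -cwd
# Request all 12 slots so that, in principle, no other job fits on the node.
#$ -pe smp 12

# Hypothetical value: the number of threads the program should really use.
THREADS_WANTED=4
export OMP_NUM_THREADS="$THREADS_WANTED"

# NSLOTS is set by Grid Engine at run time; it is unset outside a job.
echo "slots granted: ${NSLOTS:-unset}, OpenMP threads: $OMP_NUM_THREADS"

# Hypothetical executable name; uncomment and replace with the real one.
# ./my_openmp_program
```

With this approach the job occupies 12 slots but runs only 4 threads, so
the exclusivity depends entirely on the node offering exactly 12 slots.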

Can anyone tell me why this might have happened? Curiously, the cluster was
fully subscribed at that point, and quse showed the number of free slots
as -12! I had never seen that before.

I'm not sure if it's related, but qmon shows 24 CPUs per node rather than
12 under Queue Control -> Queue Instances. I'm guessing that is due to
hyperthreading.

Thanks.
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
