Hi,

I am trying to determine the best way to ensure that an SMP (OpenMP) job runs exclusively on a single node of the cluster. That is to say, I want the entire node dedicated to this one job, even if the job doesn't use all the cores on the node. Our cluster has 12 physical cores per node.

My predecessor achieved this by requesting the parallel environment with -pe smp 12 and, in the submitted script, setting the OMP_NUM_THREADS variable to however many threads he actually wanted to run. This seemed to work well until recently. A job submitted in this way was placed on a node where another job was already running, which shouldn't have been possible, since there shouldn't have been 12 free slots on that node.
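For reference, the submission script looks roughly like the sketch below. Only -pe smp 12 and OMP_NUM_THREADS reflect our actual setup; the job name, the thread count of 4, and the binary name my_openmp_program are placeholders made up for illustration.

```shell
#!/bin/bash
# Sketch of the submission approach described above; placeholder names throughout.
#$ -S /bin/bash
#$ -cwd
# Hypothetical job name.
#$ -N omp_job
# Request 12 slots in the "smp" parallel environment (meant to fill a 12-core node).
#$ -pe smp 12

# Threads the program should actually use; may be fewer than the 12 slots requested.
export OMP_NUM_THREADS=4

# Grid Engine sets NSLOTS inside a running job; fall back to 12 when run by hand.
SLOTS_GRANTED="${NSLOTS:-12}"
echo "slots granted: ${SLOTS_GRANTED}, OpenMP threads: ${OMP_NUM_THREADS}"

# Hypothetical OpenMP binary; skipped if it is not present in the working directory.
if [ -x ./my_openmp_program ]; then
    ./my_openmp_program
fi
```

The idea is that the 12-slot request is what keeps other jobs off the node, while OMP_NUM_THREADS alone controls how many cores the program really uses.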
Can anyone tell me why this might have happened? Curiously, the cluster was fully loaded at that point, and quse showed the number of free slots as -12! I had never seen that before. I'm not sure if it's related, but qmon shows 24 CPUs per node rather than 12 under Queue Control -> Queue Instances. I'm guessing that is due to hyper-threading.

Thanks.
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users