Hello,

I’m new to Slurm and am trying to set it up on a new cluster that we stood 
up. It is mostly working so far, but not fully with our Knights Landing (KNL) 
nodes. Has anyone run Slurm on these, and am I missing some obvious 
configuration? Here are the issues I’m currently seeing, in case anyone 
has some insight:

- Submitting a job to a KNL partition seemingly picks nodes from that 
partition at random, then reboots them to change their constraints, even when 
nodes that already have the proper constraints are idle and waiting. The 
documentation seems to indicate that idle nodes already matching the 
constraints should be used by default.

- I’ve set a default partition (one that does not include KNL nodes) and can 
submit jobs just fine whether or not I specify a partition. They generally 
go to the right place. However, if I give no partition in an sbatch 
submission and add constraints that are specific to KNL, the job lands on 
the default (wrong) partition and sits in a held state. I would expect Slurm 
to see the constraints and send the job to the right nodes in the right 
partition. It isn’t clear from the docs whether constraints override a 
default partition. If they don’t, is there a way to make that happen so we 
can still enforce a non-KNL default?
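
To make the second case concrete, here is a rough sketch of the two kinds of 
submission I mean. It only prints the commands rather than running them, and 
the partition name "knl", the feature names "quad" and "cache", and the script 
name job.sh are placeholders for illustration, not our actual settings.

```shell
#!/bin/sh
# Dry run: print the two submissions instead of executing them.
# "knl", "quad", "cache" and job.sh are placeholder names (assumptions).
PARTITION="knl"
FEATURES="quad&cache"   # "&" means both features are required

# Case 1: partition given explicitly together with the KNL constraints.
with_partition="sbatch --partition=$PARTITION --constraint='$FEATURES' job.sh"

# Case 2: constraints only, so the job is placed in the default partition.
without_partition="sbatch --constraint='$FEATURES' job.sh"

echo "$with_partition"
echo "$without_partition"
```

The first form is what behaves as described in the first issue; the second 
form is the one that ends up held in the default partition.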

Since Slurm seems to be mostly doing the right things, I’m not having any luck 
figuring out where to poke next for the above issues. The Slurm logs don’t give 
me much to go on either, even with a higher debug level set. I’m using a 
knl_generic.conf with the same options as in the Slurm documentation. Lastly, 
I’m running Slurm 16.05.10-2.

Thanks!
--
John Roberts
[email protected]




