Rémi Piatek <[email protected]> writes:

> I had considered this simple explanation, but it seemed unlikely to me, as it
> would imply that we completely have to rely on users to specify correctly the
> number of CPUs they need. People use my server for CPU-intensive jobs, so it is
> important for me to make sure that resources are fairly shared. I was hoping
> slurm would allow me to do this, and prevent people from free-riding (so far, it
> would be easy to request a small number of CPUs and use a much larger number,
> thus slowing down the other users).
>
> I read that when jobs exceed the memory requested and allocated by slurm, they
> are automatically interrupted. Is there nothing similar for the use of CPUs?
>
> Thanks for the help! Much appreciated.
The cores a user has access to are limited by a CPU mask, which is created
when the job starts. In theory, this means that if a job starts more
processes than it has cores, only the cores belonging to that job are
overbooked, and the jobs of other users are unaffected.

However, we have seen code that fails to respect the CPU mask, so we also
periodically check and, if necessary, reset the CPU affinity of jobs.

Cheers,

Loris

-- 
This signature is currently under construction.
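
PS: To make the CPU mask mentioned above a bit more concrete, here is a minimal Python sketch of how a program can look at its own mask and size its worker pool accordingly. It assumes a Linux node on which the scheduler has already applied an affinity mask to the job. The helper names `allowed_cpus` and `worker_count` are made up for illustration and are not part of Slurm.

```python
import os


def allowed_cpus():
    """Return the set of CPU ids this process is allowed to run on."""
    if hasattr(os, "sched_getaffinity"):
        # Linux: the affinity mask of the calling process (pid 0 = self).
        return set(os.sched_getaffinity(0))
    # Assumption: on platforms without affinity support, fall back to
    # all CPUs the OS reports.
    return set(range(os.cpu_count() or 1))


def worker_count(requested):
    """Cap the requested number of workers at the number of allowed cores."""
    return max(1, min(requested, len(allowed_cpus())))


if __name__ == "__main__":
    print("allowed cores:", sorted(allowed_cpus()))
    print("workers for a request of 64:", worker_count(64))
```

Run inside a job, the first line should list only the cores in the job's mask, which may be fewer than the node has. A program that sizes its pool from `os.cpu_count()` instead would overbook its own cores, which is the case described above.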
